diff --git a/Dockerfile b/Dockerfile
index b831247..371134b 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -9,6 +9,7 @@ ENV PYTHONUNBUFFERED 1
 RUN apt-get clean && apt-get -y update && apt-get -y install --no-install-recommends \
     build-essential \
     nginx \
+    atomicparsley \
     curl && rm -rf /var/lib/apt/lists/*
 
 # get newest patched ffmpeg and ffprobe builds for amd64 fall back to repo ffmpeg for arm64
diff --git a/docs/FAQ.md b/docs/FAQ.md
new file mode 100644
index 0000000..3491939
--- /dev/null
+++ b/docs/FAQ.md
@@ -0,0 +1,31 @@
+# Frequently Asked Questions
+
+## 1. Scope of this project
+Tube Archivist is *Your self hosted YouTube media server*, which also defines the primary scope of what this project tries to do:
+- **Self hosted**: This assumes you have full control over the underlying operating system and hardware and can configure things to work properly with Docker, its volumes and networks, as well as whatever disk storage and filesystem you choose to use.
+- **YouTube**: Downloading, indexing and playing videos from YouTube. There are currently no plans to expand this to any additional platforms.
+- **Media server**: This project tries to be a standalone media server with its own web interface.
+
+In addition, progress is also happening on:
+- **API**: Endpoints for additional integrations.
+- **Browser Extension**: To integrate between youtube.com and Tube Archivist.
+
+Defining the scope is important for the success of any project:
+- A scope too broad will spread development effort too thin and risks this project trying to do too many things and none of them well.
+- A scope too narrow will make this project uninteresting and will exclude audiences that could also benefit from it.
+- Not defining a scope will easily lead to misunderstandings and false hopes about where this project is trying to go.
+
+Of course this is subject to change, as this project continues to grow and more people contribute.
+
+## 2. Emby/Plex/Jellyfin/Kodi integrations
+Although there are similarities between these excellent projects and Tube Archivist, they have a very different use case. Trying to fit the metadata relations and database structure of a YouTube archival project into these media servers that specialize in Movies and TV shows is always going to be limiting.
+
+Part of the scope is to be its own media server, so that's where the focus and effort of this project is. That being said, the nature of self hosted and open source software gives you all the possible freedom to use your media as you wish.
+
+## 3. To Docker or not to Docker
+This project is a classic Docker application: there are multiple moving parts that need to be able to interact with each other and need to be compatible with multiple architectures and operating systems. Additionally, Docker drastically reduces development complexity, which is highly appreciated.
+
+So Docker is the only supported installation method. If you don't have any experience with Docker, consider investing the time to learn this very useful technology.
+
+## 4. Finetuning Elasticsearch
+A minimal configuration of Elasticsearch (ES) is provided in the example docker-compose.yml file. ES is highly configurable and very interesting to learn more about. Refer to the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html) if you want to get into it.
diff --git a/docs/Home.md b/docs/Home.md
index 6a0ba77..fbe7f20 100644
--- a/docs/Home.md
+++ b/docs/Home.md
@@ -2,6 +2,7 @@
 Welcome to the official Tube Archivist Wiki. This is an up-to-date documentation of user functionality.
 Table of contents:
+* [FAQ](FAQ): Frequently asked questions about what this project is and tries to do
 * [Channels](Channels): Browse your channels, handle channel subscriptions
 * [Playlists](Playlists): Browse your indexed playlists, handle playlist subscriptions
 * [Downloads](Downloads): Scanning subscriptions, handle download queue
diff --git a/docs/Settings.md b/docs/Settings.md
index 22d1484..c002b03 100644
--- a/docs/Settings.md
+++ b/docs/Settings.md
@@ -27,9 +27,16 @@ Additional settings passed to yt-dlp.
 - **Embed Metadata**: This saves the available tags directly into the media file by passing `--embed-metadata` to yt-dlp.
 - **Embed Thumbnail**: This will save the thumbnail into the media file by passing `--embed-thumbnail` to yt-dlp.
 
+## Subtitles
+- **Download Setting**: Select the subtitle language you'd like to download. Add a comma separated list for multiple languages.
+- **Source Settings**: User created subtitles are provided by the uploader and are usually the video script. Auto generated subtitles come from YouTube; quality varies, particularly for auto translated tracks.
+- **Index Settings**: Enabling subtitle indexing will add the lines to Elasticsearch and will make subtitles searchable. This will increase the index size and is not recommended on low-end hardware.
+
 ## Integrations
 All third party integrations of TubeArchivist will **always** be *opt in*.
-- **returnyoutubedislike.com**: This will get dislikes and average ratings for each video back by integarting with the API from [returnyoutubedislike.com](https://www.returnyoutubedislike.com/).
+- **API**: Your access token for the Tube Archivist API.
+- **returnyoutubedislike.com**: This will return dislikes and average ratings for each video by integrating with the API from [returnyoutubedislike.com](https://www.returnyoutubedislike.com/).
+- **Cast**: Enable Google Cast for videos. Requires a valid SSL certificate and works only in Google Chrome.
 
 # Scheduler Setup
 Schedule settings expect a cron like format, where the first value is minute, second is hour and third is day of the week. Day 0 is Sunday, day 1 is Monday etc.
@@ -69,7 +76,7 @@ Create a zip file of the metadata and select **Max auto backups to keep** to aut
 Additional database functionality.
 
 ## Manual Media Files Import
-So far this depends on the video you are trying to import to be still available on YouTube to get the metadata. Add the files you like to import to the */cache/import* folder. Then start the process from the settings page *Manual Media Files Import*. Make sure to follow one of the two methods below.
+So far this depends on the video you are trying to import to be still available on YouTube to get the metadata. Add the files you'd like to import to the */cache/import* folder. Then start the process from the settings page *Manual Media Files Import*. Make sure to follow one of the two methods below.
 
 ### Method 1:
 Add a matching *.json* file with the media file. Both files need to have the same base name, for example:
@@ -86,6 +93,7 @@ Detect the YouTube ID from filename, this accepts the default yt-dlp naming conv
 
 ### Some notes:
 - This will **consume** the files you put into the import folder: Files will get converted to mp4 if needed (this might take a long time...) and moved to the archive, *.json* files will get deleted upon completion to avoid having duplicates on the next run.
+- There should be no subdirectories added to */cache/import*, only video files. If your existing video library has video files inside subdirectories, you can get all the files into one directory by running `find ./ -mindepth 2 -type f -exec mv '{}' . \;` from the top-level directory of your existing video library. You can also delete any remaining empty subdirectories with `find ./ -mindepth 1 -type d -delete`.
 - Maybe start with a subset of your files to import to make sure everything goes well...
- Follow the logs to monitor progress and errors: `docker-compose logs -f tubearchivist`. diff --git a/tubearchivist/home/config.json b/tubearchivist/home/config.json index 8c4249a..1b6000e 100644 --- a/tubearchivist/home/config.json +++ b/tubearchivist/home/config.json @@ -25,6 +25,7 @@ "add_thumbnail": false, "subtitle": false, "subtitle_source": false, + "subtitle_index": false, "throttledratelimit": false, "integrate_ryd": false }, diff --git a/tubearchivist/home/src/frontend/forms.py b/tubearchivist/home/src/frontend/forms.py index 77648cf..3e1b353 100644 --- a/tubearchivist/home/src/frontend/forms.py +++ b/tubearchivist/home/src/frontend/forms.py @@ -70,8 +70,14 @@ class ApplicationSettingsForm(forms.Form): SUBTITLE_SOURCE_CHOICES = [ ("", "-- change subtitle source settings"), + ("user", "only download user created"), ("auto", "also download auto generated"), - ("user", "only download uploader"), + ] + + SUBTITLE_INDEX_CHOICES = [ + ("", "-- change subtitle index settings --"), + ("0", "disable subtitle index"), + ("1", "enable subtitle index"), ] subscriptions_channel_size = forms.IntegerField(required=False) @@ -91,6 +97,9 @@ class ApplicationSettingsForm(forms.Form): downloads_subtitle_source = forms.ChoiceField( widget=forms.Select, choices=SUBTITLE_SOURCE_CHOICES, required=False ) + downloads_subtitle_index = forms.ChoiceField( + widget=forms.Select, choices=SUBTITLE_INDEX_CHOICES, required=False + ) downloads_integrate_ryd = forms.ChoiceField( widget=forms.Select, choices=RYD_CHOICES, required=False ) diff --git a/tubearchivist/home/src/index/reindex.py b/tubearchivist/home/src/index/reindex.py index 0eea2e3..b694021 100644 --- a/tubearchivist/home/src/index/reindex.py +++ b/tubearchivist/home/src/index/reindex.py @@ -204,7 +204,9 @@ class Reindex: video.build_json() if not video.json_data: video.deactivate() + return + video.delete_subtitles() # add back video.json_data["player"] = player video.json_data["date_downloaded"] = date_downloaded @@ -218,6 
+220,7 @@ class Reindex: thumb_handler.delete_vid_thumb(youtube_id) to_download = (youtube_id, video.json_data["vid_thumb_url"]) thumb_handler.download_vid([to_download], notify=False) + return @staticmethod def reindex_single_channel(channel_id): diff --git a/tubearchivist/home/src/index/video.py b/tubearchivist/home/src/index/video.py index e2695d4..bc6f272 100644 --- a/tubearchivist/home/src/index/video.py +++ b/tubearchivist/home/src/index/video.py @@ -27,7 +27,8 @@ class YoutubeSubtitle: def sub_conf_parse(self): """add additional conf values to self""" languages_raw = self.video.config["downloads"]["subtitle"] - self.languages = [i.strip() for i in languages_raw.split(",")] + if languages_raw: + self.languages = [i.strip() for i in languages_raw.split(",")] def get_subtitles(self): """check what to do""" @@ -61,6 +62,9 @@ class YoutubeSubtitle: video_media_url = self.video.json_data["media_url"] media_url = video_media_url.replace(".mp4", f"-{lang}.vtt") all_formats = all_subtitles.get(lang) + if not all_formats: + return False + subtitle = [i for i in all_formats if i["ext"] == "vtt"][0] subtitle.update( {"lang": lang, "source": "auto", "media_url": media_url} @@ -120,8 +124,9 @@ class YoutubeSubtitle: parser.process() subtitle_str = parser.get_subtitle_str() self._write_subtitle_file(dest_path, subtitle_str) - query_str = parser.create_bulk_import(self.video, source) - self._index_subtitle(query_str) + if self.video.config["downloads"]["subtitle_index"]: + query_str = parser.create_bulk_import(self.video, source) + self._index_subtitle(query_str) @staticmethod def _write_subtitle_file(dest_path, subtitle_str): @@ -157,6 +162,7 @@ class SubtitleParser: self._parse_cues() self._match_text_lines() self._add_id() + self._timestamp_check() def _parse_cues(self): """split into cues""" @@ -179,7 +185,8 @@ class SubtitleParser: clean = re.sub(self.stamp_reg, "", line) clean = re.sub(self.tag_reg, "", clean) cue_dict["lines"].append(clean) - if clean and clean not 
in self.all_text_lines: + if clean.strip() and clean not in self.all_text_lines[-4:]: + # remove immediate duplicates self.all_text_lines.append(clean) return cue_dict @@ -199,11 +206,25 @@ class SubtitleParser: try: self.all_text_lines.remove(line) except ValueError: - print("failed to process:") - print(line) + continue self.matched.append(new_cue) + def _timestamp_check(self): + """check if end timestamp is bigger than start timestamp""" + for idx, cue in enumerate(self.matched): + # this + end = int(re.sub("[^0-9]", "", cue.get("end"))) + # next + try: + next_cue = self.matched[idx + 1] + except IndexError: + continue + + start_next = int(re.sub("[^0-9]", "", next_cue.get("start"))) + if end > start_next: + self.matched[idx]["end"] = next_cue.get("start") + def _add_id(self): """add id to matched cues""" for idx, _ in enumerate(self.matched): @@ -404,7 +425,7 @@ class YoutubeVideo(YouTubeItem, YoutubeSubtitle): os.remove(file_path) self.del_in_es() - self._delete_subtitles() + self.delete_subtitles() def _get_ryd_stats(self): """get optional stats from returnyoutubedislikeapi.com""" @@ -434,7 +455,7 @@ class YoutubeVideo(YouTubeItem, YoutubeSubtitle): self.json_data["subtitles"] = subtitles handler.download_subtitles(relevant_subtitles=subtitles) - def _delete_subtitles(self): + def delete_subtitles(self): """delete indexed subtitles""" data = {"query": {"term": {"youtube_id": {"value": self.youtube_id}}}} _, _ = ElasticWrap("ta_subtitle/_delete_by_query").post(data=data) diff --git a/tubearchivist/home/templates/home/channel_id.html b/tubearchivist/home/templates/home/channel_id.html index 3f0522c..6a42c10 100644 --- a/tubearchivist/home/templates/home/channel_id.html +++ b/tubearchivist/home/templates/home/channel_id.html @@ -133,6 +133,7 @@ {% endfor %} {% else %}
Try going to the downloads page to start the scan and download tasks.
{% endif %} diff --git a/tubearchivist/home/templates/home/home.html b/tubearchivist/home/templates/home/home.html index 6433178..62d2e9a 100644 --- a/tubearchivist/home/templates/home/home.html +++ b/tubearchivist/home/templates/home/home.html @@ -73,6 +73,7 @@ {% endfor %} {% else %}If you've already added a channel or playlist, try going to the downloads page to start the scan and download tasks.
{% endif %} diff --git a/tubearchivist/home/templates/home/playlist_id.html b/tubearchivist/home/templates/home/playlist_id.html index a5cfb0b..41d8268 100644 --- a/tubearchivist/home/templates/home/playlist_id.html +++ b/tubearchivist/home/templates/home/playlist_id.html @@ -114,6 +114,7 @@ {% endfor %} {% else %}Try going to the downloads page to start the scan and download tasks.
{% endif %} diff --git a/tubearchivist/home/templates/home/settings.html b/tubearchivist/home/templates/home/settings.html index 3b9ffe9..740ca40 100644 --- a/tubearchivist/home/templates/home/settings.html +++ b/tubearchivist/home/templates/home/settings.html @@ -94,6 +94,9 @@ Embed thumbnail into the mediafile.Subtitles download setting: {{ config.downloads.subtitle }}
Choose which subtitles to download, add comma separated two letter language ISO code,
@@ -105,12 +108,20 @@
Download only user generated, or also less accurate auto generated subtitles.
{{ app_form.downloads_subtitle_source }}
Index and make subtitles searchable: {{ config.downloads.subtitle_index }}
+ Store subtitle lines in Elasticsearch. Not recommended for low-end hardware.API token:
-{{ api_token }}
+API token:
+{{ api_token }}
+ +Integrate with returnyoutubedislike.com to get dislikes and average ratings back: {{ config.downloads.integrate_ryd }}
diff --git a/tubearchivist/home/views.py b/tubearchivist/home/views.py index a0cac87..660ec73 100644 --- a/tubearchivist/home/views.py +++ b/tubearchivist/home/views.py @@ -715,7 +715,6 @@ class SettingsView(View): """get existing or create new token of user""" # pylint: disable=no-member token = Token.objects.get_or_create(user=request.user)[0] - print(token) return token @staticmethod @@ -758,6 +757,11 @@ def process(request): if request.method == "POST": current_user = request.user.id post_dict = json.loads(request.body.decode()) + if post_dict.get("reset-token"): + print("revoke API token") + request.user.auth_token.delete() + return JsonResponse({"success": True}) + post_handler = PostData(post_dict, current_user) if post_handler.to_exec: task_result = post_handler.run_task() diff --git a/tubearchivist/static/script.js b/tubearchivist/static/script.js index 82ed2e3..ec3aa40 100644 --- a/tubearchivist/static/script.js +++ b/tubearchivist/static/script.js @@ -235,6 +235,14 @@ function findPlaylists(button) { }, 500); } +function resetToken() { + var payload = JSON.stringify({'reset-token': true}); + sendPost(payload); + var message = document.createElement("p"); + message.innerText = "Token revoked"; + document.getElementById("text-reveal").replaceWith(message); +} + // delete from file system function deleteConfirm() { to_show = document.getElementById("delete-button");
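Note on the `_timestamp_check` step added to `SubtitleParser` in the video.py hunk: it shortens a cue whose end timestamp runs past the start of the following cue. The sketch below illustrates that idea in isolation and is not the project's code. It assumes cues are plain dicts with `start` and `end` strings in zero-padded `HH:MM:SS.mmm` form, and the function name `clamp_overlaps` is hypothetical.

```python
import re


def clamp_overlaps(cues):
    """Return copies of the cues where no cue ends after the next one starts."""
    fixed = [dict(cue) for cue in cues]  # copy so the input list is not mutated
    for idx in range(len(fixed) - 1):
        # Stripping non-digits turns "00:00:05.000" into 5000. Comparing these
        # integers only works because every stamp has the same zero-padded width.
        end = int(re.sub("[^0-9]", "", fixed[idx]["end"]))
        start_next = int(re.sub("[^0-9]", "", fixed[idx + 1]["start"]))
        if end > start_next:
            # overlap: cut the current cue off where the next one begins
            fixed[idx]["end"] = fixed[idx + 1]["start"]
    return fixed


cues = [
    {"start": "00:00:01.000", "end": "00:00:05.000"},
    {"start": "00:00:03.500", "end": "00:00:06.000"},
]
result = clamp_overlaps(cues)
print(result[0]["end"])
```

The last cue is never touched because it has no successor, which mirrors the `IndexError` branch in the diff. Iterating up to the second-to-last index avoids needing the `try`/`except` at all.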