Subtitle improvements, #build

Changes:
- merges subtitle support for the JS video player
- merges a hint for what to do when no videos are found
- merges better indexing and error handling of subtitles
This commit is contained in:
simon 2022-02-12 19:11:47 +07:00
commit 428cc315e4
No known key found for this signature in database
GPG Key ID: 2C15AA5E89985DD4
14 changed files with 115 additions and 14 deletions


@ -9,6 +9,7 @@ ENV PYTHONUNBUFFERED 1
RUN apt-get clean && apt-get -y update && apt-get -y install --no-install-recommends \
build-essential \
nginx \
atomicparsley \
curl && rm -rf /var/lib/apt/lists/*
# get newest patched ffmpeg and ffprobe builds for amd64, fall back to repo ffmpeg for arm64

docs/FAQ.md Normal file

@ -0,0 +1,31 @@
# Frequently Asked Questions
## 1. Scope of this project
Tube Archivist is *Your self hosted YouTube media server*, which also defines the primary scope of what this project tries to do:
- **Self hosted**: This assumes you have full control over the underlying operating system and hardware and can configure things to work properly with Docker, its volumes and networks, as well as whatever disk storage and filesystem you choose to use.
- **YouTube**: Downloading, indexing and playing videos from YouTube, there are currently no plans to expand this to any additional platforms.
- **Media server**: This project tries to be a standalone media server with its own web interface.
In addition to that, progress is also happening on:
- **API**: Endpoints for additional integrations.
- **Browser Extension**: To integrate between youtube.com and Tube Archivist.
Defining the scope is important for the success of any project:
- A scope too broad will spread development effort too thin and risks this project trying to do too many things and none of them well.
- Too narrow a scope will make this project uninteresting and exclude audiences that could also benefit from it.
- Not defining a scope at all will easily lead to misunderstandings and false hopes about where this project is going.
Of course this is subject to change, as this project continues to grow and more people contribute.
## 2. Emby/Plex/Jellyfin/Kodi integrations
Although there are similarities between these excellent projects and Tube Archivist, they have a very different use case. Trying to fit the metadata relations and database structure of a YouTube archival project into these media servers that specialize in Movies and TV shows is always going to be limiting.
Part of the scope is to be its own media server, so that's where the focus and effort of this project is. That being said, the nature of self hosted and open source software gives you all the possible freedom to use your media as you wish.
## 3. To Docker or not to Docker
This project is a classic Docker application: there are multiple moving parts that need to interact with each other and be compatible with multiple architectures and operating systems. Docker also drastically reduces development complexity, which is highly appreciated.
So Docker is the only supported installation method. If you don't have any experience with Docker, consider investing the time to learn this very useful technology.
## 4. Finetuning Elasticsearch
A minimal configuration of Elasticsearch (ES) is provided in the example docker-compose.yml file. ES is highly configurable and very interesting to learn more about. Refer to the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html) if you want to get into it.
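A hedged example of the kind of finetuning possible in the compose file (the service name, image tag and password below are illustrative placeholders; `ES_JAVA_OPTS` capping the JVM heap is the most common tweak on memory-constrained hosts):

```yaml
services:
  archivist-es:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0  # illustrative tag
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"   # cap JVM heap at 512 MB
      - "ELASTIC_PASSWORD=verysecret"      # placeholder password
    expose:
      - "9200"
```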


@ -2,6 +2,7 @@
Welcome to the official Tube Archivist Wiki. This is an up-to-date documentation of user functionality.
Table of contents:
* [FAQ](FAQ): Frequently asked questions about what this project is and tries to do
* [Channels](Channels): Browse your channels, handle channel subscriptions
* [Playlists](Playlists): Browse your indexed playlists, handle playlist subscriptions
* [Downloads](Downloads): Scanning subscriptions, handle download queue


@ -27,9 +27,16 @@ Additional settings passed to yt-dlp.
- **Embed Metadata**: This saves the available tags directly into the media file by passing `--embed-metadata` to yt-dlp.
- **Embed Thumbnail**: This will save the thumbnail into the media file by passing `--embed-thumbnail` to yt-dlp.
## Subtitles
- **Download Setting**: Select the subtitle languages you'd like to download. Add a comma-separated list for multiple languages.
- **Source Settings**: User created subtitles are provided by the uploader and are usually the video script. Auto generated subtitles come from YouTube; quality varies, particularly for auto translated tracks.
- **Index Settings**: Enabling subtitle indexing will add the lines to Elasticsearch and will make subtitles searchable. This will increase the index size and is not recommended on low-end hardware.
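These settings correspond roughly to yt-dlp's subtitle options. A minimal sketch of how the comma-separated language string could map onto them (the helper and config shape are illustrative, not Tube Archivist's actual code; `writesubtitles`, `subtitleslangs` and `writeautomaticsub` are yt-dlp option keys):

```python
def build_subtitle_opts(config):
    """Translate subtitle settings into yt-dlp style options (sketch only)."""
    languages_raw = config["downloads"]["subtitle"]
    if not languages_raw:  # subtitles disabled
        return {}
    opts = {
        "writesubtitles": True,  # user created subtitles
        "subtitleslangs": [i.strip() for i in languages_raw.split(",")],
    }
    if config["downloads"]["subtitle_source"] == "auto":
        opts["writeautomaticsub"] = True  # also grab auto generated tracks
    return opts

print(build_subtitle_opts({"downloads": {"subtitle": "en, de", "subtitle_source": "auto"}}))
```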
## Integrations
All third party integrations of Tube Archivist will **always** be *opt in*.
- **returnyoutubedislike.com**: This will get dislikes and average ratings for each video back by integarting with the API from [returnyoutubedislike.com](https://www.returnyoutubedislike.com/).
- **API**: Your access token for the Tube Archivist API.
- **returnyoutubedislike.com**: This will return dislikes and average ratings for each video by integrating with the API from [returnyoutubedislike.com](https://www.returnyoutubedislike.com/).
- **Cast**: Enable Google Cast for videos. Requires a valid SSL certificate and works only in Google Chrome.
# Scheduler Setup
Schedule settings expect a cron-like format, where the first value is the minute, the second the hour and the third the day of the week. Day 0 is Sunday, day 1 is Monday etc.
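The three-field format can be validated with a small helper (hypothetical, not part of Tube Archivist; this sketch assumes `*` is accepted as a wildcard for any field):

```python
def parse_schedule(expr):
    """Validate a 'minute hour day_of_week' schedule string (sketch only)."""
    fields = expr.split()
    if len(fields) != 3:
        raise ValueError("expected three fields: minute hour day_of_week")
    limits = [(0, 59), (0, 23), (0, 6)]  # minute, hour, day (0 = Sunday)
    parsed = []
    for value, (low, high) in zip(fields, limits):
        if value == "*":  # wildcard: run on every value of this field
            parsed.append(None)
            continue
        number = int(value)
        if not low <= number <= high:
            raise ValueError(f"{number} out of range {low}-{high}")
        parsed.append(number)
    return parsed

# "every Monday at 08:15"
print(parse_schedule("15 8 1"))  # [15, 8, 1]
```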
@ -69,7 +76,7 @@ Create a zip file of the metadata and select **Max auto backups to keep** to aut
Additional database functionality.
## Manual Media Files Import
So far this depends on the video you are trying to import to be still available on YouTube to get the metadata. Add the files you like to import to the */cache/import* folder. Then start the process from the settings page *Manual Media Files Import*. Make sure to follow one of the two methods below.
So far this depends on the video you are trying to import to be still available on YouTube to get the metadata. Add the files you'd like to import to the */cache/import* folder. Then start the process from the settings page *Manual Media Files Import*. Make sure to follow one of the two methods below.
### Method 1:
Add a matching *.json* file with the media file. Both files need to have the same base name, for example:
@ -86,6 +93,7 @@ Detect the YouTube ID from filename, this accepts the default yt-dlp naming conv
### Some notes:
- This will **consume** the files you put into the import folder: Files will get converted to mp4 if needed (this might take a long time...) and moved to the archive, *.json* files will get deleted upon completion to avoid having duplicates on the next run.
- There should be no subdirectories added to */cache/import*, only video files. If your existing video library has video files inside subdirectories, you can get all the files into one directory by running `find ./ -mindepth 2 -type f -exec mv '{}' . \;` from the top-level directory of your existing video library. You can also delete any remaining empty subdirectories with `find ./ -mindepth 1 -type d -delete`.
- Maybe start with a subset of your files to import to make sure everything goes well...
- Follow the logs to monitor progress and errors: `docker-compose logs -f tubearchivist`.
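Method 2's ID detection can be sketched with the default yt-dlp output template `%(title)s [%(id)s].%(ext)s` in mind (hypothetical helper, not the project's actual parser):

```python
import re

# YouTube IDs are 11 characters from this alphabet, in square brackets
ID_PATTERN = re.compile(r"\[([0-9A-Za-z_-]{11})\]\.\w+$")

def detect_youtube_id(filename):
    """Extract an 11 character YouTube ID from a yt-dlp style filename."""
    match = ID_PATTERN.search(filename)
    return match.group(1) if match else None

print(detect_youtube_id("My Talk [dQw4w9WgXcQ].mp4"))  # dQw4w9WgXcQ
```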


@ -25,6 +25,7 @@
"add_thumbnail": false,
"subtitle": false,
"subtitle_source": false,
"subtitle_index": false,
"throttledratelimit": false,
"integrate_ryd": false
},


@ -70,8 +70,14 @@ class ApplicationSettingsForm(forms.Form):
SUBTITLE_SOURCE_CHOICES = [
("", "-- change subtitle source settings"),
("user", "only download user created"),
("auto", "also download auto generated"),
("user", "only download uploader"),
]
SUBTITLE_INDEX_CHOICES = [
("", "-- change subtitle index settings --"),
("0", "disable subtitle index"),
("1", "enable subtitle index"),
]
subscriptions_channel_size = forms.IntegerField(required=False)
@ -91,6 +97,9 @@ class ApplicationSettingsForm(forms.Form):
downloads_subtitle_source = forms.ChoiceField(
widget=forms.Select, choices=SUBTITLE_SOURCE_CHOICES, required=False
)
downloads_subtitle_index = forms.ChoiceField(
widget=forms.Select, choices=SUBTITLE_INDEX_CHOICES, required=False
)
downloads_integrate_ryd = forms.ChoiceField(
widget=forms.Select, choices=RYD_CHOICES, required=False
)


@ -204,7 +204,9 @@ class Reindex:
video.build_json()
if not video.json_data:
video.deactivate()
return
video.delete_subtitles()
# add back
video.json_data["player"] = player
video.json_data["date_downloaded"] = date_downloaded
@ -218,6 +220,7 @@ class Reindex:
thumb_handler.delete_vid_thumb(youtube_id)
to_download = (youtube_id, video.json_data["vid_thumb_url"])
thumb_handler.download_vid([to_download], notify=False)
return
@staticmethod
def reindex_single_channel(channel_id):


@ -27,6 +27,7 @@ class YoutubeSubtitle:
def sub_conf_parse(self):
"""add additional conf values to self"""
languages_raw = self.video.config["downloads"]["subtitle"]
if languages_raw:
self.languages = [i.strip() for i in languages_raw.split(",")]
def get_subtitles(self):
@ -61,6 +62,9 @@ class YoutubeSubtitle:
video_media_url = self.video.json_data["media_url"]
media_url = video_media_url.replace(".mp4", f"-{lang}.vtt")
all_formats = all_subtitles.get(lang)
if not all_formats:
return False
subtitle = [i for i in all_formats if i["ext"] == "vtt"][0]
subtitle.update(
{"lang": lang, "source": "auto", "media_url": media_url}
@ -120,6 +124,7 @@ class YoutubeSubtitle:
parser.process()
subtitle_str = parser.get_subtitle_str()
self._write_subtitle_file(dest_path, subtitle_str)
if self.video.config["downloads"]["subtitle_index"]:
query_str = parser.create_bulk_import(self.video, source)
self._index_subtitle(query_str)
@ -157,6 +162,7 @@ class SubtitleParser:
self._parse_cues()
self._match_text_lines()
self._add_id()
self._timestamp_check()
def _parse_cues(self):
"""split into cues"""
@ -179,7 +185,8 @@ class SubtitleParser:
clean = re.sub(self.stamp_reg, "", line)
clean = re.sub(self.tag_reg, "", clean)
cue_dict["lines"].append(clean)
if clean and clean not in self.all_text_lines:
if clean.strip() and clean not in self.all_text_lines[-4:]:
# remove immediate duplicates
self.all_text_lines.append(clean)
return cue_dict
@ -199,11 +206,25 @@ class SubtitleParser:
try:
self.all_text_lines.remove(line)
except ValueError:
print("failed to process:")
print(line)
continue
self.matched.append(new_cue)
def _timestamp_check(self):
"""check if end timestamp is bigger than start timestamp"""
for idx, cue in enumerate(self.matched):
# this
end = int(re.sub("[^0-9]", "", cue.get("end")))
# next
try:
next_cue = self.matched[idx + 1]
except IndexError:
continue
start_next = int(re.sub("[^0-9]", "", next_cue.get("start")))
if end > start_next:
self.matched[idx]["end"] = next_cue.get("start")
def _add_id(self):
"""add id to matched cues"""
for idx, _ in enumerate(self.matched):
@ -404,7 +425,7 @@ class YoutubeVideo(YouTubeItem, YoutubeSubtitle):
os.remove(file_path)
self.del_in_es()
self._delete_subtitles()
self.delete_subtitles()
def _get_ryd_stats(self):
"""get optional stats from returnyoutubedislikeapi.com"""
@ -434,7 +455,7 @@ class YoutubeVideo(YouTubeItem, YoutubeSubtitle):
self.json_data["subtitles"] = subtitles
handler.download_subtitles(relevant_subtitles=subtitles)
def _delete_subtitles(self):
def delete_subtitles(self):
"""delete indexed subtitles"""
data = {"query": {"term": {"youtube_id": {"value": self.youtube_id}}}}
_, _ = ElasticWrap("ta_subtitle/_delete_by_query").post(data=data)
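The new `_timestamp_check` logic can be illustrated standalone; the cue data below is hypothetical, and timestamps are compared digits-only, as in the diff:

```python
import re

def clamp_overlaps(cues):
    """Clamp a cue's end time to the next cue's start when they overlap."""

    def as_int(stamp):
        # compare timestamps digits-only, e.g. "00:00:05.500" -> 5500
        return int(re.sub("[^0-9]", "", stamp))

    for idx in range(len(cues) - 1):
        if as_int(cues[idx]["end"]) > as_int(cues[idx + 1]["start"]):
            cues[idx]["end"] = cues[idx + 1]["start"]
    return cues

cues = [
    {"start": "00:00:01.000", "end": "00:00:05.500"},  # runs past next start
    {"start": "00:00:04.000", "end": "00:00:08.000"},
]
print(clamp_overlaps(cues)[0]["end"])  # 00:00:04.000
```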


@ -133,6 +133,7 @@
{% endfor %}
{% else %}
<h2>No videos found...</h2>
<p>Try going to the <a href="{% url 'downloads' %}">downloads page</a> to start the scan and download tasks.</p>
{% endif %}
</div>
</div>


@ -73,6 +73,7 @@
{% endfor %}
{% else %}
<h2>No videos found...</h2>
<p>If you've already added a channel or playlist, try going to the <a href="{% url 'downloads' %}">downloads page</a> to start the scan and download tasks.</p>
{% endif %}
</div>
</div>


@ -114,6 +114,7 @@
{% endfor %}
{% else %}
<h2>No videos found...</h2>
<p>Try going to the <a href="{% url 'downloads' %}">downloads page</a> to start the scan and download tasks.</p>
{% endif %}
</div>
</div>


@ -94,6 +94,9 @@
<i>Embed thumbnail into the media file.</i><br>
{{ app_form.downloads_add_thumbnail }}
</div>
</div>
<div class="settings-group">
<h2 id="format">Subtitles</h2>
<div class="settings-item">
<p>Subtitles download setting: <span class="settings-current">{{ config.downloads.subtitle }}</span><br>
<i>Choose which subtitles to download, add a comma-separated list of two-letter ISO language codes,<br>
@ -105,12 +108,20 @@
<i>Download only user generated, or also less accurate auto generated subtitles.</i><br>
{{ app_form.downloads_subtitle_source }}
</div>
<div class="settings-item">
<p>Index and make subtitles searchable: <span class="settings-current">{{ config.downloads.subtitle_index }}</span></p>
<i>Store subtitle lines in Elasticsearch. Not recommended for low-end hardware.</i><br>
{{ app_form.downloads_subtitle_index }}
</div>
</div>
<div class="settings-group">
<h2 id="integrations">Integrations</h2>
<div class="settings-item">
<p>API token:</p>
<p>API token: <button type="button" onclick="textReveal()" id="text-reveal-button">Show</button></p>
<div id="text-reveal" class="description-text">
<p>{{ api_token }}</p>
<button class="danger-button" type="button" onclick="resetToken()">Revoke</button>
</div>
</div>
<div class="settings-item">
<p>Integrate with <a href="https://returnyoutubedislike.com/">returnyoutubedislike.com</a> to get dislikes and average ratings back: <span class="settings-current">{{ config.downloads.integrate_ryd }}</span></p>


@ -715,7 +715,6 @@ class SettingsView(View):
"""get existing or create new token of user"""
# pylint: disable=no-member
token = Token.objects.get_or_create(user=request.user)[0]
print(token)
return token
@staticmethod
@ -758,6 +757,11 @@ def process(request):
if request.method == "POST":
current_user = request.user.id
post_dict = json.loads(request.body.decode())
if post_dict.get("reset-token"):
print("revoke API token")
request.user.auth_token.delete()
return JsonResponse({"success": True})
post_handler = PostData(post_dict, current_user)
if post_handler.to_exec:
task_result = post_handler.run_task()


@ -235,6 +235,14 @@ function findPlaylists(button) {
}, 500);
}
function resetToken() {
var payload = JSON.stringify({'reset-token': true});
sendPost(payload);
var message = document.createElement("p");
message.innerText = "Token revoked";
document.getElementById("text-reveal").replaceWith(message);
}
// delete from file system
function deleteConfirm() {
to_show = document.getElementById("delete-button");