diff --git a/README.md b/README.md index 4706b06..76da5dc 100644 --- a/README.md +++ b/README.md @@ -180,6 +180,7 @@ For some architectures it might be required to run Redis JSON on a nonstandard p ### Updating Tube Archivist You will see the current version number of **Tube Archivist** in the footer of the interface. There is a daily version check task querying tubearchivist.com, notifying you of any new releases in the footer. To take advantage of the latest fixes and improvements, make sure you are running the *latest and greatest*. +* This project is tested for updates that span at most one or two releases. Updating from older releases may or may not be supported, and you might have to reset your index and configurations to update. Ideally, apply new updates at least once per month. * There can be breaking changes between updates, particularly as the application grows, new environment variables or settings might be required for you to set in the your docker-compose file. *Always* check the **release notes**: Any breaking changes will be marked there. * All testing and development is done with the Elasticsearch version number as mentioned in the provided *docker-compose.yml* file. This will be updated when a new release of Elasticsearch is available. Running an older version of Elasticsearch is most likely not going to result in any issues, but it's still recommended to run the same version as mentioned. Use `bbilly1/tubearchivist-es` to automatically get the recommended version. 
diff --git a/docker-compose.yml b/docker-compose.yml index 72b65b9..5be45c0 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -34,7 +34,7 @@ services: depends_on: - archivist-es archivist-es: - image: bbilly1/tubearchivist-es # only for amd64, or use official es 8.6.0 + image: bbilly1/tubearchivist-es # only for amd64, or use official es 8.6.2 container_name: archivist-es restart: unless-stopped environment: diff --git a/docs/Search.md b/docs/Search.md index 2c7717b..f6ad652 100644 --- a/docs/Search.md +++ b/docs/Search.md @@ -1,31 +1,32 @@ # Search Page Accessible at `/search/` of your **Tube Archivist**, search your archive for Videos, Channels and Playlists - or even full text search throughout your indexed subtitles. +Just start typing to run a **simple** search *or* **start your query with a primary keyword** to search for a specific type and narrow down the result with secondary keywords. Secondary keywords can be in any order. Use *yes* or *no* for boolean values. + +- Each query returns 30 results; pagination is not implemented yet. - All your queries are case insensitive and are normalized to lowercase. - All your queries are analyzed for the english language, this means *singular*, *plural* and word variations like *-ing*, *-ed*, *-able* etc are treated as synonyms. +- The value of a keyword starts right after its `keyword:` name and runs until the end of the query or the next keyword, e.g. in `video:learn python channel:corey`, the keyword `video` has the value `learn python`. - Fuzzy search is activated for all your searches by default. This can catch typos in your queries or in the matching documents with one to two letters difference, depending on the query length. You can configure fuzziness with the secondary keyword `fuzzy:`, e.g: - `fuzzy:0` or `fuzzy:no`: Deactivate fuzzy matching. - `fuzzy:1`: Set fuzziness to one letter difference. - `fuzzy:2`: Set fuzziness to two letters difference. 
- All text searches are ranked, meaning the better a match the higher ranked the result. Unless otherwise stated, queries with multiple words are processed with the `and` operator, meaning all words need to match so each word will narrow down the result. -- This will return 30 results per query, pagination is not implemented yet. - -Just start typing to start a *simple* search or start your query with a primary keyword to search for a specific type and narrow down the result with secondary keywords. Secondary keywords can be in any order. Use *yes* or *no* for boolean values. ## Simple -Start your query without a keyword to make a simple query. This will search in *video titles*, *channel names* and *playlist titles* and will return matching videos, channels and playlists. Keyword searches will return more results in a particular category due to the fact that more fields are searched for matches. +Start your query without a keyword to make a simple query (primary keyword `simple:` is implied). This will search in *video titles*, *channel names* and *playlist titles* and will return matching videos, channels and playlists. Keyword searches will return more results in a particular category due to the fact that more fields are searched for matches. Simple queries do not have any secondary keywords. ## Video -Start your query with the primary keyword `video:` to search for videos only. This will search through the *video titles*, *tags* and *category* fields. Narrow your search down with secondary keywords: +Start your query with the **primary keyword** `video:` to search for videos only. This will search through the *video titles*, *tags* and *category* fields. Narrow your search down with secondary keywords: - `channel:` search for videos matching the channel name. - `active:` is a boolean value, to search for videos that are still active on youtube or that are not active any more. 
**Example**: -- `video:learn python channel:corey shafer active:yes`: This will return all videos with the term *Learn Python* from the channel *Corey Shafer* that are still *Active* on YouTube. +- `video:learn python channel:corey schafer active:yes`: This will return all videos with the term *Learn Python* from the channel *Corey Schafer* that are still *Active* on YouTube. - `video: channel:tom scott active:no`: Note the omitted term after the primary key, this will show all videos from the channel *Tom Scott* that are no longer active on YouTube. ## Channel -Start with the `channel:` primary keyword to search for channels matching your query. This will search through the *channel name* and *channel description* fields. Narrow your search down with secondary keywords: +Start with the `channel:` **primary keyword** to search for channels matching your query. This will search through the *channel name* and *channel description* fields. Narrow your search down with secondary keywords: - `subscribed:` is a boolean value, search for channels that you are subscribed to or not. - `active:` is a boolean value, to search for channels that are still active on YouTube or that are no longer active. @@ -34,7 +35,7 @@ Start with the `channel:` primary keyword to search for channels matching your q - `channel: active:no`: Note the omitted term after the primary key, this will return all channels that are no longer active on YouTube. ## Playlist -Start your query with the primary keyword `playlist:` to search for playlists only. This will search through the *playlist title* and *playlist description* fields. Narrow down your search with these secondary keywords: +Start your query with the **primary keyword** `playlist:` to search for playlists only. This will search through the *playlist title* and *playlist description* fields. Narrow down your search with these secondary keywords: - `subscribed`: is a boolean value, search for playlists that you are subscribed to or not. 
- `active:` is a boolean value, to search for playlists that are still active on YouTube or that are no longer active. @@ -44,7 +45,7 @@ Start your query with the primary keyword `playlist:` to search for playlists on - `playlist:html css active:yes`: Search for playlists containing *HTML CSS* that are still active on YouTube. ## Full -Start a full text search by beginning your query with the primary keyword `full:`. This will search through your indexed Subtitles showing segments with possible matches. This will only show any results if you have activated *subtitle download and index* on the settings page. The operator for full text searches is `or` meaning when searching for multiple words not all words need to match, but additional words will change the ranking of the result, the more words match and the better they match, the higher ranked the result. The matching words will get highlighted in the text preview. +Start a full text search by beginning your query with the **primary keyword** `full:`. This will search through your indexed Subtitles showing segments with possible matches. This will only show any results if you have activated *subtitle download and index* on the settings page. The operator for full text searches is `or` meaning when searching for multiple words not all words need to match, but additional words will change the ranking of the result, the more words match and the better they match, the higher ranked the result. The matching words will get highlighted in the text preview. Clicking the play button on the thumbnail will open the inplace player at the timestamp from where the segment starts. Same when clicking the video title, this will open the video page and put the player at the segment timestamp. This will overwrite any previous playback position. 
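The keyword rules on the Search page (an optional leading primary keyword, secondary keywords in any order, each value running until the next keyword) can be illustrated with a small parser. This is a minimal sketch in Python, assuming only the keyword names documented on that page; `parse_query`, `PRIMARY` and `KEYWORDS` are hypothetical names, and this is not Tube Archivist's actual search implementation. Boolean values such as `yes`/`no` are left as plain strings.

```python
import re

# Keyword names as documented on the Search page. The parser itself is a
# hypothetical illustration, not Tube Archivist's own implementation.
PRIMARY = {"simple", "video", "channel", "playlist", "full"}
KEYWORDS = PRIMARY | {"active", "subscribed", "lang", "source", "fuzzy"}

# A keyword is a known name followed by ":" at the start of the query
# or directly after whitespace.
KEYWORD_RE = re.compile(r"(?:^|\s)(" + "|".join(sorted(KEYWORDS)) + r"):")


def parse_query(query: str) -> tuple[str, str, dict[str, str]]:
    """Split a query into (primary keyword, search term, secondary keywords)."""
    query = query.strip().lower()  # queries are case insensitive
    matches = list(KEYWORD_RE.finditer(query))

    # No leading primary keyword: the whole query is a simple search.
    if not matches or matches[0].start() != 0 or matches[0].group(1) not in PRIMARY:
        return "simple", query, {}

    # Each keyword's value runs from its colon to the next keyword or the end.
    pairs = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(query)
        pairs.append((match.group(1), query[match.end():end].strip()))

    primary, term = pairs[0]
    return primary, term, dict(pairs[1:])


print(parse_query("video:learn python channel:corey schafer active:yes"))
# ('video', 'learn python', {'channel': 'corey schafer', 'active': 'yes'})
```

Note that `channel` works both as a primary keyword (at the start of the query) and as a secondary keyword (anywhere after it); in this sketch only its position decides which role it plays.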
diff --git a/docs/Settings.md b/docs/Settings.md index b665135..9ca1519 100644 --- a/docs/Settings.md +++ b/docs/Settings.md @@ -52,16 +52,11 @@ Cookies are used to store your session and contain your access token to your goo ### Auto import Easiest way to import your cookie is to use the **Tube Archivist Companion** [browser extension](https://github.com/tubearchivist/browser-extension) for Firefox and Chrome. -### Alternative Manual Export your cookie -- Install **Cookies.txt** addon for [chrome](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid) or [firefox](https://addons.mozilla.org/firefox/addon/cookies-txt). -- Visit YouTube and login with whichever YouTube account you wish to use to generate the cookies. -- Click on the extension icon in the toolbar - it will drop down showing the active cookies for YT. -- Click Export to export the cookies, filename is by default *cookies.google.txt*. +### Manual import +Alternatively, you can manually import your cookie into Tube Archivist. Export your cookie as a *Netscape* formatted text file, name it *cookies.google.txt* and put it into the *cache/import* folder. After that, enable the cookie import option on the settings page and your cookie file will get imported. -### Alternative Manual Import your cookie -Place the file *cookies.google.txt* into the *cache/import* folder of Tube Archivist and enable the cookie import. Once you click on *Update Application Configurations* to save your changes, your cookie will get imported and stored internally. - -Once imported, a **Validate Cookie File** button will show, where you can confirm if your cookie is working or not. +- There are various tools out there that allow you to export cookies from your browser. This project doesn't make any specific recommendations. +- Once imported, a **Validate Cookie File** button will show, where you can confirm whether your cookie is working.
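Before copying a file into *cache/import*, you might want to confirm that it really is in *Netscape* format. A minimal sketch, assuming Python's standard-library `http.cookiejar.MozillaCookieJar`, which reads the Netscape cookie file format; `check_cookie_file` is a hypothetical helper, not part of Tube Archivist. It only checks the file format, not whether YouTube accepts the cookie; the **Validate Cookie File** button is still the way to check that.

```python
from http.cookiejar import LoadError, MozillaCookieJar


def check_cookie_file(path: str) -> int:
    """Return the number of cookies in a Netscape-format cookie file.

    Hypothetical pre-import check, not part of Tube Archivist. Raises
    http.cookiejar.LoadError if the Netscape cookie file header is
    missing or a cookie line is malformed.
    """
    jar = MozillaCookieJar(path)
    # Also load session and expired cookies, so the count reflects
    # everything that is actually in the file.
    jar.load(ignore_discard=True, ignore_expires=True)
    return len(jar)


if __name__ == "__main__":
    try:
        count = check_cookie_file("cookies.google.txt")
        print(f"OK: {count} cookies found")
    except OSError as err:  # LoadError is a subclass of OSError
        print(f"Not a usable Netscape cookie file: {err}")
```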
### Use your cookie Once imported, additionally to the advantages above, your [Watch Later](https://www.youtube.com/playlist?list=WL) and [Liked Videos](https://www.youtube.com/playlist?list=LL) become a regular playlist you can download and subscribe to as any other [playlist](Playlists). diff --git a/tubearchivist/config/management/commands/ta_startup.py b/tubearchivist/config/management/commands/ta_startup.py index 852447a..a74358d 100644 --- a/tubearchivist/config/management/commands/ta_startup.py +++ b/tubearchivist/config/management/commands/ta_startup.py @@ -5,6 +5,7 @@ Functionality: """ import os +from time import sleep from django.core.management.base import BaseCommand, CommandError from home.src.es.connect import ElasticWrap @@ -151,6 +152,12 @@ class Command(BaseCommand): for index_name in index_list: path = f"{index_name}/_update_by_query" response, status_code = ElasticWrap(path).post(data=data) + if status_code == 503: + message = f" 🗙 {index_name} retry failed migration." 
+ self.stdout.write(self.style.ERROR(message)) + sleep(10) + response, status_code = ElasticWrap(path).post(data=data) + if status_code == 200: updated = response.get("updated", 0) if not updated: diff --git a/tubearchivist/config/settings.py b/tubearchivist/config/settings.py index 91486f5..865586d 100644 --- a/tubearchivist/config/settings.py +++ b/tubearchivist/config/settings.py @@ -265,4 +265,4 @@ CORS_ALLOW_HEADERS = list(default_headers) + [ # TA application settings TA_UPSTREAM = "https://github.com/tubearchivist/tubearchivist" -TA_VERSION = "v0.3.3" +TA_VERSION = "v0.3.4" diff --git a/tubearchivist/home/src/download/queue.py b/tubearchivist/home/src/download/queue.py index 14298dc..9bf4492 100644 --- a/tubearchivist/home/src/download/queue.py +++ b/tubearchivist/home/src/download/queue.py @@ -158,13 +158,9 @@ class PendingList(PendingIndex): """manage the pending videos list""" yt_obs = { - "default_search": "ytsearch", - "quiet": True, - "check_formats": "selected", "noplaylist": True, "writethumbnail": True, "simulate": True, - "socket_timeout": 3, } def __init__(self, youtube_ids=False): @@ -244,6 +240,7 @@ class PendingList(PendingIndex): for idx, (youtube_id, vid_type) in enumerate(self.missing_videos): print(f"{youtube_id} ({vid_type}): add to download queue") + self._notify_add(idx) video_details = self.get_youtube_details(youtube_id, vid_type) if not video_details: continue @@ -256,8 +253,6 @@ class PendingList(PendingIndex): url = video_details["vid_thumb_url"] ThumbManager(youtube_id).download_video_thumb(url) - self._notify_add(idx) - if bulk_list: # add last newline bulk_list.append("\n") diff --git a/tubearchivist/home/src/download/yt_dlp_base.py b/tubearchivist/home/src/download/yt_dlp_base.py index fd59a9d..5cecc35 100644 --- a/tubearchivist/home/src/download/yt_dlp_base.py +++ b/tubearchivist/home/src/download/yt_dlp_base.py @@ -20,8 +20,9 @@ class YtWrap: "default_search": "ytsearch", "quiet": True, "check_formats": "selected", - 
"socket_timeout": 3, + "socket_timeout": 10, "extractor_retries": 3, + "retries": 10, } def __init__(self, obs_request, config=False): diff --git a/tubearchivist/home/src/download/yt_dlp_handler.py b/tubearchivist/home/src/download/yt_dlp_handler.py index 3485548..be7c71a 100644 --- a/tubearchivist/home/src/download/yt_dlp_handler.py +++ b/tubearchivist/home/src/download/yt_dlp_handler.py @@ -192,7 +192,7 @@ class VideoDownloader: "vid_type", VideoTypeEnum.VIDEOS.value ) video_type = VideoTypeEnum(tmp_vid_type) - print(f"Downloading type: {video_type}") + print(f"{youtube_id}: Downloading type: {video_type}") success = self._dl_single_vid(youtube_id) if not success: @@ -204,7 +204,7 @@ class VideoDownloader: "title": "Indexing....", "message": "Add video metadata to index.", } - RedisArchivist().set_message(self.MSG, mess_dict, expire=60) + RedisArchivist().set_message(self.MSG, mess_dict, expire=120) vid_dict = index_new_video( youtube_id, @@ -223,8 +223,10 @@ class VideoDownloader: if queue.has_item(): message = "Continue with next video." + expire = False else: message = "Download queue is finished." 
+ expire = 10 self.move_to_archive(vid_dict) mess_dict = { @@ -233,7 +235,7 @@ class VideoDownloader: "title": "Completed", "message": message, } - RedisArchivist().set_message(self.MSG, mess_dict, expire=10) + RedisArchivist().set_message(self.MSG, mess_dict, expire=expire) self._delete_from_pending(youtube_id) # post processing @@ -260,7 +262,7 @@ class VideoDownloader: "title": "Looking for videos to download", "message": "Scanning your download queue.", } - RedisArchivist().set_message(self.MSG, mess_dict, expire=True) + RedisArchivist().set_message(self.MSG, mess_dict) pending = PendingList() pending.get_download() to_add = [ @@ -293,8 +295,11 @@ class VideoDownloader: title = "Downloading: " + response["info_dict"]["title"] try: + size = response.get("_total_bytes_str", "N/A") + if size.strip() == "N/A": + size = response.get("_total_bytes_estimate_str", "N/A") + percent = response["_percent_str"] - size = response["_total_bytes_str"] speed = response["_speed_str"] eta = response["_eta_str"] message = f"{percent} of {size} at {speed} - time left: {eta}" @@ -318,7 +323,6 @@ class VideoDownloader: def _build_obs_basic(self): """initial obs""" self.obs = { - "default_search": "ytsearch", "merge_output_format": "mp4", "outtmpl": ( self.config["application"]["cache_dir"] @@ -326,13 +330,9 @@ class VideoDownloader: ), "progress_hooks": [self._progress_hook], "noprogress": True, - "quiet": True, "continuedl": True, - "retries": 3, "writethumbnail": False, "noplaylist": True, - "check_formats": "selected", - "socket_timeout": 3, } def _build_obs_user(self): diff --git a/tubearchivist/home/src/index/comments.py b/tubearchivist/home/src/index/comments.py index e9ec0d9..32cea55 100644 --- a/tubearchivist/home/src/index/comments.py +++ b/tubearchivist/home/src/index/comments.py @@ -210,6 +210,9 @@ class CommentList: return total_videos = len(self.video_ids) + if notify: + self._notify(f"add comments for {total_videos} videos", False) + for idx, video_id in
enumerate(self.video_ids): comment = Comments(video_id, config=self.config) if notify: @@ -219,16 +222,16 @@ class CommentList: comment.upload_comments() if notify: - self.notify_final(total_videos) + self._notify(f"added comments for {total_videos} videos", 5) @staticmethod - def notify_final(total_videos): - """send final notification""" + def _notify(message, expire): + """send notification""" key = "message:download" message = { "status": key, "level": "info", "title": "Download and index comments finished", - "message": f"added comments for {total_videos} videos", + "message": message, } - RedisArchivist().set_message(key, message, expire=4) + RedisArchivist().set_message(key, message, expire=expire) diff --git a/tubearchivist/home/templates/home/search.html b/tubearchivist/home/templates/home/search.html index b333baa..5bbfe15 100644 --- a/tubearchivist/home/templates/home/search.html +++ b/tubearchivist/home/templates/home/search.html @@ -7,30 +7,80 @@ -
-

Video Results

-
-

No videos found.

+ -
-

Channel Results

-
-

No channels found.

+
+
+

Example queries

+
    +
  • music video — basic search
  • +
  • video: active:no — all videos deleted from YouTube
  • +
  • video:learn javascript channel:corey schafer active:yes
  • +
  • channel:linux subscribed:yes
  • +
  • playlist:backend engineering active:yes subscribed:yes
  • +
-
-
-

Playlist Results

-
-

No playlists found.

-
-
-
-

Fulltext Results

-
-

No fulltext results found.

+
+

Keywords cheatsheet

+

For detailed usage, check the wiki.

+
+
    +
  • simple: (implied) — search in video titles, channel names and playlist titles
  • +
  • + video: — search in video titles, tags and category field +
      +
    • channel: — channel name
    • +
    • active:yes/no — whether the video is still active on YouTube
    • +
    +
  • +
  • + channel: — search in channel name and channel description +
      +
    • subscribed:yes/no — whether you are subscribed to the channel
    • +
    • active:yes/no — whether the channel is still active on YouTube
    • +
    +
  • +
  • + playlist: — search in playlist title and playlist description +
      +
    • subscribed:yes/no — whether you are subscribed to the playlist
    • +
    • active:yes/no — whether the playlist is still active on YouTube
    • +
    +
  • +
  • + full: — search in video subtitles +
      +
    • lang: — subtitles language (use the two-letter ISO language code, same as the one from the settings page)
    • +
    • source:auto/user — auto to search through auto-generated subtitles only, or user to search through user-uploaded subtitles only
    • +
    +
  • +
+
diff --git a/tubearchivist/requirements.txt b/tubearchivist/requirements.txt index 00a9e1a..721df00 100644 --- a/tubearchivist/requirements.txt +++ b/tubearchivist/requirements.txt @@ -10,4 +10,4 @@ requests==2.28.2 ryd-client==0.0.6 uWSGI==2.0.21 whitenoise==6.4.0 -yt_dlp==2023.2.17 +yt_dlp==2023.3.3 diff --git a/tubearchivist/static/css/style.css b/tubearchivist/static/css/style.css index 56ab4e3..dbf76f9 100644 --- a/tubearchivist/static/css/style.css +++ b/tubearchivist/static/css/style.css @@ -892,10 +892,24 @@ video:-webkit-full-screen { width: 100%; } -.multi-search-result { +.multi-search-result, #multi-search-results-placeholder { padding: 1rem 0; } +#multi-search-results-placeholder span { + font-family: monospace; + color: var(--accent-font-dark); + background-color: var(--highlight-bg); +} + +#multi-search-results-placeholder span.value { + color: var(--accent-font-light); +} + +#multi-search-results-placeholder ul { + margin-top: 10px; +} + /* channel overview page */ .channel-list.list { display: block; diff --git a/tubearchivist/static/script.js b/tubearchivist/static/script.js index 8dfff95..ff8e8cc 100644 --- a/tubearchivist/static/script.js +++ b/tubearchivist/static/script.js @@ -865,21 +865,33 @@ function setProgressBar(videoId, currentTime, duration) { // multi search form let searchTimeout = null; +let searchHttpRequest = null; function searchMulti(query) { clearTimeout(searchTimeout); searchTimeout = setTimeout(function () { - if (query.length > 1) { - let http = new XMLHttpRequest(); - http.onreadystatechange = function () { - if (http.readyState === 4) { - let response = JSON.parse(http.response); + if (query.length > 0) { + if (searchHttpRequest) { + searchHttpRequest.abort(); + } + searchHttpRequest = new XMLHttpRequest(); + searchHttpRequest.onreadystatechange = function () { + if (searchHttpRequest.readyState === 4 && searchHttpRequest.status === 200) { + const response = JSON.parse(searchHttpRequest.response); populateMultiSearchResults(response.results,
response.queryType); } }; - http.open('GET', `/api/search/?query=${query}`, true); - http.setRequestHeader('X-CSRFToken', getCookie('csrftoken')); - http.setRequestHeader('Content-type', 'application/json'); - http.send(); + searchHttpRequest.open('GET', `/api/search/?query=${encodeURIComponent(query)}`, true); + searchHttpRequest.setRequestHeader('X-CSRFToken', getCookie('csrftoken')); + searchHttpRequest.setRequestHeader('Content-type', 'application/json'); + searchHttpRequest.send(); + } else { + if (searchHttpRequest) { + searchHttpRequest.abort(); + searchHttpRequest = null; + } + // show the placeholder container and hide the results container + document.getElementById('multi-search-results').style.display = 'none'; + document.getElementById('multi-search-results-placeholder').style.display = 'block'; } }, 500); } @@ -890,6 +902,9 @@ function getViewDefaults(view) { } function populateMultiSearchResults(allResults, queryType) { + // show the results container and hide the placeholder container + document.getElementById('multi-search-results').style.display = 'block'; + document.getElementById('multi-search-results-placeholder').style.display = 'none'; // videos let defaultVideo = getViewDefaults('home'); let allVideos = allResults.video_results;
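The new `searchMulti()` logic combines two ideas: a 500 ms debounce, so only the last keystroke in a burst triggers a request, and cancelling any request that is still in flight, with an empty query falling back to the keyword cheatsheet placeholder. The same pattern can be sketched with Python's `asyncio`, where cancelling a task plays the role of `clearTimeout()` plus `XMLHttpRequest.abort()`. This is an illustrative sketch only; `DebouncedSearch`, `demo` and `fake_search` are hypothetical names and not part of Tube Archivist.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class DebouncedSearch:
    """Hypothetical debounce-and-cancel helper mirroring searchMulti()."""

    def __init__(self, search: Callable[[str], Awaitable[List[str]]], delay: float = 0.5):
        self._search = search
        self._delay = delay
        self._task: Optional[asyncio.Task] = None
        # None stands for "show the placeholder instead of results".
        self.results: Optional[List[str]] = None

    def on_input(self, query: str) -> None:
        """Handle a keystroke; must be called while the event loop is running."""
        # Cancel the pending timer or the in-flight search, if any.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.create_task(self._run(query))

    async def _run(self, query: str) -> None:
        await asyncio.sleep(self._delay)  # debounce window
        if not query:
            self.results = None  # empty query: show the placeholder again
            return
        self.results = await self._search(query)


async def demo() -> tuple:
    calls: List[str] = []

    async def fake_search(query: str) -> List[str]:
        calls.append(query)
        await asyncio.sleep(0.01)  # simulated network latency
        return [f"result for {query}"]

    box = DebouncedSearch(fake_search, delay=0.2)
    for partial in ("p", "py", "pyt"):
        box.on_input(partial)
        await asyncio.sleep(0.01)  # typing faster than the debounce delay
    await asyncio.sleep(0.5)  # let the last search finish
    return calls, box.results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, the first two keystrokes are cancelled inside their debounce window, so only the final query `pyt` reaches `fake_search`.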