Merge branch 'testing' into feat-redis-result

2025-07-04 00:01:09 +00:00 · 2023-03-07 16:07:17 +07:00 · 2023-03-07 16:07:17 +07:00 · 2850988bfe
commit 2850988bfe
parent 4fb5744cb3 2d6c0bd02b
14 changed files with 155 additions and 73 deletions
--- a/README.md
+++ b/README.md
@@ -180,6 +180,7 @@ For some architectures it might be required to run Redis JSON on a nonstandard p

 ### Updating Tube Archivist
 You will see the current version number of **Tube Archivist** in the footer of the interface. There is a daily version check task querying tubearchivist.com, notifying you of any new releases in the footer. To take advantage of the latest fixes and improvements, make sure you are running the *latest and greatest*.  
+* Updates are only tested across one or two releases at most. Updating from further back may or may not be supported, and you might have to reset your index and configurations to update. Ideally, apply new updates at least once per month.
 * There can be breaking changes between updates. Particularly as the application grows, new environment variables or settings might be required in your docker-compose file. *Always* check the **release notes**: Any breaking changes will be marked there.  
 * All testing and development is done with the Elasticsearch version number as mentioned in the provided *docker-compose.yml* file. This will be updated when a new release of Elasticsearch is available. Running an older version of Elasticsearch is most likely not going to result in any issues, but it's still recommended to run the same version as mentioned. Use `bbilly1/tubearchivist-es` to automatically get the recommended version.

--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -34,7 +34,7 @@ services:
    depends_on:
      - archivist-es
  archivist-es:
-    image: bbilly1/tubearchivist-es         # only for amd64, or use official es 8.6.0
+    image: bbilly1/tubearchivist-es         # only for amd64, or use official es 8.6.2
    container_name: archivist-es
    restart: unless-stopped
    environment:
--- a/docs/Search.md
+++ b/docs/Search.md
@@ -1,31 +1,32 @@
 # Search Page
 Accessible at `/search/` of your **Tube Archivist**, search your archive for Videos, Channels and Playlists - or even full text search throughout your indexed subtitles.

+Just start typing to start a **simple** search *or* **start your query with a primary keyword** to search for a specific type and narrow down the result with secondary keywords. Secondary keywords can be in any order. Use *yes* or *no* for boolean values.
+
+- This will return 30 results per query, pagination is not implemented yet.
 - All your queries are case insensitive and are normalized to lowercase.
 - All your queries are analyzed for the English language, which means *singular*, *plural* and word variations like *-ing*, *-ed*, *-able* etc. are treated as synonyms.
+- Keyword value parsing begins with the `keyword:` name all the way until the end of query or the next keyword, e.g. in `video:learn python channel:corey`, the keyword `video` has value `learn python`.
 - Fuzzy search is activated for all your searches by default. This can catch typos in your queries or in the matching documents with one to two letters difference, depending on the query length. You can configure fuzziness with the secondary keyword `fuzzy:`, e.g:
  - `fuzzy:0` or `fuzzy:no`: Deactivate fuzzy matching.
  - `fuzzy:1`: Set fuzziness to one letter difference.
  - `fuzzy:2`: Set fuzziness to two letters difference.
 - All text searches are ranked, meaning the better a match the higher ranked the result. Unless otherwise stated, queries with multiple words are processed with the `and` operator, meaning all words need to match so each word will narrow down the result.
- This will return 30 results per query, pagination is not implemented yet.
-
-Just start typing to start a *simple* search or start your query with a primary keyword to search for a specific type and narrow down the result with secondary keywords. Secondary keywords can be in any order. Use *yes* or *no* for boolean values.

 ## Simple
-Start your query without a keyword to make a simple query. This will search in *video titles*, *channel names* and *playlist titles* and will return matching videos, channels and playlists. Keyword searches will return more results in a particular category due to the fact that more fields are searched for matches.
+Start your query without a keyword to make a simple query (primary keyword `simple:` is implied). This will search in *video titles*, *channel names* and *playlist titles* and will return matching videos, channels and playlists. Keyword searches will return more results in a particular category due to the fact that more fields are searched for matches. Simple queries do not have any secondary keywords.

 ## Video
-Start your query with the primary keyword `video:` to search for videos only. This will search through the *video titles*, *tags* and *category* fields. Narrow your search down with secondary keywords:
+Start your query with the **primary keyword** `video:` to search for videos only. This will search through the *video titles*, *tags* and *category* fields. Narrow your search down with secondary keywords:
 - `channel:` search for videos matching the channel name.
 - `active:` is a boolean value, to search for videos that are still active on youtube or that are not active any more.

 **Example**:
- `video:learn python channel:corey shafer active:yes`: This will return all videos with the term *Learn Python* from the channel *Corey Shafer* that are still *Active* on YouTube.
+- `video:learn python channel:corey schafer active:yes`: This will return all videos with the term *Learn Python* from the channel *Corey Schafer* that are still *Active* on YouTube.
 - `video: channel:tom scott active:no`: Note the omitted term after the primary key, this will show all videos from the channel *Tom Scott* that are no longer active on YouTube.

 ## Channel
-Start with the `channel:` primary keyword to search for channels matching your query. This will search through the *channel name* and *channel description* fields. Narrow your search down with secondary keywords:
+Start with the `channel:` **primary keyword** to search for channels matching your query. This will search through the *channel name* and *channel description* fields. Narrow your search down with secondary keywords:
 - `subscribed:` is a boolean value, search for channels that you are subscribed to or not.
 - `active:` is a boolean value, to search for channels that are still active on YouTube or that are no longer active.

@@ -34,7 +35,7 @@ Start with the `channel:` primary keyword to search for channels matching your q
 - `channel: active:no`: Note the omitted term after the primary key, this will return all channels that are no longer active on YouTube.

 ## Playlist
-Start your query with the primary keyword `playlist:` to search for playlists only. This will search through the *playlist title* and *playlist description* fields. Narrow down your search with these secondary keywords:
+Start your query with the **primary keyword** `playlist:` to search for playlists only. This will search through the *playlist title* and *playlist description* fields. Narrow down your search with these secondary keywords:
 - `subscribed:` is a boolean value, search for playlists that you are subscribed to or not.
 - `active:` is a boolean value, to search for playlists that are still active on YouTube or that are no longer active.

@@ -44,7 +45,7 @@ Start your query with the primary keyword `playlist:` to search for playlists on
 - `playlist:html css active:yes`: Search for playlists containing *HTML CSS* that are still active on YouTube.

 ## Full
-Start a full text search by beginning your query with the primary keyword `full:`. This will search through your indexed Subtitles showing segments with possible matches. This will only show any results if you have activated *subtitle download and index* on the settings page. The operator for full text searches is `or` meaning when searching for multiple words not all words need to match, but additional words will change the ranking of the result, the more words match and the better they match, the higher ranked the result. The matching words will get highlighted in the text preview.
+Start a full text search by beginning your query with the **primary keyword** `full:`. This will search through your indexed Subtitles showing segments with possible matches. This will only show any results if you have activated *subtitle download and index* on the settings page. The operator for full text searches is `or` meaning when searching for multiple words not all words need to match, but additional words will change the ranking of the result, the more words match and the better they match, the higher ranked the result. The matching words will get highlighted in the text preview.

 Clicking the play button on the thumbnail will open the inplace player at the timestamp from where the segment starts. Same when clicking the video title, this will open the video page and put the player at the segment timestamp. This will overwrite any previous playback position.
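
The keyword value rule documented above (a value runs from its `keyword:` name to the next keyword or the end of the query) can be illustrated with a minimal sketch. This is not the project's actual parser: the function name `parse_query`, the `KEYWORDS` tuple and the regex are assumptions made for illustration, built only from the keywords this page and the search cheatsheet mention.

```python
import re

# Hypothetical keyword list, collected from the keywords named in these docs.
KEYWORDS = (
    "simple", "video", "channel", "playlist", "full",
    "active", "subscribed", "fuzzy", "lang", "source",
)
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r"):")


def parse_query(query: str) -> dict:
    """Split a search query into keyword -> value pairs."""
    # queries are case insensitive and normalized to lowercase
    query = query.lower().strip()
    matches = list(KEYWORD_RE.finditer(query))
    if not matches or matches[0].start() != 0:
        # no leading primary keyword: treat the whole query as a simple search
        return {"simple": query}

    parsed = {}
    for idx, match in enumerate(matches):
        # a value ends where the next keyword starts, or at the end of the query
        if idx + 1 < len(matches):
            end = matches[idx + 1].start()
        else:
            end = len(query)
        parsed[match.group(1)] = query[match.end():end].strip()

    return parsed
```

Under these assumptions, `parse_query("video:learn python channel:corey")` yields `{"video": "learn python", "channel": "corey"}`, and an omitted term such as in `video: channel:tom scott active:no` simply produces an empty value for `video`.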

--- a/docs/Settings.md
+++ b/docs/Settings.md
@@ -52,16 +52,11 @@ Cookies are used to store your session and contain your access token to your goo
 ### Auto import
 Easiest way to import your cookie is to use the **Tube Archivist Companion** [browser extension](https://github.com/tubearchivist/browser-extension) for Firefox and Chrome.

-### Alternative Manual Export your cookie
- Install **Cookies.txt** addon for [chrome](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid) or [firefox](https://addons.mozilla.org/firefox/addon/cookies-txt).
- Visit YouTube and login with whichever YouTube account you wish to use to generate the cookies.
- Click on the extension icon in the toolbar - it will drop down showing the active cookies for YT.
- Click Export to export the cookies, filename is by default *cookies.google.txt*.
+### Manual import
+Alternatively you can also manually import your cookie into Tube Archivist. Export your cookie as a *Netscape* formatted text file, name it *cookies.google.txt* and put it into the *cache/import* folder. After that you can enable the option on the settings page and your cookie file will get imported.

-### Alternative Manual Import your cookie
-Place the file *cookies.google.txt* into the *cache/import* folder of Tube Archivist and enable the cookie import. Once you click on *Update Application Configurations* to save your changes, your cookie will get imported and stored internally.
-
-Once imported, a **Validate Cookie File** button will show, where you can confirm if your cookie is working or not.
+- There are various tools out there that allow you to export cookies from your browser. This project doesn't make any specific recommendations.
+- Once imported, a **Validate Cookie File** button will show, where you can confirm if your cookie is working or not.

 ### Use your cookie
 Once imported, additionally to the advantages above, your [Watch Later](https://www.youtube.com/playlist?list=WL) and [Liked Videos](https://www.youtube.com/playlist?list=LL) become a regular playlist you can download and subscribe to as any other [playlist](Playlists).
--- a/tubearchivist/config/management/commands/ta_startup.py
+++ b/tubearchivist/config/management/commands/ta_startup.py
@@ -5,6 +5,7 @@ Functionality:
 """

 import os
+from time import sleep

 from django.core.management.base import BaseCommand, CommandError
 from home.src.es.connect import ElasticWrap
@@ -151,6 +152,12 @@ class Command(BaseCommand):
        for index_name in index_list:
            path = f"{index_name}/_update_by_query"
            response, status_code = ElasticWrap(path).post(data=data)
+            if status_code == 503:
+                message = f"    🗙 {index_name} retry failed migration."
+                self.stdout.write(self.style.ERROR(message))
+                sleep(10)
+                response, status_code = ElasticWrap(path).post(data=data)
+
            if status_code == 200:
                updated = response.get("updated", 0)
                if not updated:
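
The hunk above retries the `_update_by_query` request once after a 503 response. The same pattern could be factored into a small helper, sketched here under assumptions: `post_with_retry` and its parameters are hypothetical names, and `post` stands for any callable that returns a `(response, status_code)` tuple the way `ElasticWrap(path).post(data=data)` does in the diff.

```python
from time import sleep


def post_with_retry(post, retries=1, delay=10, retry_status=503):
    """Call post() and retry while it keeps returning retry_status."""
    response, status_code = post()
    for _ in range(retries):
        if status_code != retry_status:
            break
        # give the backend time to become available before trying again
        sleep(delay)
        response, status_code = post()

    return response, status_code
```

With the default `retries=1` this matches the single retry in the commit, while a larger value would keep waiting for a slow backend.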
--- a/tubearchivist/config/settings.py
+++ b/tubearchivist/config/settings.py
@@ -265,4 +265,4 @@ CORS_ALLOW_HEADERS = list(default_headers) + [

 # TA application settings
 TA_UPSTREAM = "https://github.com/tubearchivist/tubearchivist"
-TA_VERSION = "v0.3.3"
+TA_VERSION = "v0.3.4"
--- a/tubearchivist/home/src/download/queue.py
+++ b/tubearchivist/home/src/download/queue.py
@@ -158,13 +158,9 @@ class PendingList(PendingIndex):
    """manage the pending videos list"""

    yt_obs = {
-        "default_search": "ytsearch",
-        "quiet": True,
-        "check_formats": "selected",
        "noplaylist": True,
        "writethumbnail": True,
        "simulate": True,
-        "socket_timeout": 3,
    }

    def __init__(self, youtube_ids=False):
@@ -244,6 +240,7 @@ class PendingList(PendingIndex):

        for idx, (youtube_id, vid_type) in enumerate(self.missing_videos):
            print(f"{youtube_id} ({vid_type}): add to download queue")
+            self._notify_add(idx)
            video_details = self.get_youtube_details(youtube_id, vid_type)
            if not video_details:
                continue
@@ -256,8 +253,6 @@ class PendingList(PendingIndex):
            url = video_details["vid_thumb_url"]
            ThumbManager(youtube_id).download_video_thumb(url)

-            self._notify_add(idx)
-
        if bulk_list:
            # add last newline
            bulk_list.append("\n")
--- a/tubearchivist/home/src/download/yt_dlp_base.py
+++ b/tubearchivist/home/src/download/yt_dlp_base.py
@@ -20,8 +20,9 @@ class YtWrap:
        "default_search": "ytsearch",
        "quiet": True,
        "check_formats": "selected",
-        "socket_timeout": 3,
+        "socket_timeout": 10,
        "extractor_retries": 3,
+        "retries": 10,
    }

    def __init__(self, obs_request, config=False):
--- a/tubearchivist/home/src/download/yt_dlp_handler.py
+++ b/tubearchivist/home/src/download/yt_dlp_handler.py
@@ -192,7 +192,7 @@ class VideoDownloader:
                "vid_type", VideoTypeEnum.VIDEOS.value
            )
            video_type = VideoTypeEnum(tmp_vid_type)
-            print(f"Downloading type: {video_type}")
+            print(f"{youtube_id}: Downloading type: {video_type}")

            success = self._dl_single_vid(youtube_id)
            if not success:
@@ -204,7 +204,7 @@ class VideoDownloader:
                "title": "Indexing....",
                "message": "Add video metadata to index.",
            }
-            RedisArchivist().set_message(self.MSG, mess_dict, expire=60)
+            RedisArchivist().set_message(self.MSG, mess_dict, expire=120)

            vid_dict = index_new_video(
                youtube_id,
@@ -223,8 +223,10 @@ class VideoDownloader:

            if queue.has_item():
                message = "Continue with next video."
+                expire = False
            else:
                message = "Download queue is finished."
+                expire = 10

            self.move_to_archive(vid_dict)
            mess_dict = {
@@ -233,7 +235,7 @@ class VideoDownloader:
                "title": "Completed",
                "message": message,
            }
-            RedisArchivist().set_message(self.MSG, mess_dict, expire=10)
+            RedisArchivist().set_message(self.MSG, mess_dict, expire=expire)
            self._delete_from_pending(youtube_id)

        # post processing
@@ -260,7 +262,7 @@ class VideoDownloader:
            "title": "Looking for videos to download",
            "message": "Scanning your download queue.",
        }
-        RedisArchivist().set_message(self.MSG, mess_dict, expire=True)
+        RedisArchivist().set_message(self.MSG, mess_dict)
        pending = PendingList()
        pending.get_download()
        to_add = [
@@ -293,8 +295,11 @@ class VideoDownloader:
        title = "Downloading: " + response["info_dict"]["title"]

        try:
+            size = response.get("_total_bytes_str") or "N/A"
+            if size.strip() == "N/A":
+                size = response.get("_total_bytes_estimate_str", "N/A")
+
            percent = response["_percent_str"]
-            size = response["_total_bytes_str"]
            speed = response["_speed_str"]
            eta = response["_eta_str"]
            message = f"{percent} of {size} at {speed} - time left: {eta}"
@@ -318,7 +323,6 @@ class VideoDownloader:
    def _build_obs_basic(self):
        """initial obs"""
        self.obs = {
-            "default_search": "ytsearch",
            "merge_output_format": "mp4",
            "outtmpl": (
                self.config["application"]["cache_dir"]
@@ -326,13 +330,9 @@ class VideoDownloader:
            ),
            "progress_hooks": [self._progress_hook],
            "noprogress": True,
-            "quiet": True,
            "continuedl": True,
-            "retries": 3,
            "writethumbnail": False,
            "noplaylist": True,
-            "check_formats": "selected",
-            "socket_timeout": 3,
        }

    def _build_obs_user(self):
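
The progress hook hunk in this file falls back from `_total_bytes_str` to `_total_bytes_estimate_str` when the exact size is reported as `N/A`. A standalone sketch of that fallback follows. The helper name `progress_size` is hypothetical, and treating a missing or empty key the same as `N/A` is an assumption, not something the diff itself does.

```python
def progress_size(response: dict) -> str:
    """Return the best available size string from a progress hook dict."""
    # assumption: a missing or empty value is treated like "N/A"
    size = (response.get("_total_bytes_str") or "N/A").strip()
    if size == "N/A":
        # exact size unknown, fall back to the estimate if there is one
        size = (response.get("_total_bytes_estimate_str") or "N/A").strip()

    return size
```

Normalizing to a string before calling `strip()` keeps the lookup from raising `AttributeError` when a key is absent.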
--- a/tubearchivist/home/src/index/comments.py
+++ b/tubearchivist/home/src/index/comments.py
@@ -210,6 +210,9 @@ class CommentList:
            return

        total_videos = len(self.video_ids)
+        if notify:
+            self._notify(f"add comments for {total_videos} videos", False)
+
        for idx, video_id in enumerate(self.video_ids):
            comment = Comments(video_id, config=self.config)
            if notify:
@@ -219,16 +222,16 @@ class CommentList:
                comment.upload_comments()

        if notify:
-            self.notify_final(total_videos)
+            self._notify(f"added comments for {total_videos} videos", 5)

    @staticmethod
-    def notify_final(total_videos):
-        """send final notification"""
+    def _notify(message, expire):
+        """send notification"""
        key = "message:download"
        message = {
            "status": key,
            "level": "info",
            "title": "Download and index comments finished",
-            "message": f"added comments for {total_videos} videos",
+            "message": message,
        }
-        RedisArchivist().set_message(key, message, expire=4)
+        RedisArchivist().set_message(key, message, expire=expire)
--- a/tubearchivist/home/templates/home/search.html
+++ b/tubearchivist/home/templates/home/search.html
@@ -7,30 +7,80 @@
    </div>
    <div class="multi-search-box">
        {{ search_form }}
-        <p>Start typing or use <span class="settings-current">video:</span>, <span class="settings-current">channel:</span>, <span class="settings-current">playlist:</span> or <span class="settings-current">full:</span> keywords for advanced queries. <a href="https://github.com/tubearchivist/tubearchivist/wiki/Search" target="_blank">Learn more</a>.</p>
    </div>
-    <div class="multi-search-result">
-        <h2>Video Results</h2>
-        <div id="video-results" class="video-list {{ all_styles.home }} {% if all_styles.home == "grid" %}grid-{{ grid_items }}{% endif %}">
-            <p>No videos found.</p>
+    <div id="multi-search-results" style="display: none;">
+        <div class="multi-search-result">
+            <h2>Video Results</h2>
+            <div id="video-results" class="video-list {{ all_styles.home }} {% if all_styles.home == "grid" %}grid-{{ grid_items }}{% endif %}">
+                <p>No videos found.</p>
+            </div>
+        </div>
+        <div class="multi-search-result">
+            <h2>Channel Results</h2>
+            <div id="channel-results" class="channel-list {{ all_styles.channel }}">
+                <p>No channels found.</p>
+            </div>
+        </div>
+        <div class="multi-search-result">
+            <h2>Playlist Results</h2>
+            <div id="playlist-results" class="playlist-list {{ all_styles.playlist }}">
+                <p>No playlists found.</p>
+            </div>
+        </div>
+        <div class="multi-search-result">
+            <h2>Fulltext Results</h2>
+            <div id="fulltext-results" class="video-list list">
+                <p>No fulltext results found.</p>
+            </div>
        </div>
    </div>
-    <div class="multi-search-result">
-        <h2>Channel Results</h2>
-        <div id="channel-results" class="channel-list {{ all_styles.channel }}">
-            <p>No channels found.</p>
+    <div id="multi-search-results-placeholder" style="display: block;">
+        <div>
+            <h2>Example queries</h2>
+            <ul>
+                <li><span class="value">music video</span> — basic search</li>
+                <li><span>video: active:</span><span class="value">no</span> — all videos deleted from YouTube</li>
+                <li><span>video:</span><span class="value">learn javascript</span><span> channel:</span><span class="value">corey schafer</span><span> active:</span><span class="value">yes</span></li>
+                <li><span>channel:</span><span class="value">linux</span><span> subscribed:</span><span class="value">yes</span></li>
+                <li><span>playlist:</span><span class="value">backend engineering</span><span> active:</span><span class="value">yes</span><span> subscribed:</span><span class="value">yes</span></li>
+            </ul>
        </div>
-    </div>
-    <div class="multi-search-result">
-        <h2>Playlist Results</h2>
-        <div id="playlist-results" class="playlist-list {{ all_styles.playlist }}">
-            <p>No playlists found.</p>
-        </div>
-    </div>
-    <div class="multi-search-result">
-        <h2>Fulltext Results</h2>
-        <div id="fulltext-results" class="video-list list">
-            <p>No fulltext results found.</p>
+        <div>
+            <h2>Keywords cheatsheet</h2>
+            <p>For detailed usage check <a href="https://github.com/tubearchivist/tubearchivist/wiki/Search" target="_blank">wiki</a>.</p>
+            <div>
+                <ul>
+                    <li><span>simple:</span> (implied) — search in video titles, channel names and playlist titles</li>
+                    <li>
+                        <span>video:</span> — search in video titles, tags and category field
+                        <ul>
+                            <li><span>channel:</span> — channel name</li>
+                            <li><span>active:</span><span class="value">yes/no</span> — whether the video is still active on YouTube</li>
+                        </ul>
+                    </li>
+                    <li>
+                        <span>channel:</span> — search in channel name and channel description
+                        <ul>
+                            <li><span>subscribed:</span><span class="value">yes/no</span> — whether you are subscribed to the channel</li>
+                            <li><span>active:</span><span class="value">yes/no</span> — whether the channel is still active on YouTube</li>
+                        </ul>
+                    </li>
+                    <li>
+                        <span>playlist:</span> — search in playlist title and playlist description
+                        <ul>
+                            <li><span>subscribed:</span><span class="value">yes/no</span> — whether you are subscribed to the playlist</li>
+                            <li><span>active:</span><span class="value">yes/no</span> — whether the playlist is still active on YouTube</li>
+                        </ul>
+                    </li>
+                    <li>
+                        <span>full:</span> — search in video subtitles
+                        <ul>
+                            <li><span>lang:</span> — subtitles language (use two-letter ISO language code, same as the one from settings page)</li>
+                            <li><span>source:</span><span class="value">auto/user</span> — <i>auto</i> to search through auto-generated subtitles only, or <i>user</i> to search through user-uploaded subtitles only</li>
+                        </ul>
+                    </li>
+                </ul>
+            </div>
        </div>
    </div>
 </div>
--- a/tubearchivist/requirements.txt
+++ b/tubearchivist/requirements.txt
@@ -10,4 +10,4 @@ requests==2.28.2
 ryd-client==0.0.6
 uWSGI==2.0.21
 whitenoise==6.4.0
-yt_dlp==2023.2.17
+yt_dlp==2023.3.3
--- a/tubearchivist/static/css/style.css
+++ b/tubearchivist/static/css/style.css
@@ -892,10 +892,24 @@ video:-webkit-full-screen {
    width: 100%;
 }

-.multi-search-result {
+.multi-search-result, #multi-search-results-placeholder {
    padding: 1rem 0;
 }

+#multi-search-results-placeholder span {
+    font-family: monospace;
+    color: var(--accent-font-dark);
+    background-color: var(--highlight-bg);
+}
+
+#multi-search-results-placeholder span.value {
+    color: var(--accent-font-light);
+}
+
+#multi-search-results-placeholder ul {
+    margin-top: 10px;
+}
+
 /* channel overview page */
 .channel-list.list {
    display: block;
--- a/tubearchivist/static/script.js
+++ b/tubearchivist/static/script.js
@@ -865,21 +865,33 @@ function setProgressBar(videoId, currentTime, duration) {

 // multi search form
 let searchTimeout = null;
+let searchHttpRequest = null;
 function searchMulti(query) {
  clearTimeout(searchTimeout);
  searchTimeout = setTimeout(function () {
-    if (query.length > 1) {
-      let http = new XMLHttpRequest();
-      http.onreadystatechange = function () {
-        if (http.readyState === 4) {
-          let response = JSON.parse(http.response);
+    if (query.length > 0) {
+      if (searchHttpRequest) {
+        searchHttpRequest.abort();
+      }
+      searchHttpRequest = new XMLHttpRequest();
+      searchHttpRequest.onreadystatechange = function () {
+        if (searchHttpRequest.readyState === 4) {
+          const response = JSON.parse(searchHttpRequest.response);
          populateMultiSearchResults(response.results, response.queryType);
        }
      };
-      http.open('GET', `/api/search/?query=${query}`, true);
-      http.setRequestHeader('X-CSRFToken', getCookie('csrftoken'));
-      http.setRequestHeader('Content-type', 'application/json');
-      http.send();
+      searchHttpRequest.open('GET', `/api/search/?query=${query}`, true);
+      searchHttpRequest.setRequestHeader('X-CSRFToken', getCookie('csrftoken'));
+      searchHttpRequest.setRequestHeader('Content-type', 'application/json');
+      searchHttpRequest.send();
+    } else {
+      if (searchHttpRequest) {
+        searchHttpRequest.abort();
+        searchHttpRequest = null;
+      }
+      // show the placeholder container and hide the results container
+      document.getElementById('multi-search-results').style.display = 'none';
+      document.getElementById('multi-search-results-placeholder').style.display = 'block';
    }
  }, 500);
 }
@@ -890,6 +902,9 @@ function getViewDefaults(view) {
 }

 function populateMultiSearchResults(allResults, queryType) {
+  // show the results container and hide the placeholder container
+  document.getElementById('multi-search-results').style.display = 'block';
+  document.getElementById('multi-search-results-placeholder').style.display = 'none';
  // videos
  let defaultVideo = getViewDefaults('home');
  let allVideos = allResults.video_results;