tubearchivist/tubearchivist

mirror of https://github.com/tubearchivist/tubearchivist.git synced 2025-07-01 23:01:11 +00:00

Your self-hosted YouTube media server

Core functionality

Subscribe to your favorite YouTube channels
Download Videos using yt-dlp
Index and make videos searchable
Play videos
Keep track of viewed and unviewed videos

Screenshots

Home Page

All Channels

Single Channel

Video Page

Downloads Page

Problem Tube Archivist tries to solve

Once your YouTube video collection grows, it becomes hard to search for and find a specific video. That's where Tube Archivist comes in: by indexing your video collection with metadata from YouTube, you can organize, search and enjoy your archived YouTube videos offline, without hassle, through a convenient web interface.

Installation

Take a look at the example docker-compose.yml file provided. Tube Archivist depends on three main components split up into separate docker containers:

Tube Archivist

The main Python application that displays and serves your video collection, built with Django.

Serves the interface on port 8000
Requires a volume for the video archive at /youtube.
A second volume at /cache is recommended to store the cache for thumbnails and artwork.
The environment variables ES_URL and REDIS_HOST are needed to tell Tube Archivist where Elasticsearch and Redis respectively are located.
The environment variables HOST_UID and HOST_GID allow Tube Archivist to chown the video files to the main host system user instead of the container user.
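
As a rough illustration of how the four variables above could be consumed, the sketch below reads them from the environment. The helper name `read_settings`, the returned dictionary layout and the choice to treat HOST_UID and HOST_GID as optional are assumptions made for this example, not Tube Archivist's actual code.

```python
import os


def read_settings(environ=None):
    """Collect the variables named above (hypothetical helper, not project code)."""
    environ = os.environ if environ is None else environ
    # ES_URL and REDIS_HOST are described as needed, so fail early without them.
    missing = [key for key in ("ES_URL", "REDIS_HOST") if not environ.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    settings = {"es_url": environ["ES_URL"], "redis_host": environ["REDIS_HOST"]}
    # Assumption: HOST_UID and HOST_GID may be left unset, giving None here.
    for key in ("HOST_UID", "HOST_GID"):
        value = environ.get(key)
        settings[key.lower()] = int(value) if value else None
    return settings
```

Calling `read_settings()` without arguments reads the real process environment; passing a plain dictionary makes the helper easy to try out.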

Elasticsearch

Stores video metadata and makes everything searchable. Also keeps track of the download queue.

Needs to be accessible over the default port 9200
Needs a volume at /usr/share/elasticsearch/data to store data

Follow the documentation for additional installation details.

Redis JSON

Functions as a cache and temporary link between the application and the file system. Used to store and display messages and configuration variables.

Needs to be accessible over the default port 6379
Takes an optional volume at /data to make your configuration changes permanent.
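
If the application cannot reach Elasticsearch or Redis, a quick reachability check of the two default ports mentioned above can help narrow things down. This is a minimal sketch using only the Python standard library; the function name `port_is_open` is made up for this example, and the host names you pass in depend on your own setup.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("localhost", 9200)` checks the Elasticsearch port and `port_is_open("localhost", 6379)` the Redis port, assuming both are published on the host you run the check from.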

Getting Started

Go through the settings page and look at the available options. In particular, set Download Format to your desired video quality before downloading. Tube Archivist downloads the best available quality by default.
Subscribe to some of your favorite YouTube channels on the channels page.
On the downloads page, click on Rescan subscriptions to add videos from the subscribed channels to your Download queue or click on Add to download queue to manually add Video IDs, links, channels or playlists.
Click on Download queue and let Tube Archivist do its thing.
Enjoy your archived collection!

Import your existing library

So far, importing depends on the video still being available on YouTube so that the metadata can be fetched. Add the files you would like to import to the /cache/import folder, then start the process from the settings page under Manual media files import. Make sure to follow one of the two methods below.

Method 1:

Place a matching .json file next to the media file. Both files need to have the same base name, for example:

For the media file: <base-name>.mp4
For the JSON file: <base-name>.info.json
Alternate JSON file: <base-name>.json

Tube Archivist then looks for the 'id' key within the JSON file to identify the video.
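
To make the matching rule of Method 1 concrete, here is a small sketch of how the lookup could work. The function names `find_metadata_file` and `read_video_id` are hypothetical, and trying `.info.json` before `.json` is an assumption; only the file naming and the 'id' key come from the description above.

```python
import json
from pathlib import Path


def find_metadata_file(media_file):
    """Return the matching .info.json or .json file next to media_file, or None."""
    media_path = Path(media_file)
    # Assumption: prefer <base-name>.info.json, then fall back to <base-name>.json.
    for suffix in (".info.json", ".json"):
        candidate = media_path.with_name(media_path.stem + suffix)
        if candidate.is_file():
            return candidate
    return None


def read_video_id(json_file):
    """Return the value of the 'id' key, or None if the key is absent."""
    with open(json_file, encoding="utf-8") as handle:
        return json.load(handle).get("id")
```

Running both helpers over your files before importing is a cheap way to spot media files that have no usable JSON file next to them.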

Method 2:

Detect the YouTube ID from the filename. This accepts the default yt-dlp naming convention for file names like:

<base-name>[<youtube-id>].mp4
The YouTube ID in square brackets at the end of the filename is the crucial part.
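
The filename rule of Method 2 can be sketched with a regular expression. This is only an illustration under the stated naming convention: the pattern and the function name `youtube_id_from_filename` are assumptions, and the real importer may be stricter or more lenient.

```python
import re
from pathlib import Path

# Matches the last [...] group at the very end of the name, extension removed.
BRACKET_ID = re.compile(r"\[([^\[\]]+)\]$")


def youtube_id_from_filename(filename):
    """Return the text inside the trailing square brackets, or None."""
    stem = Path(filename).stem  # strips directory and the final extension
    match = BRACKET_ID.search(stem)
    return match.group(1) if match else None
```

A file whose name does not end in a bracketed ID yields `None`, which is a hint to rename it or to fall back to Method 1.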

Some notes:

This will consume the files you put into the import folder: files are converted to mp4 if needed (this might take a long time...) and moved to the archive. The .json files are deleted upon completion to avoid duplicates on the next run.
Maybe start with a subset of your files to import to make sure everything goes well...
Follow the logs to monitor progress and errors: docker-compose logs -f tubearchivist.

Backup and restore

From the settings page you can back up your metadata into a zip file. The file is stored at cache/backup and contains the files needed to restore the Elasticsearch index, formatted as nd-json files, as well as a complete export of the index in a set of conventional json files.

The restore functionality will expect the same zip file in cache/backup and will recreate the index from the snapshot.

BE AWARE: This will replace your current index with the one from the backup file.
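
Because a restore replaces the current index, it can be worth confirming that the zip file is intact before starting. The sketch below uses Python's standard `zipfile` module; the function name `inspect_backup` is hypothetical and the names of the files inside the backup are not assumed.

```python
import zipfile


def inspect_backup(zip_path):
    """Return the member names of the archive, raising if a member is corrupt."""
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() reads every member and returns the first bad one, if any.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"corrupt member in backup: {bad_member}")
        return archive.namelist()
```

Point `zip_path` at the file in your cache/backup folder; a readable list of member names means the archive itself is not damaged, although it says nothing about the contents being a valid index.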

Potential pitfalls

vm.max_map_count

Elasticsearch in Docker requires the kernel setting vm.max_map_count of the host machine to be set to at least 262144.

To temporarily set the value, run:

sudo sysctl -w vm.max_map_count=262144

How to apply the change permanently depends on your host operating system:

For example, on Ubuntu Server add vm.max_map_count = 262144 to the file /etc/sysctl.conf.
On Arch based systems, create a file /etc/sysctl.d/max_map_count.conf with the content vm.max_map_count = 262144.
On any other platform, consult its documentation on how to set kernel parameters.
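
To verify that the setting took effect, the current value can be read back on the host. This is a minimal sketch assuming a Linux host, where the value is exposed at /proc/sys/vm/max_map_count; the function name `max_map_count_ok` is made up for this example.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144


def max_map_count_ok(proc_file="/proc/sys/vm/max_map_count"):
    """Return True if the kernel setting meets the minimum named above."""
    current = int(Path(proc_file).read_text().strip())
    return current >= REQUIRED_MAX_MAP_COUNT
```

Run it on the host machine, not inside a container, and again after a reboot to confirm that the permanent change was picked up.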

Permissions for elasticsearch

If you see a message similar to AccessDeniedException[/usr/share/elasticsearch/data/nodes] when initially starting elasticsearch, that means the container is not allowed to write files to the volume.
That's most likely the case when you run docker-compose as an unprivileged user. To fix that issue, shut down the container and on your host machine run:

chown 1000:0 /path/to/mount/point

This will match the permissions with the UID and GID of elasticsearch within the container and should fix the issue.
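
To check whether the mount point already has the expected owner, the ownership can be read with `os.stat`. A minimal sketch, assuming a Unix host; the function name `owned_by` is hypothetical, and the defaults mirror the 1000:0 from the chown command above.

```python
import os


def owned_by(path, uid=1000, gid=0):
    """Return True if path is owned by the given UID and GID."""
    info = os.stat(path)
    return info.st_uid == uid and info.st_gid == gid
```

If `owned_by("/path/to/mount/point")` returns `False`, the chown command above still needs to be run.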

Updating Tube Archivist

You will see the current version number of Tube Archivist in the footer of the interface so you can compare it with the latest release to make sure you are running the latest and greatest.

There can be breaking changes between updates. Particularly as the application grows, new environment variables or settings might be required. Any breaking changes will be marked in the release notes.
All testing is done with the Elasticsearch version number as mentioned in the provided docker-compose.yml file. Running an older version of Elasticsearch is most likely not going to result in any issues, but it's still recommended to run the same version as mentioned.

Roadmap

This should be considered a minimal viable product. An extensive list of future functions and improvements is planned.

Functionality

Access control
User roles
Delete videos and channels
Create playlists
Podcast mode to serve channel as mp3
Implement PyFilesystem for flexible video storage
Dynamic download queue
Un-ignore videos
Backup and restore [2021-09-22]
Scan your file system to index already downloaded videos [2021-09-14]

UI

Create a github wiki for user documentation
Show similar videos on video page
Multi language support
Grid and list view for both channel and video list pages
Show total video downloaded vs total videos available in channel

Known limitations

Video files created by Tube Archivist need to be mp4 video files for best browser compatibility.
Every limitation of yt-dlp will also be present in Tube Archivist. If yt-dlp can't download or extract a video for any reason, Tube Archivist won't be able to either.
For now this is meant to be run in a trusted network environment.
