
![Tube Archivist](assets/tube-archivist-banner.jpg?raw=true "Tube Archivist Banner")  

<center><h1>Your self-hosted YouTube media server</h1></center>


## Core functionality
* Subscribe to your favorite YouTube channels
* Download Videos using **yt-dlp**
* Index and make videos searchable
* Play videos
* Keep track of viewed and unviewed videos

## Screenshots
![home screenshot](assets/tube-archivist-screenshot-home.png?raw=true "Tube Archivist Home")  
*Home Page*

![channels screenshot](assets/tube-archivist-screenshot-channels.png?raw=true "Tube Archivist Channels")  
*All Channels*

![single channel screenshot](assets/tube-archivist-screenshot-single-channel.png?raw=true "Tube Archivist Single Channel")  
*Single Channel*

![video page screenshot](assets/tube-archivist-screenshot-video.png?raw=true "Tube Archivist Video Page")  
*Video Page*

![downloads page screenshot](assets/tube-archivist-screenshot-download.png?raw=true "Tube Archivist Downloads Page")  
*Downloads Page*
  
## Problem Tube Archivist tries to solve
Once your YouTube video collection grows, it becomes hard to search and find a specific video. That's where Tube Archivist comes in: By indexing your video collection with metadata from YouTube, you can organize, search and enjoy your archived YouTube videos without hassle offline through a convenient web interface.

## Installation
Take a look at the example `docker-compose.yml` file provided. Tube Archivist depends on three main components, each running in its own Docker container:

### Tube Archivist
The main Python application that displays and serves your video collection, built with Django.
  - Serves the interface on port `8000`
  - Needs a mandatory volume for the video archive at **/youtube**
  - And another recommended volume to save the cache for thumbnails and artwork at **/cache**.
  - The environment variables `ES_URL` and `REDIS_HOST` are needed to tell Tube Archivist where Elasticsearch and Redis respectively are located.
  - The environment variables `HOST_UID` and `HOST_GID` allow Tube Archivist to `chown` the video files to the main host system user instead of the container user.

### Elasticsearch
Stores video metadata and makes everything searchable. Also keeps track of the download queue.
  - Needs to be accessible over the default port `9200`
  - Needs a volume at **/usr/share/elasticsearch/data** to store data

Follow the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) for additional installation details.

### Redis JSON
Functions as a cache and temporary link between the application and the file system. Used to store and display messages and configuration variables.
  - Needs to be accessible over the default port `6379`
  - Takes an optional volume at **/data** to make your configuration changes permanent.

## Getting Started
1. Go through the **settings** page and look at the available options. In particular, set *Download Format* to your desired video quality before downloading. **Tube Archivist** downloads the best available quality by default.
2. Subscribe to some of your favorite YouTube channels on the **channels** page. 
3. On the **downloads** page, click on *Rescan subscriptions* to add videos from the subscribed channels to your download queue, or click on *Add to download queue* to manually add video IDs, links, channels or playlists.
4. Click on *Download queue* and let Tube Archivist do its thing.
5. Enjoy your archived collection!
  
## Import your existing library
So far, importing only works if the video is still available on YouTube, because the metadata is fetched from there. Add the files you would like to import to the */cache/import* folder, then start the process from the settings page under *Manual media files import*. Make sure to follow one of the two methods below.

### Method 1:
Add a matching *.json* file with the media file. Both files need to have the same base name, for example:
- For the media file: \<base-name>.mp4
- For the JSON file: \<base-name>.info.json
- Alternate JSON file: \<base-name>.json

**Tube Archivist** then looks for the 'id' key within the JSON file to identify the video.
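
As a rough illustration of Method 1, the Python sketch below pairs a media file with its JSON file and reads the `id` key. The function name `find_video_id` and the order in which the two JSON names are tried are assumptions made for this example; it is not Tube Archivist's actual import code.

```python
import json
from pathlib import Path
from typing import Optional


def find_video_id(media_file: Path) -> Optional[str]:
    """Return the 'id' value from the JSON file matching media_file, if any."""
    base = media_file.with_suffix("")  # drop the extension to get <base-name>
    # Try <base-name>.info.json first, then <base-name>.json
    # (this order is an assumption, not documented behavior).
    for candidate in (Path(f"{base}.info.json"), Path(f"{base}.json")):
        if candidate.is_file():
            with candidate.open(encoding="utf-8") as handle:
                return json.load(handle).get("id")
    return None
```

Running a check like this over your files before importing can help spot media files that have no usable JSON file next to them.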

### Method 2:
Detect the YouTube ID from the filename. This accepts the default yt-dlp naming convention for file names like:
- \<base-name>[\<youtube-id>].mp4
- The YouTube ID in square brackets at the end of the filename is the crucial part.
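
The filename convention above can be sketched in Python as follows. The helper name `id_from_filename` is hypothetical, and the pattern assumes that YouTube IDs are 11 characters drawn from letters, digits, `_` and `-`. Tube Archivist's own detection logic may differ.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: YouTube video IDs are 11 characters from [0-9A-Za-z_-].
YOUTUBE_ID_IN_BRACKETS = re.compile(r"\[([0-9A-Za-z_-]{11})\]$")


def id_from_filename(filename: str) -> Optional[str]:
    """Extract the ID from '<base-name>[<youtube-id>].mp4', or return None."""
    stem = Path(filename).stem  # file name without its extension
    match = YOUTUBE_ID_IN_BRACKETS.search(stem)
    return match.group(1) if match else None
```

Because the pattern is anchored at the end of the name, square brackets that appear earlier in the title are ignored.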

### Some notes:
- This will **consume** the files you put into the import folder: files will get converted to mp4 if needed (this might take a long time) and moved to the archive. The *.json* files will get deleted upon completion to avoid duplicates on the next run.
- Consider starting with a subset of your files to make sure everything goes well.
- Follow the logs to monitor progress and errors: `docker-compose logs -f tubearchivist`.

## Backup and restore
From the settings page you can back up your metadata into a zip file. The file will get stored at *cache/backup* and will contain the files needed to restore the Elasticsearch index, formatted as **nd-json** files, as well as a complete export of the index in a set of conventional **json** files.
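
For readers unfamiliar with the format: nd-json (newline-delimited JSON) stores one JSON document per line. The sketch below shows a generic way to read such text in Python. The helper name `iter_ndjson` is made up for this example, and nothing here describes the exact file names or fields inside a Tube Archivist backup.

```python
import json
from typing import Iterator


def iter_ndjson(text: str) -> Iterator[dict]:
    """Yield one parsed JSON document per non-empty line."""
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            yield json.loads(line)
```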

The restore functionality will expect the same zip file in *cache/backup* and will recreate the index from the snapshot.  

BE AWARE: This will **replace** your current index with the one from the backup file.

## Potential pitfalls
### vm.max_map_count
**Elasticsearch** in Docker requires the host machine's kernel setting `vm.max_map_count` to be at least 262144.

To set the value temporarily, run:  
```
sudo sysctl -w vm.max_map_count=262144
```  

How to apply the change permanently depends on your host operating system:  
- On Ubuntu Server, for example, add `vm.max_map_count = 262144` to the file */etc/sysctl.conf*.
- On Arch-based systems, create a file */etc/sysctl.d/max_map_count.conf* with the content `vm.max_map_count = 262144`.
- On any other platform, check its documentation on how to set kernel parameters.
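
To verify the setting from a script, a small Python sketch like the one below could be used. It assumes a Linux host, where sysctl values are exposed under */proc/sys*; the function names are chosen for illustration only.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # minimum value named in this README


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> str:
    """Return the raw text of the current kernel setting (Linux only)."""
    return Path(path).read_text()


def is_max_map_count_sufficient(raw_value: str) -> bool:
    """Return True if the setting, given as text, meets the required minimum."""
    return int(raw_value.strip()) >= REQUIRED_MAX_MAP_COUNT
```

On the host, `is_max_map_count_sufficient(read_max_map_count())` should return `True` once the value has been applied.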

### Permissions for elasticsearch
If you see a message similar to `AccessDeniedException[/usr/share/elasticsearch/data/nodes]` when initially starting Elasticsearch, the container is not allowed to write files to the volume.  
That's most likely the case when you run `docker-compose` as an unprivileged user. To fix that issue, shut down the container and run the following on your host machine:
```
chown 1000:0 /path/to/mount/point
```
This will set the ownership of the mount point to the **UID** and **GID** of Elasticsearch within the container and should fix the issue.
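
To confirm the result, the ownership of the mount point can be checked with a short Python sketch. The helper `has_expected_owner` is hypothetical, and its defaults simply mirror the `1000:0` used in the command above.

```python
import os


def has_expected_owner(path: str, uid: int = 1000, gid: int = 0) -> bool:
    """Return True if path is owned by the given UID and GID."""
    info = os.stat(path)
    return info.st_uid == uid and info.st_gid == gid
```

Calling `has_expected_owner("/path/to/mount/point")` on the host should return `True` after the `chown` has been applied.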

## Roadmap
This should be considered a **minimal viable product**. An extensive list of future functions and improvements is planned.

### Functionality
- [ ] Access control
- [ ] User roles
- [ ] Delete videos and channels
- [ ] Create playlists
- [ ] Podcast mode to serve channel as mp3
- [ ] Implement [PyFilesystem](https://github.com/PyFilesystem/pyfilesystem2) for flexible video storage
- [ ] Dynamic download queue
- [ ] Un-ignore videos
- [X] Backup and restore [2021-09-22]
- [X] Scan your file system to index already downloaded videos [2021-09-14]

### UI
- [ ] Create a GitHub wiki for user documentation
- [ ] Show similar videos on video page
- [ ] Multi-language support
- [ ] Grid and list view for both channel and video list pages
- [ ] Show total videos downloaded vs. total videos available in channel


## Known limitations
- Video files created by Tube Archivist need to be **mp4** video files for best browser compatibility.
- Every limitation of **yt-dlp** will also be present in Tube Archivist. If **yt-dlp** can't download or extract a video for any reason, Tube Archivist won't be able to either.
- For now this is meant to be run in a trusted network environment.