Compare commits


No commits in common. "master" and "v0.2.3" have entirely different histories.

526 changed files with 13978 additions and 31749 deletions

View File

@ -5,13 +5,6 @@
# Docker
.docker
# Backend development
backend/static
backend/staticfiles
# Frontend development
frontend/node_modules
# Python
tubearchivist/__pycache__/
tubearchivist/*/__pycache__/
@ -24,5 +17,8 @@ venv/
# Unneeded graphics
assets/*
# Unneeded docs
docs/*
# for local testing only
testing.sh

1
.gitattributes vendored
View File

@ -1 +0,0 @@
docker_assets\run.sh eol=lf

View File

@ -6,17 +6,15 @@ body:
- type: markdown
attributes:
value: |
Thanks for taking the time to help improve this project! Please read the [how to open an issue](https://github.com/tubearchivist/tubearchivist/blob/master/CONTRIBUTING.md#how-to-open-an-issue) guide carefully before continuing.
Thanks for taking the time to help improve this project!
- type: checkboxes
id: latest
attributes:
label: "I've read the documentation"
label: Latest and Greatest
options:
- label: I'm running the latest version of Tube Archivist and have read the [release notes](https://github.com/tubearchivist/tubearchivist/releases/latest).
required: true
- label: I have read the [how to open an issue](https://github.com/tubearchivist/tubearchivist/blob/master/CONTRIBUTING.md#how-to-open-an-issue) guide, particularly the [bug report](https://github.com/tubearchivist/tubearchivist/blob/master/CONTRIBUTING.md#bug-report) section.
required: true
- type: input
id: os

View File

@ -1,14 +1,37 @@
name: Feature Request
description: This Project currently doesn't take any new feature requests.
description: Create a new feature request
title: "[Feature Request]: "
body:
- type: checkboxes
id: block
- type: markdown
attributes:
label: "This project doesn't accept any new feature requests for the foreseeable future. There is no shortage of ideas and the next development steps are clear for years to come."
value: |
Thanks for taking the time to help improve this project!
- type: checkboxes
id: already
attributes:
label: Already implemented?
options:
- label: I understand that this issue will be closed without comment.
- label: I have read through the [wiki](https://github.com/tubearchivist/tubearchivist/wiki).
required: true
- label: I will resist the temptation and I will not submit this issue. If I submit this, I understand I might get blocked from this repo.
- label: I understand the [scope](https://github.com/tubearchivist/tubearchivist/wiki/FAQ) of this project and am aware of the [known limitations](https://github.com/tubearchivist/tubearchivist#known-limitations) and my idea is not already on the [roadmap](https://github.com/tubearchivist/tubearchivist#roadmap).
required: true
- type: textarea
id: description
attributes:
label: Your Feature Request
value: "## Is your feature request related to a problem? Please describe.\n\n## Describe the solution you'd like\n\n## Additional context"
placeholder: Tell us what you see!
validations:
required: true
- type: checkboxes
id: help
attributes:
label: Your help is needed!
description: This project is ambitious as it is, please contribute.
options:
- label: Yes I can help with this feature request!
required: false

View File

@ -1,23 +0,0 @@
name: Frontend Migration
description: Tracking our new React based frontend
title: "[Frontend Migration]: "
labels: ["react migration"]
body:
- type: dropdown
id: domain
attributes:
label: Domain
options:
- Frontend
- Backend
- Combined
validations:
required: true
- type: textarea
id: description
attributes:
label: Description
placeholder: Organizing our React frontend migration
validations:
required: true

View File

@ -13,7 +13,9 @@ body:
attributes:
label: Installation instructions
options:
- label: I have read the [how to open an issue](https://github.com/tubearchivist/tubearchivist/blob/master/CONTRIBUTING.md#how-to-open-an-issue) guide, particularly the [installation help](https://github.com/tubearchivist/tubearchivist/blob/master/CONTRIBUTING.md#installation-help) section.
- label: I have read and understand the [installation instructions](https://github.com/tubearchivist/tubearchivist#installing-and-updating).
required: true
- label: My issue is not described in the [potential pitfalls](https://github.com/tubearchivist/tubearchivist#potential-pitfalls) section.
required: true
- type: input
@ -38,6 +40,6 @@ body:
attributes:
label: Relevant log output
description: Please copy and paste any relevant Docker logs. This will be automatically formatted into code, so no need for backticks.
render: Shell
render: shell
validations:
required: true

View File

@ -1 +0,0 @@
blank_issues_enabled: false

View File

@ -1,3 +0,0 @@
Thank you for taking the time to improve this project. Please take a look at the [How to make a Pull Request](https://github.com/tubearchivist/tubearchivist/blob/master/CONTRIBUTING.md#how-to-make-a-pull-request) section to help get your contribution merged.
You can delete this text before submitting.

22
.github/workflows/lint_python.yml vendored Normal file
View File

@ -0,0 +1,22 @@
name: lint_python
on: [pull_request, push]
jobs:
lint_python:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- run: pip install --upgrade pip wheel
- run: pip install bandit black codespell flake8 flake8-bugbear
flake8-comprehensions isort
- run: black --check --diff --line-length 79 .
- run: codespell
- run: flake8 . --count --max-complexity=10 --max-line-length=79
--show-source --statistics
- run: isort --check-only --line-length 79 --profile black .
# - run: pip install -r tubearchivist/requirements.txt
# - run: mkdir --parents --verbose .mypy_cache
# - run: mypy --ignore-missing-imports --install-types --non-interactive .
# - run: python3 tubearchivist/manage.py test || true
# - run: shopt -s globstar && pyupgrade --py36-plus **/*.py || true
# - run: safety check

View File

@ -1,47 +0,0 @@
name: Lint, Test, Build, and Push Docker Image
on:
push:
branches:
- '**'
tags:
- '**'
pull_request:
branches:
- '**'
jobs:
lint:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Set up Node.js
uses: actions/setup-node@v3
with:
node-version: '23'
- name: Install frontend dependencies
run: |
cd frontend
npm install
- name: Cache pre-commit environment
uses: actions/cache@v3
with:
path: |
~/.cache/pre-commit
key: ${{ runner.os }}-pre-commit-${{ hashFiles('**/.pre-commit-config.yaml') }}
restore-keys: |
${{ runner.os }}-pre-commit-
- name: Install dependencies
run: |
pip install pre-commit
pre-commit install
- name: Run pre-commit
run: |
pre-commit run --all-files

View File

@ -1,43 +0,0 @@
name: python_unit_tests
on:
push:
paths:
- '**/*.py'
pull_request:
paths:
- '**/*.py'
jobs:
unit-tests:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y gcc libldap2-dev libsasl2-dev libssl-dev
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Cache pip
uses: actions/cache@v4
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
restore-keys: |
${{ runner.os }}-pip-
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r backend/requirements-dev.txt
- name: Run unit tests
run: pytest backend

12
.gitignore vendored
View File

@ -1,16 +1,8 @@
# python testing cache
__pycache__
.venv
# django testing
backend/static
backend/staticfiles
backend/.env
# django testing db
db.sqlite3
# vscode custom conf
.vscode
# JavaScript stuff
node_modules
.editorconfig

View File

@ -1,49 +0,0 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: end-of-file-fixer
- repo: https://github.com/psf/black
rev: 25.1.0
hooks:
- id: black
alias: python
files: ^backend/
args: ["--line-length=79"]
- repo: https://github.com/pycqa/isort
rev: 6.0.1
hooks:
- id: isort
name: isort (python)
alias: python
files: ^backend/
args: ["--profile", "black", "-l 79"]
- repo: https://github.com/pycqa/flake8
rev: 7.1.2
hooks:
- id: flake8
alias: python
files: ^backend/
args: ["--max-complexity=10", "--max-line-length=79"]
- repo: https://github.com/codespell-project/codespell
rev: v2.4.1
hooks:
- id: codespell
exclude: ^frontend/package-lock.json
- repo: https://github.com/pre-commit/mirrors-eslint
rev: v9.22.0
hooks:
- id: eslint
name: eslint
files: \.[jt]sx?$
types: [file]
entry: npm run --prefix ./frontend lint
pass_filenames: false
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v4.0.0-alpha.8
hooks:
- id: prettier
entry: npm run --prefix ./frontend format
pass_filenames: false
exclude: '.*(\.svg|/migrations/).*'

View File

@ -1,207 +1,27 @@
# Contributing to Tube Archivist
## Contributing to Tube Archivist
Welcome, and thanks for showing interest in improving Tube Archivist!
If you haven't already, the best place to start is the README. This will give you an overview on what the project is all about.
## Table of Content
- [Beta Testing](#beta-testing)
- [How to open an issue](#how-to-open-an-issue)
- [Bug Report](#bug-report)
- [Feature Request](#feature-request)
- [Installation Help](#installation-help)
- [How to make a Pull Request](#how-to-make-a-pull-request)
- [Contributions beyond the scope](#contributions-beyond-the-scope)
- [User Scripts](#user-scripts)
- [Improve the Documentation](#improve-the-documentation)
- [Development Environment](#development-environment)
---
## Report a bug
## Beta Testing
Be the first to help test new features/improvements and provide feedback! Regular `:unstable` builds are available for early access. These are for the tinkerers and the brave. Ideally, use a testing environment first, before upgrading your main installation.
If you notice something is not working as expected, check to see if it has been previously reported in the [open issues](https://github.com/tubearchivist/tubearchivist/issues).
If it has not yet been disclosed, go ahead and create an issue.
If the issue doesn't move forward due to a lack of response, I assume it's solved and will close it after some time to keep the list fresh.
There is always something that can get missed during development. Look at the commit messages tagged with `#build` - these are the unstable builds and give a quick overview of what has changed.
## Wiki
- Test the features mentioned, play around, try to break it.
- Test the update path by installing the `:latest` release first, then upgrading to `:unstable` to check for any errors.
- Test the unstable build on a fresh install.
Then provide feedback - even if you don't encounter any issues! You can do this in the `#beta-testing` channel on the [Discord](https://tubearchivist.com/discord) server.
This helps ensure a smooth update for the stable release. Plus you get to test things out early!
## How to open an issue
Please read this carefully before opening any [issue](https://github.com/tubearchivist/tubearchivist/issues) on GitHub.
**Do**:
- Do provide details and context; this matters a lot and makes it easier for people to help.
- Do familiarize yourself with the project first, some questions answer themselves when using the project for some time. Familiarize yourself with the [Readme](https://github.com/tubearchivist/tubearchivist) and the [documentation](https://docs.tubearchivist.com/), this covers a lot of the common questions, particularly the [FAQ](https://docs.tubearchivist.com/faq/).
- Do respond to questions within a day or two so issues can progress. If the issue doesn't move forward due to a lack of response, we'll assume it's solved and we'll close it after some time to keep the list fresh.
**Don't**:
- Don't open *duplicates*; that includes open and closed issues.
- Don't open an issue for something that's already on the [roadmap](https://github.com/tubearchivist/tubearchivist#roadmap), this needs your help to implement it, not another issue.
- Don't open an issue for something that's a [known limitation](https://github.com/tubearchivist/tubearchivist#known-limitations). These are *known* by definition and don't need another reminder. Some limitations may be solved in the future, maybe by you?
- Don't overwrite the *issue template*, they are there for a reason. Overwriting that shows that you don't really care about this project. It shows that you have a misunderstanding of how open source collaboration works and just want to push your ideas through. Overwriting the template may result in a ban.
### Bug Report
Bug reports are highly welcome! This project has improved a lot due to your help by providing feedback when something doesn't work as expected. The developers can't possibly cover all edge cases in an ever changing environment like YouTube and yt-dlp.
Please keep in mind:
- Docker logs are the easiest way to understand what's happening when something goes wrong, *always* provide the logs upfront.
- Set the environment variable `DJANGO_DEBUG=True` for Tube Archivist and reproduce the bug for a better log output. Don't forget to remove that variable again afterwards (see the sketch after this list).
- A bug that can't be reproduced is difficult or sometimes even impossible to fix. Provide very clear steps on *how to reproduce* it.
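As a minimal sketch (not part of the official docs), assuming the Compose service is named `tubearchivist` as in the example docker-compose.yml, collecting logs for a bug report could look like this:
```bash
# Add DJANGO_DEBUG=True to the tubearchivist service environment in docker-compose.yml,
# then recreate the container and capture the relevant log output:
docker compose up -d tubearchivist
docker compose logs --tail=200 tubearchivist > ta_debug_logs.txt
# Remember to remove DJANGO_DEBUG=True again afterwards.
```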
### Feature Request
This project doesn't take any new feature requests. This project doesn't lack ideas; see the currently open tasks and roadmap. New feature requests aren't helpful at this point in time. Thank you for your understanding.
### Installation Help
GitHub is most likely not the best place to ask for installation help. That's inherently individual and one on one.
1. The first step is always to help yourself. Start at the [Readme](https://github.com/tubearchivist/tubearchivist) or the additional platform specific installation pages in the [docs](https://docs.tubearchivist.com/).
2. If that doesn't answer your question, open a `#support` thread on [Discord](https://www.tubearchivist.com/discord).
3. Only if that is not an option, open an issue here.
IMPORTANT: When receiving help, contribute back to the community by improving the installation instructions with your newly gained knowledge.
---
## How to make a Pull Request
Thank you for contributing and helping improve this project. Focus for the foreseeable future is on improving and building on existing functionality, *not* on adding and expanding the application.
This is a quick checklist to help streamline the process:
- For **code changes**, make your PR against the [testing branch](https://github.com/tubearchivist/tubearchivist/tree/testing). That's where all active development happens. This simplifies the later merging into *master*, minimizes any conflicts and usually allows for easy and convenient *fast-forward* merging.
- Show off your progress, even if not yet complete, by creating a [draft](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests#draft-pull-requests) PR first and switch it as *ready* when you are ready.
- Make sure all your code is linted and formatted correctly, see below.
### Documentation Changes
All documentation is intended to represent the state of the [latest](https://github.com/tubearchivist/tubearchivist/releases/latest) release.
- If your PR with code changes also requires changes to documentation *.md files here in this repo, create a separate PR for that, so it can be merged separately at release.
- You can make the PR directly against the *master* branch.
- If your PR requires changes on the [tubearchivist/docs](https://github.com/tubearchivist/docs), make the PR over there.
- Prepare your documentation updates at the same time as the code changes, so people testing your PR can consult the prepared docs if needed.
### Code formatting and linting
This project uses the excellent [pre-commit](https://github.com/pre-commit/pre-commit) library. The [.pre-commit-config.yaml](https://github.com/tubearchivist/tubearchivist/blob/master/.pre-commit-config.yaml) file is part of this repo.
**Quick Start**
- Run `pre-commit install` from the root of the repo.
- Next time you commit to your local git repo, the defined hooks will run.
- On first run, this will download and install the needed environments to your local machine, which can take some time, but they will be reused on subsequent commits.
The same checks also run as a GitHub action.
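A minimal sketch of running the hooks manually, using standard pre-commit commands from the root of the repo:
```bash
pip install pre-commit        # if it isn't installed yet
pre-commit install            # register the git hook for future commits
pre-commit run --all-files    # run every configured hook once, like the CI action does
```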
---
## Contributions beyond the scope
As you have read the [FAQ](https://docs.tubearchivist.com/faq/) and the [known limitations](https://github.com/tubearchivist/tubearchivist#known-limitations) and have gotten an idea of what this project tries to do, some obvious shortcomings will stand out that have been explicitly excluded from the scope of this project, at least for the time being.
Extending the scope of this project will only be feasible with more [regular contributors](https://github.com/tubearchivist/tubearchivist/graphs/contributors) that are willing to help improve this project in the long run. Contributors who have the overall improvement of the project in mind, not just implementing this *one* thing.
Small minor additions, or making a PR for a documented feature request or bug, even if that was and will be your only contribution to this project, are always welcome and are *not* what this is about.
Beyond that, general rules to consider:
- Maintainability is key: It's not just about implementing something and being done with it, it's about maintaining it, fixing bugs as they occur, improving on it and supporting it in the long run.
- Others can do it better: Some problems have been solved by very talented developers. These things don't need to be reinvented in this project.
- Develop for the 80%: New features and additions *should* be beneficial for 80% of the users. If you are trying to solve your own problem that only applies to you, maybe that would be better done in your own fork or, if possible, as a standalone implementation using the API.
- If all of that sounds too strict for you, as stated above, start becoming a regular contributor to this project.
---
## User Scripts
Some of you might have created useful scripts or API integrations around this project. Sharing is caring! Please add a link to your script to the Readme [here](https://github.com/tubearchivist/tubearchivist#user-scripts).
- Your repo should have a `LICENSE` file with one of the common open source licenses. People are expected to fork, adapt and build upon your great work.
- Your script should not modify the *official* files of Tube Archivist. E.g. your symlink script should build links *outside* of your `/youtube` folder. Or your fancy script that creates a beautiful artwork gallery should do that *outside* of the `/cache` folder. Modifying the *official* files and folders of TA is probably not supported.
- On the top of the repo you should have a mention and a link back to the Tube Archivist repo. Clearly state **not** to open any issues on the main TA repo regarding your script.
- Example template:
- `[<user>/<repo>](https://linktoyourrepo.com)`: A short one line description.
---
## Improve the Documentation
The documentation is available at [docs.tubearchivist.com](https://docs.tubearchivist.com/) and is built from the separate repo [tubearchivist/docs](https://github.com/tubearchivist/docs). The Readme there has additional instructions on how to make changes.
---
The wiki is where all user functions are documented in detail. These pages are mirrored into the **docs** folder of the repo. This allows for pull requests and all other features like regular code. Make any changes there, and I'll sync them with the wiki tab.
## Development Environment
This codebase is set up to be developed natively outside of docker as well as in a docker container. Developing outside of a docker container can be convenient, as IDE and hot reload usually works out of the box. But testing inside of a container is still essential, as there are subtle differences, especially when working with the filesystem and networking between containers.
I have learned the hard way that working on a dockerized application outside of docker is very error prone and in general not a good idea. So if you want to test your changes, it's best to run them in a docker testing environment.
Note:
- Subtitles currently fail to load with `DJANGO_DEBUG=True`; that is due to an incorrect `Content-Type` header set by Django's static file implementation. That only applies when running the Django dev server; Nginx sets the correct headers.
### Native Instruction
For convenience, it's recommended to still run Redis and ES in a docker container. Make sure both containers are reachable over the network.
Set up your virtual environment and install the requirements defined in `requirements-dev.txt`.
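For example, a minimal sketch of that setup, assuming the dev requirements live at `backend/requirements-dev.txt` as referenced by the unit test workflow:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements-dev.txt
```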
There are options built in to load environment variables from a file using `load_dotenv`. Example `.env` file to place in the same folder as `manage.py`:
```
TA_HOST="localhost"
TA_USERNAME=tubearchivist
TA_PASSWORD=verysecret
TA_MEDIA_DIR="static/volume/media"
TA_CACHE_DIR="static"
TA_APP_DIR="."
REDIS_CON=redis://localhost:6379
ES_URL="http://localhost:9200"
ELASTIC_PASSWORD=verysecret
TZ=America/New_York
DJANGO_DEBUG=True
```
Then look at the container startup script `run.sh`, make sure all needed migrations and startup checks ran. To start the dev backend server from the same folder as `manage.py` run:
```bash
python manage.py runserver
```
The backend will be available at [localhost:8000/api/](http://localhost:8000/api/).
You'll probably also want to have a Celery worker instance running; refer to `run.sh` for that. The Beat Scheduler might not be needed.
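As a rough sketch only: a worker can be started with the standard Celery CLI from the same folder as `manage.py`. The app module name below (`task`) is a placeholder; check `run.sh` for the exact invocation used in the container.
```bash
# placeholder app name, see run.sh for the real one
celery -A task worker --loglevel=INFO
```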
Then from the frontend folder, install the dependencies with:
```bash
npm install
```
Then to start the frontend development server:
```bash
npm run dev
```
And the frontend should be available at [localhost:3000](http://localhost:3000).
### Docker Instructions
Set up docker on your development machine.
Clone this repository.
Functional changes should be made against the unstable `testing` branch, so check that branch out, then make a new branch for your work.
Edit the `docker-compose.yml` file and replace the [`image: bbilly1/tubearchivist` line](https://github.com/tubearchivist/tubearchivist/blob/4af12aee15620e330adf3624c984c3acf6d0ac8b/docker-compose.yml#L7) with `build: .`. Also make any other changes to the environment variables and so on necessary to run the application, just like you're launching the application as normal.
Run `docker compose up --build`. This will bring up the application. Kill it with `ctrl-c` or by running `docker compose down` from a new terminal window in the same directory.
Make your changes locally and re-run `docker compose up --build`. The `Dockerfile` is structured in a way that the actual application code is in the last layer so rebuilding the image with only code changes utilizes the build cache for everything else and will just take a few seconds.
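The resulting edit-and-test loop, as a short sketch:
```bash
docker compose up --build     # rebuild the image and start the stack
# ... test your changes, then stop with ctrl-c or, from another terminal:
docker compose down
```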
### Develop environment inside a VM
You may find it nice to run everything inside of a VM for complete environment snapshots and encapsulation, though this is not strictly necessary. There's a `deploy.sh` script which has some helpers for this use case:
- This assumes a standard Ubuntu Server VM with docker and docker compose already installed.
- Configure your local DNS to resolve `tubearchivist.local` to the IP of the VM.
- To deploy the latest changes and rebuild the application to the testing VM run:
This is the setup I have landed on, YMMV:
- Clone the repo, work on it with your favorite code editor in your local filesystem. The *testing* branch is where all the changes are happening; it might be unstable and is WIP.
- Then I have a VM running standard Ubuntu Server LTS with docker installed. The VM keeps my projects separate and offers convenient snapshot functionality. The VM also offers ways to simulate low-end environments by limiting CPU cores and memory. You can use this [Ansible Docker Ubuntu](https://github.com/bbilly1/ansible-playbooks) playbook to get started quickly. But you could also just run docker on your host system.
- The `Dockerfile` is structured in a way that the actual application code is in the last layer so rebuilding the image with only code changes utilizes the build cache for everything else and will just take a few seconds.
- Take a look at the `deploy.sh` file. I have my local DNS resolve `tubearchivist.local` to the IP of the VM for convenience. To deploy the latest changes and rebuild the application to the testing VM run:
```bash
./deploy.sh test
```
@ -209,7 +29,7 @@ You may find it nice to run everything inside of a VM for complete environment s
- The `test` argument takes another optional argument to build for a specific architecture; valid options are `amd64`, `arm64` and `multi`, default is `amd64`.
- This `deploy.sh` script is not meant to be universally usable for every possible environment but could serve as an idea on how to automatically rebuild containers to test changes - customize to your liking.
### Working with Elasticsearch
## Working with Elasticsearch
In addition to the required services listed in the example docker-compose file, the **Dev Tools** of [Kibana](https://www.elastic.co/guide/en/kibana/current/docker.html) are invaluable for running and testing Elasticsearch queries.
**Quick start**
@ -220,31 +40,41 @@ bin/elasticsearch-service-tokens create elastic/kibana kibana
Example docker compose, use same version as for Elasticsearch:
```yml
services:
kibana:
image: docker.elastic.co/kibana/kibana:0.0.0
container_name: kibana
environment:
kibana:
image: docker.elastic.co/kibana/kibana:0.0.0
container_name: kibana
environment:
- "ELASTICSEARCH_HOSTS=http://archivist-es:9200"
- "ELASTICSEARCH_SERVICEACCOUNTTOKEN=<your-token-here>"
ports:
ports:
- "5601:5601"
```
If you want to run queries on the Elasticsearch container directly from your host, with for example `curl` or something like *Postman*, you might want to **publish** port 9200 instead of just **exposing** it.
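A short sketch of querying Elasticsearch from the host with `curl`, assuming port 9200 is published and using the example credentials from the compose file; the index name `ta_video` is only an illustration:
```bash
curl -u elastic:verysecret "http://localhost:9200/_cat/indices?v"
curl -u elastic:verysecret "http://localhost:9200/ta_video/_search?q=title:test&pretty"
```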
**Persist Token**
The token will get stored in ES in the `config` folder, and not in the `data` folder. To persist the token between ES container rebuilds, you'll need to persist the config folder as an additional volume:
## Implementing a new feature
1. Create the token as described above
2. While the container is running, copy the current config folder out of the container, e.g.:
```
docker cp archivist-es:/usr/share/elasticsearch/config/ volume/es_config
```
3. Then stop all containers and mount this folder into the container as an additional volume:
```yml
- ./volume/es_config:/usr/share/elasticsearch/config
```
4. Start all containers back up.
Do you see anything on the roadmap that you would like to take a closer look at but are not sure what's the best way to tackle it? Or anything not on there yet you'd like to implement but are not sure how? Reach out on Discord and we'll look into it together.
Now your token will persist between ES container rebuilds.
## Making changes
To fix a bug or implement a feature, fork the repository and make all changes to the testing branch. When ready, create a pull request.
## Releases
There are three different docker tags:
- **latest**: As the name implies, this is the latest multiarch release for regular usage.
- **unstable**: Intermediate amd64 builds for quick testing and improved collaboration. Don't mix with a *latest* installation; for your testing environment only. This is untested and WIP and will have breaking changes between commits that might require a reset to resolve.
- **semantic versioning**: There will be a handful of named version tags that will also have a matching release and tag on GitHub.
If you want to see what's in your container, check out the matching release tag. A merge to **master** usually means a *latest* or *unstable* release. If you want to preview changes in your testing environment, pull the *unstable* tag or clone the repository and build the docker container with the Dockerfile from the **testing** branch.
## Code formatting and linting
To keep things clean and consistent for everybody, there is a GitHub action set up to lint and check the changes. You can test your code locally first if you want. For example, if you made changes in the **video** module, run
```shell
./deploy.sh validate tubearchivist/home/src/index/video.py
```
to validate your changes. If you omit the path, all the project files will get checked. This is subject to change as the codebase improves.

View File

@ -1,66 +1,56 @@
# multi stage to build tube archivist
# build python wheel, download and extract ffmpeg, copy into final image
# first stage to build python wheel, copy into final image
FROM node:lts-alpine AS npm-builder
COPY frontend/package.json frontend/package-lock.json /
RUN npm i
FROM node:lts-alpine AS node-builder
# RUN npm config set registry https://registry.npmjs.org/
COPY --from=npm-builder ./node_modules /frontend/node_modules
COPY ./frontend /frontend
WORKDIR /frontend
RUN npm run build:deploy
WORKDIR /
# First stage to build python wheel
FROM python:3.11.13-slim-bookworm AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential gcc libldap2-dev libsasl2-dev libssl-dev git
# install requirements
COPY ./backend/requirements.txt /requirements.txt
RUN pip install --user -r requirements.txt
# build ffmpeg
FROM python:3.11.13-slim-bookworm AS ffmpeg-builder
FROM python:3.10.8-slim-bullseye AS builder
ARG TARGETPLATFORM
COPY docker_assets/ffmpeg_download.py ffmpeg_download.py
RUN python ffmpeg_download.py $TARGETPLATFORM
RUN apt-get update
RUN apt-get install -y --no-install-recommends build-essential gcc libldap2-dev libsasl2-dev libssl-dev
# install requirements
COPY ./tubearchivist/requirements.txt /requirements.txt
RUN pip install --user -r requirements.txt
# build final image
FROM python:3.11.13-slim-bookworm AS tubearchivist
FROM python:3.10.8-slim-bullseye as tubearchivist
ARG TARGETPLATFORM
ARG INSTALL_DEBUG
ENV PYTHONUNBUFFERED=1
ENV PYTHONUNBUFFERED 1
# copy build requirements
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
# copy ffmpeg
COPY --from=ffmpeg-builder ./ffmpeg/ffmpeg /usr/bin/ffmpeg
COPY --from=ffmpeg-builder ./ffprobe/ffprobe /usr/bin/ffprobe
# install distro packages needed
RUN apt-get clean && apt-get -y update && apt-get -y install --no-install-recommends \
nginx \
atomicparsley \
curl && rm -rf /var/lib/apt/lists/*
curl \
xz-utils && rm -rf /var/lib/apt/lists/*
# get newest patched ffmpeg and ffprobe builds for amd64 fall back to repo ffmpeg for arm64
RUN if [ "$TARGETPLATFORM" = "linux/amd64" ] ; then \
curl -s https://api.github.com/repos/yt-dlp/FFmpeg-Builds/releases/latest \
| grep browser_download_url \
| grep ".*master.*linux64.*tar.xz" \
| cut -d '"' -f 4 \
| xargs curl -L --output ffmpeg.tar.xz && \
tar -xf ffmpeg.tar.xz --strip-components=2 --no-anchored -C /usr/bin/ "ffmpeg" && \
tar -xf ffmpeg.tar.xz --strip-components=2 --no-anchored -C /usr/bin/ "ffprobe" && \
rm ffmpeg.tar.xz \
; elif [ "$TARGETPLATFORM" = "linux/arm64" ] ; then \
apt-get -y update && apt-get -y install --no-install-recommends ffmpeg && rm -rf /var/lib/apt/lists/* \
; fi
# install debug tools for testing environment
RUN if [ "$INSTALL_DEBUG" ] ; then \
apt-get -y update && apt-get -y install --no-install-recommends \
vim htop bmon net-tools iputils-ping procps lsof \
&& pip install --user ipython pytest pytest-django \
apt-get -y update && apt-get -y install --no-install-recommends \
vim htop bmon net-tools iputils-ping procps \
&& pip install --user ipython \
; fi
# make folders
@ -71,12 +61,9 @@ COPY docker_assets/nginx.conf /etc/nginx/sites-available/default
RUN sed -i 's/^user www\-data\;$/user root\;/' /etc/nginx/nginx.conf
# copy application into container
COPY ./backend /app
COPY ./tubearchivist /app
COPY ./docker_assets/run.sh /app
COPY ./docker_assets/backend_start.py /app
COPY ./docker_assets/beat_auto_spawn.sh /app
COPY --from=node-builder ./frontend/dist /app/static
COPY ./docker_assets/uwsgi.ini /app
# volumes
VOLUME /cache

295
README.md
View File

@ -1,184 +1,195 @@
![Tube Archivist](assets/tube-archivist-front.jpg?raw=true "Tube Archivist Banner")
[*more screenshots and video*](SHOWCASE.MD)
![Tube Archivist](assets/tube-archivist-banner.jpg?raw=true "Tube Archivist Banner")
<h1 align="center">Your self hosted YouTube media server</h1>
<div align="center">
<a href="https://hub.docker.com/r/bbilly1/tubearchivist" target="_blank"><img src="https://tiles.tilefy.me/t/tubearchivist-docker.png" alt="tubearchivist-docker" title="Tube Archivist Docker Pulls" height="50" width="190"/></a>
<a href="https://github.com/tubearchivist/tubearchivist/stargazers" target="_blank"><img src="https://tiles.tilefy.me/t/tubearchivist-github-star.png" alt="tubearchivist-github-star" title="Tube Archivist GitHub Stars" height="50" width="190"/></a>
<a href="https://github.com/tubearchivist/tubearchivist/forks" target="_blank"><img src="https://tiles.tilefy.me/t/tubearchivist-github-forks.png" alt="tubearchivist-github-forks" title="Tube Archivist GitHub Forks" height="50" width="190"/></a>
<a href="https://www.tubearchivist.com/discord" target="_blank"><img src="https://tiles.tilefy.me/t/tubearchivist-discord.png" alt="tubearchivist-discord" title="TA Discord Server Members" height="50" width="190"/></a>
<a href="https://github.com/bbilly1/tilefy" target="_blank"><img src="https://tiles.tilefy.me/t/tubearchivist-docker.png" alt="tubearchivist-docker" title="Tube Archivist Docker Pulls" height="50" width="200"/></a>
<a href="https://github.com/bbilly1/tilefy" target="_blank"><img src="https://tiles.tilefy.me/t/tubearchivist-github-star.png" alt="tubearchivist-github-star" title="Tube Archivist GitHub Stars" height="50" width="200"/></a>
<a href="https://github.com/bbilly1/tilefy" target="_blank"><img src="https://tiles.tilefy.me/t/tubearchivist-github-forks.png" alt="tubearchivist-github-forks" title="Tube Archivist GitHub Forks" height="50" width="200"/></a>
</div>
## Table of contents:
* [Docs](https://docs.tubearchivist.com/) with [FAQ](https://docs.tubearchivist.com/faq/), and API documentation
* [Wiki](https://github.com/tubearchivist/tubearchivist/wiki) with [FAQ](https://github.com/tubearchivist/tubearchivist/wiki/FAQ)
* [Core functionality](#core-functionality)
* [Resources](#resources)
* [Installing](#installing)
* [Screenshots](#screenshots)
* [Problem Tube Archivist tries to solve](#problem-tube-archivist-tries-to-solve)
* [Connect](#connect)
* [Extended Universe](#extended-universe)
* [Installing and updating](#installing-and-updating)
* [Getting Started](#getting-started)
* [Known limitations](#known-limitations)
* [Port Collisions](#port-collisions)
* [Common Errors](#common-errors)
* [Potential pitfalls](#potential-pitfalls)
* [Roadmap](#roadmap)
* [Known limitations](#known-limitations)
* [Donate](#donate)
------------------------
## Core functionality
Once your YouTube video collection grows, it becomes hard to search and find a specific video. That's where Tube Archivist comes in: By indexing your video collection with metadata from YouTube, you can organize, search and enjoy your archived YouTube videos without hassle offline through a convenient web interface. This includes:
* Subscribe to your favorite YouTube channels
* Download Videos using **[yt-dlp](https://github.com/yt-dlp/yt-dlp)**
* Download Videos using **yt-dlp**
* Index and make videos searchable
* Play videos
* Keep track of viewed and unviewed videos
## Resources
- [Discord](https://www.tubearchivist.com/discord): Connect with us on our Discord server.
## Tube Archivist on YouTube
[![ibracorp-youtube-video-thumb](assets/tube-archivist-ibracorp-O8H8Z01c0Ys.jpg)](https://www.youtube.com/watch?v=O8H8Z01c0Ys)
## Screenshots
![home screenshot](assets/tube-archivist-screenshot-home.png?raw=true "Tube Archivist Home")
*Home Page*
![channels screenshot](assets/tube-archivist-screenshot-channels.png?raw=true "Tube Archivist Channels")
*All Channels*
![single channel screenshot](assets/tube-archivist-screenshot-single-channel.png?raw=true "Tube Archivist Single Channel")
*Single Channel*
![video page screenshot](assets/tube-archivist-screenshot-video.png?raw=true "Tube Archivist Video Page")
*Video Page*
![video page screenshot](assets/tube-archivist-screenshot-download.png?raw=true "Tube Archivist Video Page")
*Downloads Page*
## Problem Tube Archivist tries to solve
Once your YouTube video collection grows, it becomes hard to search and find a specific video. That's where Tube Archivist comes in: By indexing your video collection with metadata from YouTube, you can organize, search and enjoy your archived YouTube videos without hassle offline through a convenient web interface.
## Connect
- [Discord](https://discord.gg/AFwz8nE7BK): Connect with us on our Discord server.
- [r/TubeArchivist](https://www.reddit.com/r/TubeArchivist/): Join our Subreddit.
## Extended Universe
- [Browser Extension](https://github.com/tubearchivist/browser-extension) Tube Archivist Companion, for [Firefox](https://addons.mozilla.org/addon/tubearchivist-companion/) and [Chrome](https://chrome.google.com/webstore/detail/tubearchivist-companion/jjnkmicfnfojkkgobdfeieblocadmcie)
- [Jellyfin Plugin](https://github.com/tubearchivist/tubearchivist-jf-plugin): Add your videos to Jellyfin
- [Plex Plugin](https://github.com/tubearchivist/tubearchivist-plex): Add your videos to Plex
- [Tube Archivist Metrics](https://github.com/tubearchivist/tubearchivist-metrics) to create statistics in Prometheus/OpenMetrics format.
## Installing
For minimal system requirements, the Tube Archivist stack needs around 2GB of available memory for a small testing setup and around 4GB of available memory for a mid to large sized installation. The minimum is a dual-core CPU with 4 threads; a quad core or better is recommended.
This project requires docker. Ensure it is installed and running on your system.
## Installing and updating
Take a look at the example `docker-compose.yml` file provided. Use the *latest* or the named semantic version tag. The *unstable* tag is for intermediate testing and, as the name implies, is **unstable** and should not be used on your main installation but only in a [testing environment](CONTRIBUTING.md).
The documentation has additional user provided instructions for [Unraid](https://docs.tubearchivist.com/installation/unraid/), [Synology](https://docs.tubearchivist.com/installation/synology/) and [Podman](https://docs.tubearchivist.com/installation/podman/).
For minimal system requirements, the Tube Archivist stack needs around 2GB of available memory for a small testing setup and around 4GB of available memory for a mid to large sized installation.
The instructions here should get you up and running quickly; for Docker beginners and a full explanation of each environment variable, see the [docs](https://docs.tubearchivist.com/installation/docker-compose/).
Tube Archivist depends on three main components split up into separate docker containers:
Take a look at the example [docker-compose.yml](https://github.com/tubearchivist/tubearchivist/blob/master/docker-compose.yml) and configure the required environment variables.
### Tube Archivist
The main Python application that displays and serves your video collection, built with Django.
- Serves the interface on port `8000`
- Needs a volume for the video archive at **/youtube**
- And another volume to save application data at **/cache**.
- The environment variables `ES_URL` and `REDIS_HOST` are needed to tell Tube Archivist where Elasticsearch and Redis respectively are located.
- The environment variables `HOST_UID` and `HOST_GID` allow Tube Archivist to `chown` the video files to the main host system user instead of the container user. Those two variables are optional; not setting them will disable that functionality. That might be needed if the underlying filesystem doesn't support `chown`, like *NFS*.
- Set the environment variable `TA_HOST` to match with the system running Tube Archivist. This can be a domain like *example.com*, a subdomain like *ta.example.com* or an IP address like *192.168.1.20*, add without the protocol and without the port. You can add multiple hostnames separated with a space. Any wrong configurations here will result in a `Bad Request (400)` response.
- Change the environment variables `TA_USERNAME` and `TA_PASSWORD` to create the initial credentials.
- `ELASTIC_PASSWORD` is the password for Elasticsearch. The environment variable `ELASTIC_USER` is optional, should you want to change the username from the default *elastic*.
- For the scheduler to know what time it is, set your timezone with the `TZ` environment variable; it defaults to *UTC*.
All environment variables are explained in detail in the docs [here](https://docs.tubearchivist.com/installation/env-vars/).
**TubeArchivist**:
| Environment Var | Value | State |
| ----------- | ----------- | ----------- |
| TA_HOST | Server IP or hostname `http://tubearchivist.local:8000` | Required |
| TA_USERNAME | Initial username when logging into TA | Required |
| TA_PASSWORD | Initial password when logging into TA | Required |
| ELASTIC_PASSWORD | Password for ElasticSearch | Required |
| REDIS_CON | Connection string to Redis | Required |
| TZ | Set your timezone for the scheduler | Required |
| TA_PORT | Overwrite Nginx port | Optional |
| TA_BACKEND_PORT | Overwrite container internal backend server port | Optional |
| TA_ENABLE_AUTH_PROXY | Enables support for forwarding auth in reverse proxies | [Read more](https://docs.tubearchivist.com/configuration/forward-auth/) |
| TA_AUTH_PROXY_USERNAME_HEADER | Header containing username to log in | Optional |
| TA_AUTH_PROXY_LOGOUT_URL | Logout URL for forwarded auth | Optional |
| ES_URL | URL that ElasticSearch runs on | Optional |
| ES_DISABLE_VERIFY_SSL | Disable ElasticSearch SSL certificate verification | Optional |
| ES_SNAPSHOT_DIR | Custom path where Elasticsearch stores snapshots for master/data nodes | Optional |
| HOST_GID | Allow TA to own the video files instead of container user | Optional |
| HOST_UID | Allow TA to own the video files instead of container user | Optional |
| ELASTIC_USER | Change the default ElasticSearch user | Optional |
| TA_LDAP | Configure TA to use LDAP Authentication | [Read more](https://docs.tubearchivist.com/configuration/ldap/) |
| DISABLE_STATIC_AUTH | Remove authentication from media files, (Google Cast...) | [Read more](https://docs.tubearchivist.com/installation/env-vars/#disable_static_auth) |
| TA_AUTO_UPDATE_YTDLP | Configure TA to automatically install the latest yt-dlp on container start | Optional |
| DJANGO_DEBUG | Return additional error messages, for debug only | Optional |
| TA_LOGIN_AUTH_MODE | Configure the order of login authentication backends (Default: single) | Optional |
| TA_LOGIN_AUTH_MODE value | Description |
| ------------------------ | ----------- |
| single | Only use a single backend (default, or LDAP, or Forward auth, selected by TA_LDAP or TA_ENABLE_AUTH_PROXY) |
| local | Use local password database only |
| ldap | Use LDAP backend only |
| forwardauth | Use reverse proxy headers only |
| ldap_local | Use LDAP backend in addition to the local password database |
**ElasticSearch**
| Environment Var | Value | State |
| ----------- | ----------- | ----------- |
| ELASTIC_PASSWORD | Matching password `ELASTIC_PASSWORD` from TubeArchivist | Required |
| http.port | Change the port ElasticSearch runs on | Optional |
## Update
Always use the *latest* (the default) or a named semantic version tag for the docker images. The *unstable* tags are only for your testing environment; there might not be an update path for these testing builds.
You will see the current version number of **Tube Archivist** in the footer of the interface. There is a daily version check task querying tubearchivist.com, notifying you of any new releases in the footer. To update, you need to update the docker images, the method for which will depend on your platform. For example, if you're using `docker-compose`, run `docker-compose pull` and then restart with `docker-compose up -d`. After updating, check the footer to verify you are running the expected version.
- This project is tested for updates between one or two releases maximum. Updates from further back may or may not be supported and you might have to reset your index and configurations to update. Ideally apply new updates at least once per month.
- There can be breaking changes between updates; particularly as the application grows, new environment variables or settings might be required for you to set in your docker-compose file. *Always* check the **release notes**: Any breaking changes will be marked there.
- All testing and development is done with the Elasticsearch version number as mentioned in the provided *docker-compose.yml* file. This will be updated when a new release of Elasticsearch is available. Running an older version of Elasticsearch is most likely not going to result in any issues, but it's still recommended to run the same version as mentioned. Use `bbilly1/tubearchivist-es` to automatically get the recommended version.
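A minimal sketch of the docker compose update procedure described above, assuming you use the provided docker-compose.yml:
```
docker compose pull           # fetch the newer images
docker compose up -d          # recreate containers on the new images
docker image prune            # optionally clean up superseded images
```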
## Getting Started
1. Go through the **settings** page and look at the available options. Particularly set *Download Format* to your desired video quality before downloading. **Tube Archivist** downloads the best available quality by default. To support iOS or MacOS and some other browsers a compatible format must be specified. For example:
```
bestvideo[vcodec*=avc1]+bestaudio[acodec*=mp4a]/mp4
```
2. Subscribe to some of your favorite YouTube channels on the **channels** page.
3. On the **downloads** page, click on *Rescan subscriptions* to add videos from the subscribed channels to your Download queue or click on *Add to download queue* to manually add Video IDs, links, channels or playlists.
4. Click on *Start download* and let **Tube Archivist** do its thing.
5. Enjoy your archived collection!
### Port Collisions
### Port collisions
If you have a collision on port `8000`, the best solution is to use docker's *HOST_PORT* and *CONTAINER_PORT* distinction: for example, to change the interface to port 9000, use `9000:8000` in your docker-compose file.
For more information on port collisions, check the docs.
Should that not be an option, the Tube Archivist container takes these two additional environment variables:
- **TA_PORT**: To actually change the port where nginx listens, make sure to also change the ports value in your docker-compose file.
- **TA_UWSGI_PORT**: To change the default uwsgi port 8080 used for container internal networking between uwsgi serving the django application and nginx.
## Common Errors
Here is a list of common errors and their solutions.
Changing either of these two environment variables will change the files *nginx.conf* and *uwsgi.ini* at startup using `sed` in your container.
### `vm.max_map_count`
### LDAP Authentication
You can configure LDAP with the following environment variables:
- `TA_LDAP` (ex: `true`) Set to anything besides empty string to use LDAP authentication **instead** of local user authentication.
- `TA_LDAP_SERVER_URI` (ex: `ldap://ldap-server:389`) Set to the uri of your LDAP server.
- `TA_LDAP_DISABLE_CERT_CHECK` (ex: `true`) Set to anything besides empty string to disable certificate checking when connecting over LDAPS.
- `TA_LDAP_BIND_DN` (ex: `uid=search-user,ou=users,dc=your-server`) DN of the user that is able to perform searches on your LDAP account.
- `TA_LDAP_BIND_PASSWORD` (ex: `yoursecretpassword`) Password for the search user.
- `TA_LDAP_USER_BASE` (ex: `ou=users,dc=your-server`) Search base for user filter.
- `TA_LDAP_USER_FILTER` (ex: `(objectClass=user)`) Filter for valid users. Login usernames are automatically matched using `uid` and do not need to be specified in this filter.
When LDAP authentication is enabled, Django passwords (e.g. the password defined in TA_PASSWORD) will not allow you to log in; only the LDAP server is used.
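Put together, a hypothetical configuration using the example values from the list above could look like this in the environment section of your setup (all values are placeholders):
```
TA_LDAP=true
TA_LDAP_SERVER_URI=ldap://ldap-server:389
TA_LDAP_BIND_DN=uid=search-user,ou=users,dc=your-server
TA_LDAP_BIND_PASSWORD=yoursecretpassword
TA_LDAP_USER_BASE=ou=users,dc=your-server
TA_LDAP_USER_FILTER=(objectClass=user)
```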
### Elasticsearch
**Note**: Tube Archivist depends on Elasticsearch 8.
Use `bbilly1/tubearchivist-es` to automatically get the recommended version, or use the official image with the version tag in the docker-compose file.
Stores video meta data and makes everything searchable. Also keeps track of the download queue.
- Needs to be accessible over the default port `9200`
- Needs a volume at **/usr/share/elasticsearch/data** to store data
Follow the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) for additional installation details.
### Redis JSON
Functions as a cache and temporary link between the application and the file system. Used to store and display messages and configuration variables.
- Needs to be accessible over the default port `6379`
- Needs a volume at **/data** to make your configuration changes permanent.
### Redis on a custom port
For some architectures it might be required to run Redis JSON on a nonstandard port. For example, to change the Redis port to **6380**, set the following values:
- Set the environment variable `REDIS_PORT=6380` on the *tubearchivist* service.
- For the *archivist-redis* service, change the ports to `6380:6380`
- Additionally set the following value to the *archivist-redis* service: `command: --port 6380 --loadmodule /usr/lib/redis/modules/rejson.so`
### Updating Tube Archivist
You will see the current version number of **Tube Archivist** in the footer of the interface so you can compare it with the latest release to make sure you are running the *latest and greatest*.
* There can be breaking changes between updates; particularly as the application grows, new environment variables or settings might be required for you to set in your docker-compose file. *Always* check the **release notes**: Any breaking changes will be marked there.
* All testing and development is done with the Elasticsearch version number as mentioned in the provided *docker-compose.yml* file. This will be updated when a new release of Elasticsearch is available. Running an older version of Elasticsearch is most likely not going to result in any issues, but it's still recommended to run the same version as mentioned. Use `bbilly1/tubearchivist-es` to automatically get the recommended version.
### Alternative installation instructions:
- **arm64**: The Tube Archivist container is multi arch, so is Elasticsearch. RedisJSON doesn't offer arm builds, you can use `bbilly1/rejson`, an unofficial rebuild for arm64.
- **Helm Chart**: There is a Helm Chart available at https://github.com/insuusvenerati/helm-charts. Mostly self-explanatory but feel free to ask questions in the discord / subreddit.
- **Wiki**: There are additional helpful installation instructions in the [wiki](https://github.com/tubearchivist/tubearchivist/wiki/Installation) for Unraid, Truenas and Synology.
## Potential pitfalls
### vm.max_map_count
**Elastic Search** in Docker requires the kernel setting of the host machine `vm.max_map_count` to be set to at least 262144.
To temporarily set the value, run:
```
sudo sysctl -w vm.max_map_count=262144
```
How to apply the change permanently depends on your host operating system (see the example after this list):
- For example on Ubuntu Server add `vm.max_map_count = 262144` to the file `/etc/sysctl.conf`.
- On Arch based systems create a file `/etc/sysctl.d/max_map_count.conf` with the content `vm.max_map_count = 262144`.
- On any other platform look up in the documentation on how to pass kernel parameters.
- For example on Ubuntu Server add `vm.max_map_count = 262144` to the file */etc/sysctl.conf*.
- On Arch based systems create a file */etc/sysctl.d/max_map_count.conf* with the content `vm.max_map_count = 262144`.
- On any other platform look up in the documentation on how to pass kernel parameters.
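For example, a sketch for a distribution that reads `/etc/sysctl.d/` (verify the right mechanism for your OS):
```
echo "vm.max_map_count = 262144" | sudo tee /etc/sysctl.d/max_map_count.conf
sudo sysctl --system    # reload the setting without a reboot
```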
### Permissions for elasticsearch
If you see a message similar to `Unable to access 'path.repo' (/usr/share/elasticsearch/data/snapshot)` or `failed to obtain node locks, tried [/usr/share/elasticsearch/data]` and `maybe these locations are not writable` when initially starting elasticsearch, that probably means the container is not allowed to write files to the volume.
If you see a message similar to `failed to obtain node locks, tried [/usr/share/elasticsearch/data]` and `maybe these locations are not writable` when initially starting elasticsearch, that probably means the container is not allowed to write files to the volume.
To fix that issue, shutdown the container and on your host machine run:
```
chown 1000:0 -R /path/to/mount/point
```
This will match the permissions with the **UID** and **GID** of elasticsearch process within the container and should fix the issue.
### Disk usage
The Elasticsearch index will turn to ***read only*** if the disk usage of the container goes above 95% until the usage drops below 90% again, you will see error messages like `disk usage exceeded flood-stage watermark`.
The Elasticsearch index will turn to *read only* if the disk usage of the container goes above 95% until the usage drops below 90% again, you will see error messages like `disk usage exceeded flood-stage watermark`, [link](https://github.com/tubearchivist/tubearchivist#disk-usage).
Similar to that, TubeArchivist will become all sorts of messed up when running out of disk space. There are some error messages in the logs when that happens, but it's best to make sure to have enough disk space before starting to download.
### `error setting rlimit`
If you are seeing errors like `failed to create shim: OCI runtime create failed` and `error during container init: error setting rlimits`, this means docker can't set these limits, usually because they are set at another place or are incompatible for other reasons. The solution is to remove the `ulimits` key from the ES container in your docker compose and start again.
This can happen if you have nested virtualizations, e.g. LXC running Docker in Proxmox.
## Known limitations
- Video files created by Tube Archivist need to be playable in your browser of choice. Not every codec is compatible with every browser and might require some testing with format selection.
- Every limitation of **yt-dlp** will also be present in Tube Archivist. If **yt-dlp** can't download or extract a video for any reason, Tube Archivist won't be able to either.
- There is no flexibility in naming of the media files.
## Getting Started
1. Go through the **settings** page and look at the available options. Particularly set *Download Format* to your desired video quality before downloading. **Tube Archivist** downloads the best available quality by default. To support iOS or MacOS and some other browsers a compatible format must be specified. For example:
```
bestvideo[VCODEC=avc1]+bestaudio[ACODEC=mp4a]/mp4
```
2. Subscribe to some of your favorite YouTube channels on the **channels** page.
3. On the **downloads** page, click on *Rescan subscriptions* to add videos from the subscribed channels to your Download queue or click on *Add to download queue* to manually add Video IDs, links, channels or playlists.
4. Click on *Start download* and let **Tube Archivist** do its thing.
5. Enjoy your archived collection!
## Roadmap
We have come far, nonetheless we are not short of ideas on how to improve and extend this project. Ideas waiting to be tackled, in no particular order:
- [ ] Audio download
- [ ] User roles
- [ ] Podcast mode to serve channel as mp3
- [ ] Random and repeat controls ([#108](https://github.com/tubearchivist/tubearchivist/issues/108), [#220](https://github.com/tubearchivist/tubearchivist/issues/220))
- [ ] Implement [PyFilesystem](https://github.com/PyFilesystem/pyfilesystem2) for flexible video storage
- [ ] Implement [Apprise](https://github.com/caronc/apprise) for notifications ([#97](https://github.com/tubearchivist/tubearchivist/issues/97))
- [ ] User created playlists, random and repeat controls ([#108](https://github.com/tubearchivist/tubearchivist/issues/108), [#220](https://github.com/tubearchivist/tubearchivist/issues/220))
- [ ] Auto play or play next link ([#226](https://github.com/tubearchivist/tubearchivist/issues/226))
- [ ] Show similar videos on video page
- [ ] Multi language support
- [ ] Show total video downloaded vs total videos available in channel
- [ ] Download or Ignore videos by keyword ([#163](https://github.com/tubearchivist/tubearchivist/issues/163))
- [ ] Add statistics of index
- [ ] Download speed schedule ([#198](https://github.com/tubearchivist/tubearchivist/issues/198))
- [ ] Auto ignore videos by keyword ([#163](https://github.com/tubearchivist/tubearchivist/issues/163))
- [ ] Custom searchable notes to videos, channels, playlists ([#144](https://github.com/tubearchivist/tubearchivist/issues/144))
- [ ] Search comments
- [ ] Search download queue
- [ ] Per user videos/channel/playlists
- [ ] Download video comments
Implemented:
- [X] Configure shorts, streams and video sizes per channel [2024-07-15]
- [X] User created playlists [2024-04-10]
- [X] User roles, aka read only user [2023-11-10]
- [X] Add statistics of index [2023-09-03]
- [X] Implement [Apprise](https://github.com/caronc/apprise) for notifications [2023-08-05]
- [X] Download video comments [2022-11-30]
- [X] Show similar videos on video page [2022-11-30]
- [X] Implement complete offline media file import from json file [2022-08-20]
- [X] Filter and query in search form, search by url query [2022-07-23]
- [X] Make items in grid row configurable to use more of the screen [2022-06-04]
@ -200,20 +211,11 @@ Implemented:
- [X] Backup and restore [2021-09-22]
- [X] Scan your file system to index already downloaded videos [2021-09-14]
## User Scripts
This is a list of useful user scripts, generously created by folks like you to extend this project and its functionality. Make sure to check the respective repository links for detailed license information.
## Known limitations
- Video files created by Tube Archivist need to be playable in your browser of choice. Not every codec is compatible with every browser and might require some testing with format selection.
- Every limitation of **yt-dlp** will also be present in Tube Archivist. If **yt-dlp** can't download or extract a video for any reason, Tube Archivist won't be able to either.
- There is currently no flexibility in naming of the media files.
This is your time to shine: [read this](https://github.com/tubearchivist/tubearchivist/blob/master/CONTRIBUTING.md#user-scripts), then open a PR to add your script here.
- [danieljue/ta_dl_page_script](https://github.com/danieljue/ta_dl_page_script): Helper browser script to prioritize a channels' videos in download queue.
- [dot-mike/ta-scripts](https://github.com/dot-mike/ta-scripts): A collection of personal scripts for managing TubeArchivist.
- [DarkFighterLuke/ta_base_url_nginx](https://gist.github.com/DarkFighterLuke/4561b6bfbf83720493dc59171c58ac36): Set base URL with Nginx when you can't use subdomains.
- [lamusmaser/ta_migration_helper](https://github.com/lamusmaser/ta_migration_helper): Advanced helper script for migration issues to TubeArchivist v0.4.4 or later.
- [lamusmaser/create_info_json](https://gist.github.com/lamusmaser/837fb58f73ea0cad784a33497932e0dd): Script to generate `.info.json` files using `ffmpeg` collecting information from downloaded videos.
- [lamusmaser/ta_fix_for_video_redirection](https://github.com/lamusmaser/ta_fix_for_video_redirection): Script to fix videos that were incorrectly indexed by YouTube's "Video is Unavailable" response.
- [RoninTech/ta-helper](https://github.com/RoninTech/ta-helper): Helper script to provide a symlink association to reference TubeArchivist videos with their original titles.
- [tangyjoust/Tautulli-Notify-TubeArchivist-of-Plex-Watched-State](https://github.com/tangyjoust/Tautulli-Notify-TubeArchivist-of-Plex-Watched-State): Mark videos watched in Plex (through streaming, not manually) as watched in TubeArchivist via Tautulli.
- [Dhs92/delete_shorts](https://github.com/Dhs92/delete_shorts): A script to delete ALL YouTube Shorts from TubeArchivist
## Donate
The best donation to **Tube Archivist** is your time; take a look at the [contribution page](CONTRIBUTING.md) to get started.
@ -223,18 +225,11 @@ Second best way to support the development is to provide for caffeinated beverag
* [Paypal Subscription](https://www.paypal.com/webapps/billing/plans/subscribe?plan_id=P-03770005GR991451KMFGVPMQ) for a monthly coffee
* [ko-fi.com](https://ko-fi.com/bbilly1) for an alternative platform
## Notable mentions
This is a selection of places where this project has been featured on Reddit, in the news, on blogs, or in other online media, newest on top.
* **xda-developers.com**: 5 obscure self-hosted services worth checking out - Tube Archivist - To save your essential YouTube videos, [2024-10-13][[link](https://www.xda-developers.com/obscure-self-hosted-services/)]
* **selfhosted.show**: why we're trying Tube Archivist, [2024-06-14][[link](https://selfhosted.show/125)]
* **ycombinator**: Tube Archivist on Hackernews front page, [2023-07-16][[link](https://news.ycombinator.com/item?id=36744395)]
* **linux-community.de**: Tube Archivist bringt Ordnung in die Youtube-Sammlung, [German][2023-05-01][[link](https://www.linux-community.de/ausgaben/linuxuser/2023/05/tube-archivist-bringt-ordnung-in-die-youtube-sammlung/)]
* **noted.lol**: Dev Debrief, An Interview With the Developer of Tube Archivist, [2023-03-30] [[link](https://noted.lol/dev-debrief-tube-archivist/)]
* **console.substack.com**: Interview With Simon of Tube Archivist, [2023-01-29] [[link](https://console.substack.com/p/console-142#%C2%A7interview-with-simon-of-tube-archivist)]
* **reddit.com**: Tube Archivist v0.3.0 - Now Archiving Comments, [2022-12-02] [[link](https://www.reddit.com/r/selfhosted/comments/zaonzp/tube_archivist_v030_now_archiving_comments/)]
* **reddit.com**: Tube Archivist v0.2 - Now with Full Text Search, [2022-07-24] [[link](https://www.reddit.com/r/selfhosted/comments/w6jfa1/tube_archivist_v02_now_with_full_text_search/)]
* **noted.lol**: How I Control What Media My Kids Watch Using Tube Archivist, [2022-03-27] [[link](https://noted.lol/how-i-control-what-media-my-kids-watch-using-tube-archivist/)]
* **thehomelab.wiki**: Tube Archivist - A Youtube-DL Alternative on Steroids, [2022-01-27] [[link](https://thehomelab.wiki/books/news/page/tube-archivist-a-youtube-dl-alternative-on-steroids)]
* **reddit.com**: Celebrating TubeArchivist v0.1, [2022-01-09] [[link](https://www.reddit.com/r/selfhosted/comments/rzh084/celebrating_tubearchivist_v01/)]
* **linuxunplugged.com**: Pick: tubearchivist — Your self-hosted YouTube media server, [2021-09-11] [[link](https://linuxunplugged.com/425)] and [2021-10-05] [[link](https://linuxunplugged.com/426)]
* **reddit.com**: Introducing Tube Archivist, your self hosted Youtube media server, [2021-09-12] [[link](https://www.reddit.com/r/selfhosted/comments/pmj07b/introducing_tube_archivist_your_self_hosted/)]
## Sponsor
Big thank you to [DigitalOcean](https://www.digitalocean.com/) for generously donating credit for the tubearchivist.com VPS and build server.
<p>
<a href="https://www.digitalocean.com/">
<img src="https://opensource.nyc3.cdn.digitaloceanspaces.com/attribution/assets/PoweredByDO/DO_Powered_by_Badge_blue.svg" width="201px">
</a>
</p>

View File

@ -1,25 +0,0 @@
## Tube Archivist on YouTube
[![ibracorp-youtube-video-thumb](assets/tube-archivist-ibracorp-O8H8Z01c0Ys.jpg)](https://www.youtube.com/watch?v=O8H8Z01c0Ys)
Video featuring Tube Archivist generously created by [IBRACORP](https://www.youtube.com/@IBRACORP).
## Screenshots
![login screenshot](assets/tube-archivist-login.png?raw=true "Tube Archivist Login")
*Login Page*: Secure way to access your media collection.
![home screenshot](assets/tube-archivist-home.png?raw=true "Tube Archivist Home")
*Home Page*: Your recent videos, continue watching incomplete videos.
![channels screenshot](assets/tube-archivist-channels.png?raw=true "Tube Archivist Channels")
*All Channels*: A list of all your indexed channels, filtered by subscribed only.
![single channel screenshot](assets/tube-archivist-single-channel.png?raw=true "Tube Archivist Single Channel")
*Single Channel*: Single channel page with additional metadata and sub pages.
![video page screenshot](assets/tube-archivist-video.png?raw=true "Tube Archivist Video Page")
*Video Page*: Stream your video directly from the interface.
![video page screenshot](assets/tube-archivist-download.png?raw=true "Tube Archivist Video Page")
*Downloads Page*: Add, control, and monitor your download queue.
![search page screenshot](assets/tube-archivist-search.png?raw=true "Tube Archivist Search Page")
*Search Page*: Use expressions to quickly search through your collection.

Binary file not shown.

After

Width:  |  Height:  |  Size: 49 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 516 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 541 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.6 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 578 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 106 KiB

View File

@ -1,79 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg id="Layer_1" xmlns="http://www.w3.org/2000/svg" version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 1000 1000">
<!-- Generator: Adobe Illustrator 29.5.0, SVG Export Plug-In . SVG Version: 2.1.0 Build 137) -->
<defs>
<style>
.st0 {
fill: #fff;
}
.st1 {
fill: #039a86;
}
.st2 {
fill: none;
}
.st3 {
clip-path: url(#clippath-1);
}
.st4 {
fill: #06131a;
}
.st5 {
clip-path: url(#clippath-3);
}
.st6 {
display: none;
}
.st7 {
clip-path: url(#clippath-2);
}
.st8 {
clip-path: url(#clippath);
}
</style>
<clipPath id="clippath">
<rect class="st2" x="25.6" y="22.9" width="948.9" height="954.2"/>
</clipPath>
<clipPath id="clippath-1">
<rect class="st2" x="25.6" y="22.9" width="948.9" height="954.2"/>
</clipPath>
<clipPath id="clippath-2">
<rect class="st2" x="25.6" y="22.9" width="948.9" height="954.2"/>
</clipPath>
<clipPath id="clippath-3">
<rect class="st2" x="25.6" y="22.9" width="948.9" height="954.2"/>
</clipPath>
</defs>
<g id="Artwork_1" class="st6">
<g class="st8">
<g class="st3">
<path class="st1" d="M447.2,22.9v15.2C269.3,59.3,118.8,179.4,58.6,348.1l76,21.8c49.9-135.2,169.9-232.2,312.6-252.7v15.4h35.3s0-109.7,0-109.7h-35.3ZM523,34.5v79.1c142.3,7.7,269.2,91.9,331.7,219.9l-14.8,4.2,9.7,33.7,106.6-30.3-9.7-33.9-14.9,4.3c-73.1-161.9-231-269-408.5-277M957.6,382.9l-75.8,21.7c8.9,32.9,13.6,66.8,13.8,100.8-.2,103.8-41.6,203.3-114.9,276.8l-9.4-12.6-28.6,20.8,11.9,16,46.5,64,6.6,9.1,28.6-20.8-8.8-12.1c93.6-88.8,146.7-212.1,147-341.1-.2-41.4-5.9-82.6-16.8-122.6M35.3,383.5l-9.7,33.9,14,4c-5.3,27.7-8.1,55.8-8.4,84,0,145.5,67.3,282.8,182.1,372.1l46.5-64c-94.4-74.4-149.6-187.9-149.8-308.1.3-20.8,2.2-41.6,5.8-62.1l15.1,4.1,9.7-33.9-17.9-4.9-75.7-21.7-11.6-3.3ZM303.8,820.6l-64.8,88.8,28.6,20.8,8.5-11.7c69.4,38.3,147.4,58.5,226.7,58.7,94.9,0,187.7-28.7,266.1-82.2l-46.6-64.1c-64.8,43.9-141.2,67.3-219.5,67.5-62.6-.3-124.2-15.5-179.8-44.4l9.4-12.6-28.6-20.8Z"/>
<polygon class="st4" points="114.9 238.4 115.1 324.3 261.3 324.3 261.1 458.5 351.9 458.5 352.1 324.3 495.9 324.3 495.6 238 114.9 238.4"/>
<rect class="st4" x="261.1" y="554.4" width="90.8" height="200.1"/>
<polygon class="st4" points="622.7 244.2 429.6 754.5 526.4 754.4 666.6 361.6 806 754.4 902.9 754.4 710.4 244.2 622.7 244.2"/>
<path class="st1" d="M255.5,476.4c-16.5,0-29.9,13.6-29.9,30.1.2,17.6,16.1,30.1,30,30.1,34.5,0,69.9,0,103.3,0,16.1,0,28.9-14,28.9-30.1,0-16.1-12.2-30.1-28.8-30.1-35.8,0-72.8,0-103.4,0"/>
<path class="st1" d="M665.5,483.6c-16.1,0-29.8,12.2-29.8,28.8v172l-37.8-38.9-25,24.5,92.2,93.8,94.3-93.8-25-24.5-38.9,38.9c0-23.6,0-40.8,0-68.6-.3-34.5,0-69,0-103.6,0-16.1-13.7-28.6-29.8-28.6h0Z"/>
</g>
</g>
</g>
<g id="Artwork_2">
<g class="st7">
<g class="st5">
<path class="st1" d="M447.2,22.9v15.2C269.3,59.3,118.8,179.4,58.6,348.1l76,21.8c49.9-135.2,169.9-232.2,312.6-252.7v15.4h35.3s0-109.7,0-109.7h-35.3ZM523,34.5v79.1c142.3,7.7,269.2,91.9,331.7,219.9l-14.8,4.2,9.7,33.7,106.6-30.3-9.7-33.9-14.9,4.3c-73.1-161.9-231-269-408.5-277M957.6,382.9l-75.8,21.7c8.9,32.9,13.6,66.8,13.8,100.8-.2,103.8-41.6,203.3-114.9,276.8l-9.4-12.6-28.6,20.8,11.9,16,46.5,64,6.6,9.1,28.6-20.8-8.8-12.1c93.6-88.8,146.7-212.1,147-341.1-.2-41.4-5.9-82.6-16.8-122.6M35.3,383.5l-9.7,33.9,14,4c-5.3,27.7-8.1,55.8-8.4,84,0,145.5,67.3,282.8,182.1,372.1l46.5-64c-94.4-74.4-149.6-187.9-149.8-308.1.3-20.8,2.2-41.6,5.8-62.1l15.1,4.1,9.7-33.9-17.9-4.9-75.7-21.7-11.6-3.3ZM303.8,820.6l-64.8,88.8,28.6,20.8,8.5-11.7c69.4,38.3,147.4,58.5,226.7,58.7,94.9,0,187.7-28.7,266.1-82.2l-46.6-64.1c-64.8,43.9-141.2,67.3-219.5,67.5-62.6-.3-124.2-15.5-179.8-44.4l9.4-12.6-28.6-20.8Z"/>
<polygon class="st0" points="114.9 238.4 115.1 324.3 261.3 324.3 261.1 458.5 351.9 458.5 352.1 324.3 495.9 324.3 495.6 238 114.9 238.4"/>
<rect class="st0" x="261.1" y="554.4" width="90.8" height="200.1"/>
<polygon class="st0" points="622.7 244.2 429.6 754.5 526.4 754.4 666.6 361.6 806 754.4 902.9 754.4 710.4 244.2 622.7 244.2"/>
<path class="st1" d="M255.5,476.4c-16.5,0-29.9,13.6-29.9,30.1.2,17.6,16.1,30.1,30,30.1,34.5,0,69.9,0,103.3,0,16.1,0,28.9-14,28.9-30.1,0-16.1-12.2-30.1-28.8-30.1-35.8,0-72.8,0-103.4,0"/>
<path class="st1" d="M665.5,483.6c-16.1,0-29.8,12.2-29.8,28.8v172l-37.8-38.9-25,24.5,92.2,93.8,94.3-93.8-25-24.5-38.9,38.9c0-23.6,0-40.8,0-68.6-.3-34.5,0-69,0-103.6,0-16.1-13.7-28.6-29.8-28.6h0Z"/>
</g>
</g>
</g>
</svg>

Before

Width:  |  Height:  |  Size: 4.6 KiB

View File

@ -1,79 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg id="Layer_1" xmlns="http://www.w3.org/2000/svg" version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 1000 1000">
<!-- Generator: Adobe Illustrator 29.5.0, SVG Export Plug-In . SVG Version: 2.1.0 Build 137) -->
<defs>
<style>
.st0 {
fill: #fff;
}
.st1 {
fill: #039a86;
}
.st2 {
fill: none;
}
.st3 {
clip-path: url(#clippath-1);
}
.st4 {
fill: #06131a;
}
.st5 {
clip-path: url(#clippath-3);
}
.st6 {
display: none;
}
.st7 {
clip-path: url(#clippath-2);
}
.st8 {
clip-path: url(#clippath);
}
</style>
<clipPath id="clippath">
<rect class="st2" x="25.6" y="22.9" width="948.9" height="954.2"/>
</clipPath>
<clipPath id="clippath-1">
<rect class="st2" x="25.6" y="22.9" width="948.9" height="954.2"/>
</clipPath>
<clipPath id="clippath-2">
<rect class="st2" x="25.6" y="22.9" width="948.9" height="954.2"/>
</clipPath>
<clipPath id="clippath-3">
<rect class="st2" x="25.6" y="22.9" width="948.9" height="954.2"/>
</clipPath>
</defs>
<g id="Artwork_1">
<g class="st8">
<g class="st3">
<path class="st1" d="M447.2,22.9v15.2C269.3,59.3,118.8,179.4,58.6,348.1l76,21.8c49.9-135.2,169.9-232.2,312.6-252.7v15.4h35.3s0-109.7,0-109.7h-35.3ZM523,34.5v79.1c142.3,7.7,269.2,91.9,331.7,219.9l-14.8,4.2,9.7,33.7,106.6-30.3-9.7-33.9-14.9,4.3c-73.1-161.9-231-269-408.5-277M957.6,382.9l-75.8,21.7c8.9,32.9,13.6,66.8,13.8,100.8-.2,103.8-41.6,203.3-114.9,276.8l-9.4-12.6-28.6,20.8,11.9,16,46.5,64,6.6,9.1,28.6-20.8-8.8-12.1c93.6-88.8,146.7-212.1,147-341.1-.2-41.4-5.9-82.6-16.8-122.6M35.3,383.5l-9.7,33.9,14,4c-5.3,27.7-8.1,55.8-8.4,84,0,145.5,67.3,282.8,182.1,372.1l46.5-64c-94.4-74.4-149.6-187.9-149.8-308.1.3-20.8,2.2-41.6,5.8-62.1l15.1,4.1,9.7-33.9-17.9-4.9-75.7-21.7-11.6-3.3ZM303.8,820.6l-64.8,88.8,28.6,20.8,8.5-11.7c69.4,38.3,147.4,58.5,226.7,58.7,94.9,0,187.7-28.7,266.1-82.2l-46.6-64.1c-64.8,43.9-141.2,67.3-219.5,67.5-62.6-.3-124.2-15.5-179.8-44.4l9.4-12.6-28.6-20.8Z"/>
<polygon class="st4" points="114.9 238.4 115.1 324.3 261.3 324.3 261.1 458.5 351.9 458.5 352.1 324.3 495.9 324.3 495.6 238 114.9 238.4"/>
<rect class="st4" x="261.1" y="554.4" width="90.8" height="200.1"/>
<polygon class="st4" points="622.7 244.2 429.6 754.5 526.4 754.4 666.6 361.6 806 754.4 902.9 754.4 710.4 244.2 622.7 244.2"/>
<path class="st1" d="M255.5,476.4c-16.5,0-29.9,13.6-29.9,30.1.2,17.6,16.1,30.1,30,30.1,34.5,0,69.9,0,103.3,0,16.1,0,28.9-14,28.9-30.1,0-16.1-12.2-30.1-28.8-30.1-35.8,0-72.8,0-103.4,0"/>
<path class="st1" d="M665.5,483.6c-16.1,0-29.8,12.2-29.8,28.8v172l-37.8-38.9-25,24.5,92.2,93.8,94.3-93.8-25-24.5-38.9,38.9c0-23.6,0-40.8,0-68.6-.3-34.5,0-69,0-103.6,0-16.1-13.7-28.6-29.8-28.6h0Z"/>
</g>
</g>
</g>
<g id="Artwork_2" class="st6">
<g class="st7">
<g class="st5">
<path class="st1" d="M447.2,22.9v15.2C269.3,59.3,118.8,179.4,58.6,348.1l76,21.8c49.9-135.2,169.9-232.2,312.6-252.7v15.4h35.3s0-109.7,0-109.7h-35.3ZM523,34.5v79.1c142.3,7.7,269.2,91.9,331.7,219.9l-14.8,4.2,9.7,33.7,106.6-30.3-9.7-33.9-14.9,4.3c-73.1-161.9-231-269-408.5-277M957.6,382.9l-75.8,21.7c8.9,32.9,13.6,66.8,13.8,100.8-.2,103.8-41.6,203.3-114.9,276.8l-9.4-12.6-28.6,20.8,11.9,16,46.5,64,6.6,9.1,28.6-20.8-8.8-12.1c93.6-88.8,146.7-212.1,147-341.1-.2-41.4-5.9-82.6-16.8-122.6M35.3,383.5l-9.7,33.9,14,4c-5.3,27.7-8.1,55.8-8.4,84,0,145.5,67.3,282.8,182.1,372.1l46.5-64c-94.4-74.4-149.6-187.9-149.8-308.1.3-20.8,2.2-41.6,5.8-62.1l15.1,4.1,9.7-33.9-17.9-4.9-75.7-21.7-11.6-3.3ZM303.8,820.6l-64.8,88.8,28.6,20.8,8.5-11.7c69.4,38.3,147.4,58.5,226.7,58.7,94.9,0,187.7-28.7,266.1-82.2l-46.6-64.1c-64.8,43.9-141.2,67.3-219.5,67.5-62.6-.3-124.2-15.5-179.8-44.4l9.4-12.6-28.6-20.8Z"/>
<polygon class="st0" points="114.9 238.4 115.1 324.3 261.3 324.3 261.1 458.5 351.9 458.5 352.1 324.3 495.9 324.3 495.6 238 114.9 238.4"/>
<rect class="st0" x="261.1" y="554.4" width="90.8" height="200.1"/>
<polygon class="st0" points="622.7 244.2 429.6 754.5 526.4 754.4 666.6 361.6 806 754.4 902.9 754.4 710.4 244.2 622.7 244.2"/>
<path class="st1" d="M255.5,476.4c-16.5,0-29.9,13.6-29.9,30.1.2,17.6,16.1,30.1,30,30.1,34.5,0,69.9,0,103.3,0,16.1,0,28.9-14,28.9-30.1,0-16.1-12.2-30.1-28.8-30.1-35.8,0-72.8,0-103.4,0"/>
<path class="st1" d="M665.5,483.6c-16.1,0-29.8,12.2-29.8,28.8v172l-37.8-38.9-25,24.5,92.2,93.8,94.3-93.8-25-24.5-38.9,38.9c0-23.6,0-40.8,0-68.6-.3-34.5,0-69,0-103.6,0-16.1-13.7-28.6-29.8-28.6h0Z"/>
</g>
</g>
</g>
</svg>

Before

Width:  |  Height:  |  Size: 4.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 131 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 79 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 174 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 166 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 238 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 96 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 716 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 684 KiB

View File

@ -1,86 +0,0 @@
# Django Setup
## Apps
The backend is split up into the following apps.
### config
Root Django App. Doesn't define any views.
- Has main `settings.py`
- Has main `urls.py` responsible for routing to the other apps (see the sketch at the end of this page)
### common
Functionality shared between apps.
Defines views on the root `/api/*` path. Has base views to inherit from.
- Connections to ES and Redis
- Searching
- URL parser
- Collection of helper functions
### appsettings
Responsible for functionality from the settings pages.
Defines views at `/api/appsettings/*`.
- Index setup
- Reindexing
- Snapshots
- Filesystem Scan
- Manual import
### channel
Responsible for Channel Indexing functionality.
Defines views at `/api/channel/*` path.
### download
Implements download functionality with yt-dlp.
Defines views at `/api/download/*`.
- Download videos
- Queue management
- Thumbnails
- Subscriptions
### playlist
Implements playlist functionality.
Defines views at `/api/playlist/*`.
- Index Playlists
- Manual Playlists
### stats
Builds aggregations views for the statistics dashboard.
Defines views at `/api/stats/*`.
### task
Defines tasks for Celery.
Defines views at `/api/task/*`.
- Has main `tasks.py` with all shared_task definitions
- Has `CustomPeriodicTask` model
- Implements apprise notifications links
- Implements schedule functionality
### user
Implements user and auth functionality.
Defines views at `/api/config/*`.
- Defines custom `Account` model
### video
Index functionality for videos.
Defines views at `/api/video/*`.
- Index videos
- Index comments
- Index/download subtitles
- Media stream parsing
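To illustrate how the routing described above fits together, a hypothetical root `urls.py` in the config app could wire up the apps roughly like this; the concrete module paths are assumptions for illustration, not copied from the codebase:
```
"""sketch of root url routing for the apps listed above (module paths assumed)"""

from django.urls import include, path

urlpatterns = [
    path("api/", include("common.urls")),  # shared base views on /api/*
    path("api/appsettings/", include("appsettings.urls")),
    path("api/channel/", include("channel.urls")),
    path("api/download/", include("download.urls")),
    path("api/playlist/", include("playlist.urls")),
    path("api/stats/", include("stats.urls")),
    path("api/task/", include("task.urls")),
    path("api/config/", include("user.urls")),  # user and auth views
    path("api/video/", include("video.urls")),
]
```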

View File

@ -1,133 +0,0 @@
"""appsettings erializers"""
# pylint: disable=abstract-method
from common.serializers import ValidateUnknownFieldsMixin
from rest_framework import serializers
class BackupFileSerializer(serializers.Serializer):
"""serialize backup file"""
filename = serializers.CharField()
file_path = serializers.CharField()
file_size = serializers.IntegerField()
timestamp = serializers.CharField()
reason = serializers.CharField()
class AppConfigSubSerializer(
ValidateUnknownFieldsMixin, serializers.Serializer
):
"""serialize app config subscriptions"""
channel_size = serializers.IntegerField(required=False)
live_channel_size = serializers.IntegerField(required=False)
shorts_channel_size = serializers.IntegerField(required=False)
auto_start = serializers.BooleanField(required=False)
class AppConfigDownloadsSerializer(
ValidateUnknownFieldsMixin, serializers.Serializer
):
"""serialize app config downloads config"""
limit_speed = serializers.IntegerField(allow_null=True)
sleep_interval = serializers.IntegerField(allow_null=True)
autodelete_days = serializers.IntegerField(allow_null=True)
format = serializers.CharField(allow_null=True)
format_sort = serializers.CharField(allow_null=True)
add_metadata = serializers.BooleanField()
add_thumbnail = serializers.BooleanField()
subtitle = serializers.CharField(allow_null=True)
subtitle_source = serializers.ChoiceField(
choices=["auto", "user"], allow_null=True
)
subtitle_index = serializers.BooleanField()
comment_max = serializers.CharField(allow_null=True)
comment_sort = serializers.ChoiceField(
choices=["top", "new"], allow_null=True
)
cookie_import = serializers.BooleanField()
potoken = serializers.BooleanField()
throttledratelimit = serializers.IntegerField(allow_null=True)
extractor_lang = serializers.CharField(allow_null=True)
integrate_ryd = serializers.BooleanField()
integrate_sponsorblock = serializers.BooleanField()
class AppConfigAppSerializer(
ValidateUnknownFieldsMixin, serializers.Serializer
):
"""serialize app config"""
enable_snapshot = serializers.BooleanField()
enable_cast = serializers.BooleanField()
class AppConfigSerializer(ValidateUnknownFieldsMixin, serializers.Serializer):
"""serialize appconfig"""
subscriptions = AppConfigSubSerializer(required=False)
downloads = AppConfigDownloadsSerializer(required=False)
application = AppConfigAppSerializer(required=False)
class CookieValidationSerializer(serializers.Serializer):
"""serialize cookie validation response"""
cookie_enabled = serializers.BooleanField()
status = serializers.BooleanField(required=False)
validated = serializers.IntegerField(required=False)
validated_str = serializers.CharField(required=False)
class CookieUpdateSerializer(serializers.Serializer):
"""serialize cookie to update"""
cookie = serializers.CharField()
class PoTokenSerializer(serializers.Serializer):
"""serialize PO token"""
potoken = serializers.CharField()
class SnapshotItemSerializer(serializers.Serializer):
"""serialize snapshot response"""
id = serializers.CharField()
state = serializers.CharField()
es_version = serializers.CharField()
start_date = serializers.CharField()
end_date = serializers.CharField()
end_stamp = serializers.IntegerField()
duration_s = serializers.IntegerField()
class SnapshotListSerializer(serializers.Serializer):
"""serialize snapshot list response"""
next_exec = serializers.IntegerField()
next_exec_str = serializers.CharField()
expire_after = serializers.CharField()
snapshots = SnapshotItemSerializer(many=True)
class SnapshotCreateResponseSerializer(serializers.Serializer):
"""serialize new snapshot creating response"""
snapshot_name = serializers.CharField()
class SnapshotRestoreResponseSerializer(serializers.Serializer):
"""serialize snapshot restore response"""
accepted = serializers.BooleanField()
class TokenResponseSerializer(serializers.Serializer):
"""serialize token response"""
token = serializers.CharField()

View File

@ -1,273 +0,0 @@
"""
Functionality:
- Handle json zip file based backup
- create backup
- restore backup
"""
import json
import os
import zipfile
from datetime import datetime
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import ElasticWrap, IndexPaginate
from common.src.helper import get_mapping, ignore_filelist
from task.models import CustomPeriodicTask
class ElasticBackup:
"""dump index to nd-json files for later bulk import"""
INDEX_SPLIT = ["comment"]
CACHE_DIR = EnvironmentSettings.CACHE_DIR
BACKUP_DIR = os.path.join(CACHE_DIR, "backup")
def __init__(self, reason=False, task=False) -> None:
self.timestamp = datetime.now().strftime("%Y%m%d")
self.index_config = get_mapping()
self.reason = reason
self.task = task
def backup_all_indexes(self):
"""backup all indexes, add reason to init"""
print("backup all indexes")
if not self.reason:
raise ValueError("missing backup reason in ElasticBackup")
if self.task:
self.task.send_progress(["Scanning your index."])
for index in self.index_config:
index_name = index["index_name"]
print(f"backup: export in progress for {index_name}")
if not self.index_exists(index_name):
print(f"skip backup for not yet existing index {index_name}")
continue
self.backup_index(index_name)
if self.task:
self.task.send_progress(["Compress files to zip archive."])
self.zip_it()
if self.reason == "auto":
self.rotate_backup()
def backup_index(self, index_name):
"""export all documents of a single index"""
paginate_kwargs = {
"data": {"query": {"match_all": {}}},
"keep_source": True,
"callback": BackupCallback,
"task": self.task,
"total": self._get_total(index_name),
}
if index_name in self.INDEX_SPLIT:
paginate_kwargs.update({"size": 200})
paginate = IndexPaginate(f"ta_{index_name}", **paginate_kwargs)
_ = paginate.get_results()
@staticmethod
def _get_total(index_name):
"""get total documents in index"""
path = f"ta_{index_name}/_count"
response, _ = ElasticWrap(path).get()
return response.get("count")
def zip_it(self):
"""pack it up into single zip file"""
file_name = f"ta_backup-{self.timestamp}-{self.reason}.zip"
to_backup = []
for file in os.listdir(self.BACKUP_DIR):
if file.endswith(".json"):
to_backup.append(os.path.join(self.BACKUP_DIR, file))
backup_file = os.path.join(self.BACKUP_DIR, file_name)
comp = zipfile.ZIP_DEFLATED
with zipfile.ZipFile(backup_file, "w", compression=comp) as zip_f:
for backup_file in to_backup:
zip_f.write(backup_file, os.path.basename(backup_file))
# cleanup
for backup_file in to_backup:
os.remove(backup_file)
def post_bulk_restore(self, file_name):
"""send bulk to es"""
file_path = os.path.join(self.CACHE_DIR, file_name)
with open(file_path, "r", encoding="utf-8") as f:
data = f.read()
if not data.strip():
return
_, _ = ElasticWrap("_bulk").post(data=data, ndjson=True)
def get_all_backup_files(self):
"""build all available backup files for view"""
all_backup_files = ignore_filelist(os.listdir(self.BACKUP_DIR))
all_available_backups = [
i
for i in all_backup_files
if i.startswith("ta_") and i.endswith(".zip")
]
all_available_backups.sort(reverse=True)
backup_dicts = []
for filename in all_available_backups:
data = self.build_backup_file_data(filename)
backup_dicts.append(data)
return backup_dicts
def build_backup_file_data(self, filename):
"""build metadata of single backup file"""
file_path = os.path.join(self.BACKUP_DIR, filename)
if not os.path.exists(file_path):
return False
file_split = filename.split("-")
if len(file_split) == 2:
timestamp = file_split[1].strip(".zip")
reason = False
elif len(file_split) == 3:
timestamp = file_split[1]
reason = file_split[2].strip(".zip")
else:
raise ValueError
data = {
"filename": filename,
"file_path": file_path,
"file_size": os.path.getsize(file_path),
"timestamp": timestamp,
"reason": reason,
}
return data
def restore(self, filename):
"""
restore from backup zip file
call reset from ElasitIndexWrap first to start blank
"""
zip_content = self._unpack_zip_backup(filename)
self._restore_json_files(zip_content)
def _unpack_zip_backup(self, filename):
"""extract backup zip and return filelist"""
file_path = os.path.join(self.BACKUP_DIR, filename)
with zipfile.ZipFile(file_path, "r") as z:
zip_content = z.namelist()
z.extractall(self.BACKUP_DIR)
return zip_content
def _restore_json_files(self, zip_content):
"""go through the unpacked files and restore"""
for idx, json_f in enumerate(zip_content):
self._notify_restore(idx, json_f, len(zip_content))
file_name = os.path.join(self.BACKUP_DIR, json_f)
if not json_f.startswith("es_") or not json_f.endswith(".json"):
os.remove(file_name)
continue
print("restoring: " + json_f)
self.post_bulk_restore(file_name)
os.remove(file_name)
def _notify_restore(self, idx, json_f, total_files):
"""notify restore progress"""
message = [f"Restore index from json backup file {json_f}."]
progress = (idx + 1) / total_files
self.task.send_progress(message_lines=message, progress=progress)
@staticmethod
def index_exists(index_name):
"""check if index already exists to skip"""
_, status_code = ElasticWrap(f"ta_{index_name}").get()
exists = status_code == 200
return exists
def rotate_backup(self):
"""delete old backups if needed"""
try:
task = CustomPeriodicTask.objects.get(name="run_backup")
except CustomPeriodicTask.DoesNotExist:
return
rotate = task.task_config.get("rotate")
if not rotate:
return
all_backup_files = self.get_all_backup_files()
auto = [i for i in all_backup_files if i["reason"] == "auto"]
if len(auto) <= rotate:
print("no backup files to rotate")
return
all_to_delete = auto[rotate:]
for to_delete in all_to_delete:
self.delete_file(to_delete["filename"])
def delete_file(self, filename):
"""delete backup file"""
file_path = os.path.join(self.BACKUP_DIR, filename)
if not os.path.exists(file_path):
print(f"backup file not found: {filename}")
return False
print(f"remove old backup file: {file_path}")
os.remove(file_path)
return file_path
class BackupCallback:
"""handle backup ndjson writer as callback for IndexPaginate"""
def __init__(self, source, index_name, counter=0):
self.source = source
self.index_name = index_name
self.counter = counter
self.timestamp = datetime.now().strftime("%Y%m%d")
self.cache_dir = EnvironmentSettings.CACHE_DIR
def run(self):
"""run the junk task"""
file_content = self._build_bulk()
self._write_es_json(file_content)
def _build_bulk(self):
"""build bulk query data from all_results"""
bulk_list = []
for document in self.source:
document_id = document["_id"]
es_index = document["_index"]
action = {"index": {"_index": es_index, "_id": document_id}}
source = document["_source"]
bulk_list.append(json.dumps(action))
bulk_list.append(json.dumps(source))
# add last newline
bulk_list.append("\n")
file_content = "\n".join(bulk_list)
return file_content
def _write_es_json(self, file_content):
"""write nd-json file for es _bulk API to disk"""
index = self.index_name.lstrip("ta_")
file_name = f"es_{index}-{self.timestamp}-{self.counter}.json"
file_path = os.path.join(self.cache_dir, "backup", file_name)
with open(file_path, "a+", encoding="utf-8") as f:
f.write(file_content)

View File

@ -1,252 +0,0 @@
"""
Functionality:
- read and write config
- load config variables into redis
"""
from random import randint
from time import sleep
from typing import Literal, TypedDict
import requests
from appsettings.src.snapshot import ElasticSnapshot
from common.src.es_connect import ElasticWrap
from common.src.ta_redis import RedisArchivist
from django.conf import settings
class SubscriptionsConfigType(TypedDict):
"""describes subscriptions config"""
channel_size: int
live_channel_size: int
shorts_channel_size: int
auto_start: bool
class DownloadsConfigType(TypedDict):
"""describes downloads config"""
limit_speed: int | None
sleep_interval: int | None
autodelete_days: int | None
format: str | None
format_sort: str | None
add_metadata: bool
add_thumbnail: bool
subtitle: str | None
subtitle_source: Literal["user", "auto"] | None
subtitle_index: bool
comment_max: str | None
comment_sort: Literal["top", "new"] | None
cookie_import: bool
potoken: bool
throttledratelimit: int | None
extractor_lang: str | None
integrate_ryd: bool
integrate_sponsorblock: bool
class ApplicationConfigType(TypedDict):
"""describes application config"""
enable_snapshot: bool
enable_cast: bool
class AppConfigType(TypedDict):
"""combined app config type"""
subscriptions: SubscriptionsConfigType
downloads: DownloadsConfigType
application: ApplicationConfigType
class AppConfig:
"""handle application variables"""
ES_PATH = "ta_config/_doc/appsettings"
ES_UPDATE_PATH = "ta_config/_update/appsettings"
CONFIG_DEFAULTS: AppConfigType = {
"subscriptions": {
"channel_size": 50,
"live_channel_size": 50,
"shorts_channel_size": 50,
"auto_start": False,
},
"downloads": {
"limit_speed": None,
"sleep_interval": 10,
"autodelete_days": None,
"format": None,
"format_sort": None,
"add_metadata": False,
"add_thumbnail": False,
"subtitle": None,
"subtitle_source": None,
"subtitle_index": False,
"comment_max": None,
"comment_sort": "top",
"cookie_import": False,
"potoken": False,
"throttledratelimit": None,
"extractor_lang": None,
"integrate_ryd": False,
"integrate_sponsorblock": False,
},
"application": {
"enable_snapshot": True,
"enable_cast": False,
},
}
def __init__(self):
self.config = self.get_config()
def get_config(self) -> AppConfigType:
"""get config from ES"""
response, status_code = ElasticWrap(self.ES_PATH).get()
if not status_code == 200:
raise ValueError(f"no config found at {self.ES_PATH}")
return response["_source"]
def update_config(self, data: dict) -> AppConfigType:
"""update single config value"""
new_config = self.config.copy()
for key, value in data.items():
if (
isinstance(value, dict)
and key in new_config
and isinstance(new_config[key], dict)
):
new_config[key].update(value)
else:
new_config[key] = value
response, status_code = ElasticWrap(self.ES_PATH).post(new_config)
if not status_code == 200:
print(response)
self.config = new_config
return new_config
def post_process_updated(self, data: dict) -> None:
"""apply hooks for some config keys"""
for config_value, updated_value in data:
if config_value == "application.enable_snapshot" and updated_value:
ElasticSnapshot().setup()
@staticmethod
def _fail_message(message_line):
"""notify our failure"""
key = "message:setting"
message = {
"status": key,
"group": "setting:application",
"level": "error",
"title": "Cookie import failed",
"messages": [message_line],
"id": "0000",
}
RedisArchivist().set_message(key, message=message, expire=True)
def sync_defaults(self):
"""sync defaults at startup, needs to be called with __new__"""
return ElasticWrap(self.ES_PATH).post(self.CONFIG_DEFAULTS)
def add_new_defaults(self) -> list[str]:
"""add new default config values to ES, called at startup"""
updated = []
for key, value in self.CONFIG_DEFAULTS.items():
if key not in self.config:
# complete new key
self.update_config({key: value})
updated.append(str({key: value}))
continue
for sub_key, sub_value in value.items(): # type: ignore
if sub_key not in self.config[key]:
# new partial key
to_update = {key: {sub_key: sub_value}}
self.update_config(to_update)
updated.append(str(to_update))
return updated
class ReleaseVersion:
"""compare local version with remote version"""
REMOTE_URL = "https://www.tubearchivist.com/api/release/latest/"
NEW_KEY = "versioncheck:new"
def __init__(self) -> None:
self.local_version: str = settings.TA_VERSION
self.is_unstable: bool = settings.TA_VERSION.endswith("-unstable")
self.remote_version: str = ""
self.is_breaking: bool = False
def check(self) -> None:
"""check version"""
print(f"[{self.local_version}]: look for updates")
self.get_remote_version()
new_version = self._has_update()
if new_version:
message = {
"status": True,
"version": new_version,
"is_breaking": self.is_breaking,
}
RedisArchivist().set_message(self.NEW_KEY, message)
print(f"[{self.local_version}]: found new version {new_version}")
def get_local_version(self) -> str:
"""read version from local"""
return self.local_version
def get_remote_version(self) -> None:
"""read version from remote"""
sleep(randint(0, 60))
response = requests.get(self.REMOTE_URL, timeout=20).json()
self.remote_version = response["release_version"]
self.is_breaking = response["breaking_changes"]
def _has_update(self) -> str | bool:
"""check if there is an update"""
remote_parsed = self._parse_version(self.remote_version)
local_parsed = self._parse_version(self.local_version)
if remote_parsed > local_parsed:
return self.remote_version
if self.is_unstable and local_parsed == remote_parsed:
return self.remote_version
return False
@staticmethod
def _parse_version(version) -> tuple[int, ...]:
"""return version parts"""
clean = version.rstrip("-unstable").lstrip("v")
return tuple((int(i) for i in clean.split(".")))
def is_updated(self) -> str | bool:
"""check if update happened in the mean time"""
message = self.get_update()
if not message:
return False
local_parsed = self._parse_version(self.local_version)
message_parsed = self._parse_version(message.get("version"))
if local_parsed >= message_parsed:
RedisArchivist().del_message(self.NEW_KEY)
return settings.TA_VERSION
return False
def get_update(self) -> dict | None:
"""return new version dict if available"""
message = RedisArchivist().get_message_dict(self.NEW_KEY)
return message or None

View File

@ -1,93 +0,0 @@
"""
Functionality:
- scan the filesystem to delete or index
"""
import os
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import IndexPaginate
from common.src.helper import ignore_filelist
from video.src.comments import CommentList
from video.src.index import YoutubeVideo, index_new_video
class Scanner:
"""scan index and filesystem"""
VIDEOS: str = EnvironmentSettings.MEDIA_DIR
def __init__(self, task=False) -> None:
self.task = task
self.to_delete: set[str] = set()
self.to_index: set[str] = set()
def scan(self) -> None:
"""scan the filesystem"""
downloaded: set[str] = self._get_downloaded()
indexed: set[str] = self._get_indexed()
self.to_index = downloaded - indexed
self.to_delete = indexed - downloaded
def _get_downloaded(self) -> set[str]:
"""get downloaded ids"""
if self.task:
self.task.send_progress(["Scan your filesystem for videos."])
downloaded: set = set()
channels = ignore_filelist(os.listdir(self.VIDEOS))
for channel in channels:
folder = os.path.join(self.VIDEOS, channel)
files = ignore_filelist(os.listdir(folder))
downloaded.update({i.split(".")[0] for i in files})
return downloaded
def _get_indexed(self) -> set:
"""get all indexed ids"""
if self.task:
self.task.send_progress(["Get all videos indexed."])
data = {"query": {"match_all": {}}, "_source": ["youtube_id"]}
response = IndexPaginate("ta_video", data).get_results()
return {i["youtube_id"] for i in response}
def apply(self) -> None:
"""apply all changes"""
self.delete()
self.index()
def delete(self) -> None:
"""delete videos from index"""
if not self.to_delete:
print("nothing to delete")
return
if self.task:
self.task.send_progress(
[f"Remove {len(self.to_delete)} videos from index."]
)
for youtube_id in self.to_delete:
YoutubeVideo(youtube_id).delete_media_file()
def index(self) -> None:
"""index new"""
if not self.to_index:
print("nothing to index")
return
total = len(self.to_index)
for idx, youtube_id in enumerate(self.to_index):
if self.task:
self.task.send_progress(
message_lines=[
f"Index missing video {youtube_id}, {idx + 1}/{total}"
],
progress=(idx + 1) / total,
)
index_new_video(youtube_id)
comment_list = CommentList(task=self.task)
comment_list.add(video_ids=list(self.to_index))
comment_list.index()

View File

@ -1,220 +0,0 @@
"""
functionality:
- setup elastic index at first start
- verify and update index mapping and settings if needed
- backup and restore metadata
"""
from appsettings.src.backup import ElasticBackup
from appsettings.src.config import AppConfig
from appsettings.src.snapshot import ElasticSnapshot
from common.src.es_connect import ElasticWrap
from common.src.helper import get_mapping
class ElasticIndex:
"""interact with a single index"""
def __init__(self, index_name, expected_map=False, expected_set=False):
self.index_name = index_name
self.expected_map = expected_map
self.expected_set = expected_set
self.exists, self.details = self.index_exists()
def index_exists(self):
"""check if index already exists and return mapping if it does"""
response, status_code = ElasticWrap(f"ta_{self.index_name}").get()
exists = status_code == 200
details = response.get(f"ta_{self.index_name}", False)
return exists, details
def validate(self):
"""
check if all expected mappings and settings match
returns True when rebuild is needed
"""
if self.expected_map:
rebuild = self.validate_mappings()
if rebuild:
return rebuild
if self.expected_set:
rebuild = self.validate_settings()
if rebuild:
return rebuild
return False
def validate_mappings(self):
"""check if all mappings are as expected"""
now_map = self.details["mappings"]["properties"]
for key, value in self.expected_map.items():
# nested
if list(value.keys()) == ["properties"]:
for key_n, value_n in value["properties"].items():
if key not in now_map:
print(f"detected mapping change: {key_n}, {value_n}")
return True
if key_n not in now_map[key]["properties"].keys():
print(f"detected mapping change: {key_n}, {value_n}")
return True
if not value_n == now_map[key]["properties"][key_n]:
print(f"detected mapping change: {key_n}, {value_n}")
return True
continue
# not nested
if key not in now_map.keys():
print(f"detected mapping change: {key}, {value}")
return True
if not value == now_map[key]:
print(f"detected mapping change: {key}, {value}")
return True
return False
def validate_settings(self):
"""check if all settings are as expected"""
now_set = self.details["settings"]["index"]
for key, value in self.expected_set.items():
if key not in now_set.keys():
print(key, value)
return True
if not value == now_set[key]:
print(key, value)
return True
return False
def rebuild_index(self):
"""rebuild with new mapping"""
print(f"applying new mappings to index ta_{self.index_name}...")
self.create_blank(for_backup=True)
self.reindex("backup")
self.delete_index(backup=False)
self.create_blank()
self.reindex("restore")
self.delete_index()
def reindex(self, method):
"""create on elastic search"""
if method == "backup":
source = f"ta_{self.index_name}"
destination = f"ta_{self.index_name}_backup"
elif method == "restore":
source = f"ta_{self.index_name}_backup"
destination = f"ta_{self.index_name}"
else:
raise ValueError("invalid method, expected 'backup' or 'restore'")
data = {"source": {"index": source}, "dest": {"index": destination}}
_, _ = ElasticWrap("_reindex?refresh=true").post(data=data)
def delete_index(self, backup=True):
"""delete index passed as argument"""
path = f"ta_{self.index_name}"
if backup:
path = path + "_backup"
_, _ = ElasticWrap(path).delete()
def create_blank(self, for_backup=False):
"""apply new mapping and settings for blank new index"""
print(f"create new blank index with name ta_{self.index_name}...")
path = f"ta_{self.index_name}"
if for_backup:
path = f"{path}_backup"
data = {}
if self.expected_set:
data.update({"settings": self.expected_set})
if self.expected_map:
data.update({"mappings": {"properties": self.expected_map}})
_, _ = ElasticWrap(path).put(data)
class ElasitIndexWrap:
"""interact with all index mapping and setup"""
def __init__(self):
self.index_config = get_mapping()
self.backup_run = False
def setup(self):
"""setup elastic index, run at startup"""
for index in self.index_config:
index_name, expected_map, expected_set = self._config_split(index)
handler = ElasticIndex(index_name, expected_map, expected_set)
if not handler.exists:
handler.create_blank()
continue
rebuild = handler.validate()
if rebuild:
self._check_backup()
handler.rebuild_index()
continue
# else all good
print(f"ta_{index_name} index is created and up to date...")
def reset(self):
"""reset all indexes to blank"""
self.delete_all()
self.create_all_blank()
def delete_all(self):
"""delete all indexes"""
print("reset elastic index")
for index in self.index_config:
index_name, _, _ = self._config_split(index)
handler = ElasticIndex(index_name)
handler.delete_index(backup=False)
def create_all_blank(self):
"""create all blank indexes"""
print("create all new indexes in elastic from template")
for index in self.index_config:
index_name, expected_map, expected_set = self._config_split(index)
handler = ElasticIndex(index_name, expected_map, expected_set)
handler.create_blank()
@staticmethod
def _config_split(index):
"""split index config keys"""
index_name = index["index_name"]
expected_map = index["expected_map"]
expected_set = index["expected_set"]
return index_name, expected_map, expected_set
def _check_backup(self):
"""create backup if needed"""
if self.backup_run:
return
try:
config = AppConfig().config
except ValueError:
# create defaults in ES if config not found
print("AppConfig not found, creating defaults...")
handler = AppConfig.__new__(AppConfig)
handler.sync_defaults()
config = AppConfig.CONFIG_DEFAULTS
if config["application"]["enable_snapshot"]:
# take snapshot if enabled
ElasticSnapshot().take_snapshot_now(wait=True)
else:
# fallback to json backup
ElasticBackup(reason="update").backup_all_indexes()
self.backup_run = True

View File

@ -1,563 +0,0 @@
"""
functionality:
- periodically refresh documents
- index and update in es
"""
import json
import os
from datetime import datetime
from typing import Callable, TypedDict
from appsettings.src.config import AppConfig
from channel.src.index import YoutubeChannel
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import ElasticWrap, IndexPaginate
from common.src.helper import rand_sleep
from common.src.ta_redis import RedisQueue
from download.src.subscriptions import ChannelSubscription
from download.src.thumbnails import ThumbManager
from download.src.yt_dlp_base import CookieHandler
from playlist.src.index import YoutubePlaylist
from task.models import CustomPeriodicTask
from video.src.comments import Comments
from video.src.index import YoutubeVideo
class ReindexConfigType(TypedDict):
"""represents config type"""
index_name: str
queue_name: str
active_key: str
refresh_key: str
class ReindexBase:
"""base config class for reindex task"""
REINDEX_CONFIG: dict[str, ReindexConfigType] = {
"video": {
"index_name": "ta_video",
"queue_name": "reindex:ta_video",
"active_key": "active",
"refresh_key": "vid_last_refresh",
},
"channel": {
"index_name": "ta_channel",
"queue_name": "reindex:ta_channel",
"active_key": "channel_active",
"refresh_key": "channel_last_refresh",
},
"playlist": {
"index_name": "ta_playlist",
"queue_name": "reindex:ta_playlist",
"active_key": "playlist_active",
"refresh_key": "playlist_last_refresh",
},
}
MULTIPLY = 1.2
DAYS3 = 60 * 60 * 24 * 3
def __init__(self):
self.config = AppConfig().config
self.now = int(datetime.now().timestamp())
def populate(self, all_ids, reindex_config: ReindexConfigType):
"""add all to reindex ids to redis queue"""
if not all_ids:
return
RedisQueue(queue_name=reindex_config["queue_name"]).add_list(all_ids)
class ReindexPopulate(ReindexBase):
"""add outdated and recent documents to reindex queue"""
INTERVAL_DEFAIULT: int = 90
def __init__(self):
super().__init__()
self.interval = self.INTERVAL_DEFAIULT
def get_interval(self) -> None:
"""get reindex days interval from task"""
try:
task = CustomPeriodicTask.objects.get(name="check_reindex")
except CustomPeriodicTask.DoesNotExist:
return
task_config = task.task_config
if task_config.get("days"):
self.interval = task_config.get("days")
def add_recent(self) -> None:
"""add recent videos to refresh"""
gte = datetime.fromtimestamp(self.now - self.DAYS3).date().isoformat()
must_list = [
{"term": {"active": {"value": True}}},
{"range": {"published": {"gte": gte}}},
]
data = {
"size": 10000,
"query": {"bool": {"must": must_list}},
"sort": [{"published": {"order": "desc"}}],
}
response, _ = ElasticWrap("ta_video/_search").get(data=data)
hits = response["hits"]["hits"]
if not hits:
return
all_ids = [i["_source"]["youtube_id"] for i in hits]
reindex_config: ReindexConfigType = self.REINDEX_CONFIG["video"]
self.populate(all_ids, reindex_config)
def add_outdated(self) -> None:
"""add outdated documents"""
for reindex_config in self.REINDEX_CONFIG.values():
total_hits = self._get_total_hits(reindex_config)
daily_should = self._get_daily_should(total_hits)
all_ids = self._get_outdated_ids(reindex_config, daily_should)
self.populate(all_ids, reindex_config)
@staticmethod
def _get_total_hits(reindex_config: ReindexConfigType) -> int:
"""get total hits from index"""
index_name = reindex_config["index_name"]
active_key = reindex_config["active_key"]
data = {
"query": {"term": {active_key: {"value": True}}},
"_source": False,
}
total = IndexPaginate(index_name, data, keep_source=True).get_results()
return len(total)
def _get_daily_should(self, total_hits: int) -> int:
"""calc how many should reindex daily"""
daily_should = int((total_hits // self.interval + 1) * self.MULTIPLY)
if daily_should >= 10000:
daily_should = 9999
return daily_should
def _get_outdated_ids(
self, reindex_config: ReindexConfigType, daily_should: int
) -> list[str]:
"""get outdated from index_name"""
index_name = reindex_config["index_name"]
refresh_key = reindex_config["refresh_key"]
now_lte = str(self.now - self.interval * 24 * 60 * 60)
must_list = [
{"match": {reindex_config["active_key"]: True}},
{"range": {refresh_key: {"lte": now_lte}}},
]
data = {
"size": daily_should,
"query": {"bool": {"must": must_list}},
"sort": [{refresh_key: {"order": "asc"}}],
"_source": False,
}
response, _ = ElasticWrap(f"{index_name}/_search").get(data=data)
all_ids = [i["_id"] for i in response["hits"]["hits"]]
return all_ids
class ReindexManual(ReindexBase):
"""
manually add ids to reindex queue from API
data_example = {
"video": ["video1", "video2", "video3"],
"channel": ["channel1", "channel2", "channel3"],
"playlist": ["playlist1", "playlist2"],
}
extract_videos to also reindex all videos of channel/playlist
"""
def __init__(self, extract_videos=False):
super().__init__()
self.extract_videos = extract_videos
self.data = False
def extract_data(self, data) -> None:
"""process data"""
self.data = data
for key, values in self.data.items():
reindex_config = self.REINDEX_CONFIG.get(key)
if not reindex_config:
print(f"reindex type {key} not valid")
raise ValueError
self.process_index(reindex_config, values)
def process_index(
self, index_config: ReindexConfigType, values: list[str]
) -> None:
"""process values per index"""
index_name = index_config["index_name"]
if index_name == "ta_video":
self._add_videos(values)
elif index_name == "ta_channel":
self._add_channels(values)
elif index_name == "ta_playlist":
self._add_playlists(values)
def _add_videos(self, values: list[str]) -> None:
"""add list of videos to reindex queue"""
if not values:
return
queue_name = self.REINDEX_CONFIG["video"]["queue_name"]
RedisQueue(queue_name).add_list(values)
def _add_channels(self, values: list[str]) -> None:
"""add list of channels to reindex queue"""
queue_name = self.REINDEX_CONFIG["channel"]["queue_name"]
RedisQueue(queue_name).add_list(values)
if self.extract_videos:
for channel_id in values:
all_videos = self._get_channel_videos(channel_id)
self._add_videos(all_videos)
def _add_playlists(self, values: list[str]) -> None:
"""add list of playlists to reindex queue"""
queue_name = self.REINDEX_CONFIG["playlist"]["queue_name"]
RedisQueue(queue_name).add_list(values)
if self.extract_videos:
for playlist_id in values:
all_videos = self._get_playlist_videos(playlist_id)
self._add_videos(all_videos)
def _get_channel_videos(self, channel_id: str) -> list[str]:
"""get all videos from channel"""
data = {
"query": {"term": {"channel.channel_id": {"value": channel_id}}},
"_source": ["youtube_id"],
}
all_results = IndexPaginate("ta_video", data).get_results()
return [i["youtube_id"] for i in all_results]
def _get_playlist_videos(self, playlist_id: str) -> list[str]:
"""get all videos from playlist"""
data = {
"query": {"term": {"playlist.keyword": {"value": playlist_id}}},
"_source": ["youtube_id"],
}
all_results = IndexPaginate("ta_video", data).get_results()
return [i["youtube_id"] for i in all_results]
class Reindex(ReindexBase):
"""reindex all documents from redis queue"""
def __init__(self, task=False):
super().__init__()
self.task = task
self.processed = {
"videos": 0,
"channels": 0,
"playlists": 0,
}
def reindex_all(self) -> None:
"""reindex all in queue"""
if not self.cookie_is_valid():
print("[reindex] cookie invalid, exiting...")
return
for name, index_config in self.REINDEX_CONFIG.items():
if not RedisQueue(index_config["queue_name"]).length():
continue
self.reindex_type(name, index_config)
def reindex_type(self, name: str, index_config: ReindexConfigType) -> None:
"""reindex all of a single index"""
reindex = self._get_reindex_map(index_config["index_name"])
queue = RedisQueue(index_config["queue_name"])
while True:
total = queue.max_score()
youtube_id, idx = queue.get_next()
if not youtube_id or not idx or not total:
break
if self.task:
self._notify(name, total, idx)
reindex(youtube_id)
rand_sleep(self.config)
def _get_reindex_map(self, index_name: str) -> Callable:
"""return def to run for index"""
def_map = {
"ta_video": self._reindex_single_video,
"ta_channel": self._reindex_single_channel,
"ta_playlist": self._reindex_single_playlist,
}
return def_map[index_name]
def _notify(self, name: str, total: int, idx: int) -> None:
"""send notification back to task"""
message = [f"Reindexing {name.title()}s {idx}/{total}"]
progress = idx / total
self.task.send_progress(message, progress=progress)
def _reindex_single_video(self, youtube_id: str) -> None:
"""refresh data for single video"""
video = YoutubeVideo(youtube_id)
# read current state
video.get_from_es()
if not video.json_data:
return
es_meta = video.json_data.copy()
# get new
media_url = os.path.join(
EnvironmentSettings.MEDIA_DIR, es_meta["media_url"]
)
video.build_json(media_path=media_url)
if not video.youtube_meta:
video.deactivate()
return
video.delete_subtitles(subtitles=es_meta.get("subtitles"))
video.check_subtitles()
# add back
video.json_data["player"] = es_meta.get("player")
video.json_data["date_downloaded"] = es_meta.get("date_downloaded")
video.json_data["vid_type"] = es_meta.get("vid_type")
video.json_data["channel"] = es_meta.get("channel")
if es_meta.get("playlist"):
video.json_data["playlist"] = es_meta.get("playlist")
video.upload_to_es()
thumb_handler = ThumbManager(youtube_id)
thumb_handler.delete_video_thumb()
thumb_handler.download_video_thumb(video.json_data["vid_thumb_url"])
Comments(youtube_id, config=self.config).reindex_comments()
self.processed["videos"] += 1
def _reindex_single_channel(self, channel_id: str) -> None:
"""refresh channel data and sync to videos"""
# read current state
channel = YoutubeChannel(channel_id)
channel.get_from_es()
if not channel.json_data:
return
es_meta = channel.json_data.copy()
# get new
channel.get_from_youtube()
if not channel.youtube_meta:
channel.deactivate()
channel.get_from_es()
channel.sync_to_videos()
return
channel.process_youtube_meta()
channel.get_channel_art()
# add back
channel.json_data["channel_subscribed"] = es_meta["channel_subscribed"]
overwrites = es_meta.get("channel_overwrites")
if overwrites:
channel.json_data["channel_overwrites"] = overwrites
channel.upload_to_es()
channel.sync_to_videos()
ChannelFullScan(channel_id).scan()
self.processed["channels"] += 1
def _reindex_single_playlist(self, playlist_id: str) -> None:
"""refresh playlist data"""
playlist = YoutubePlaylist(playlist_id)
playlist.get_from_es()
if (
not playlist.json_data
or playlist.json_data["playlist_type"] == "custom"
):
return
is_active = playlist.update_playlist()
if not is_active:
playlist.deactivate()
return
self.processed["playlists"] += 1
def cookie_is_valid(self) -> bool:
"""return true if cookie is enabled and valid"""
if not self.config["downloads"]["cookie_import"]:
# is not activated, continue reindex
return True
valid = CookieHandler(self.config).validate()
return valid
def build_message(self) -> str:
"""build progress message"""
message = ""
for key, value in self.processed.items():
if value:
message = message + f"{value} {key}, "
if message:
message = f"reindexed {message.rstrip(', ')}"
return message
class ReindexProgress(ReindexBase):
"""
get progress of reindex task
request_type: key of self.REINDEX_CONFIG
request_id: id of request_type
return = {
"state": "running" | "queued" | False
"total_queued": int
"in_queue_name": "queue_name"
}
"""
def __init__(self, request_type=False, request_id=False):
super().__init__()
self.request_type = request_type
self.request_id = request_id
def get_progress(self) -> dict:
"""get progress from task"""
queue_name, request_type = self._get_queue_name()
total = self._get_total_in_queue(queue_name)
progress = {
"total_queued": total,
"type": request_type,
}
state = self._get_state(total, queue_name)
progress.update(state)
return progress
def _get_queue_name(self):
"""return queue_name, queue_type, raise exception on error"""
if not self.request_type:
return "all", "all"
reindex_config = self.REINDEX_CONFIG.get(self.request_type)
if not reindex_config:
print(f"reindex_config not found: {self.request_type}")
raise ValueError
return reindex_config["queue_name"], self.request_type
def _get_total_in_queue(self, queue_name):
"""get all items in queue"""
total = 0
if queue_name == "all":
queues = [i["queue_name"] for i in self.REINDEX_CONFIG.values()]
for queue in queues:
total += len(RedisQueue(queue).get_all())
else:
total += len(RedisQueue(queue_name).get_all())
return total
def _get_state(self, total, queue_name):
"""get state based on request_id"""
state_dict = {}
if self.request_id:
state = RedisQueue(queue_name).in_queue(self.request_id)
state_dict.update({"id": self.request_id, "state": state})
return state_dict
if total:
state = "running"
else:
state = "empty"
state_dict.update({"state": state})
return state_dict
class ChannelFullScan:
"""
update from v0.3.0 to v0.3.1
full scan of channel to fix vid_type mismatch
"""
def __init__(self, channel_id):
self.channel_id = channel_id
self.to_update = False
def scan(self):
"""match local with remote"""
print(f"{self.channel_id}: start full scan")
all_local_videos = self._get_all_local()
all_remote_videos = self._get_all_remote()
self.to_update = []
for video in all_local_videos:
video_id = video["youtube_id"]
remote_match = [i for i in all_remote_videos if i[0] == video_id]
if not remote_match:
print(f"{video_id}: no remote match found")
continue
expected_type = remote_match[0][-1]
if video["vid_type"] != expected_type:
self.to_update.append(
{
"video_id": video_id,
"vid_type": expected_type,
}
)
self.update()
def _get_all_remote(self):
"""get all channel videos"""
sub = ChannelSubscription()
all_remote_videos = sub.get_last_youtube_videos(
self.channel_id, limit=False
)
return all_remote_videos
def _get_all_local(self):
"""get all local indexed channel_videos"""
channel = YoutubeChannel(self.channel_id)
all_local_videos = channel.get_channel_videos()
return all_local_videos
def update(self):
"""build bulk query for updates"""
if not self.to_update:
print(f"{self.channel_id}: nothing to update")
return
print(f"{self.channel_id}: fixing {len(self.to_update)} videos")
bulk_list = []
for video in self.to_update:
action = {
"update": {"_id": video.get("video_id"), "_index": "ta_video"}
}
source = {"doc": {"vid_type": video.get("vid_type")}}
bulk_list.append(json.dumps(action))
bulk_list.append(json.dumps(source))
# add last newline
bulk_list.append("\n")
data = "\n".join(bulk_list)
_, _ = ElasticWrap("_bulk").post(data=data, ndjson=True)

View File

@ -1,285 +0,0 @@
"""
functionality:
- handle snapshots in ES
"""
from datetime import datetime
from time import sleep
from zoneinfo import ZoneInfo
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import ElasticWrap
from common.src.helper import get_mapping
class ElasticSnapshot:
"""interact with snapshots on ES"""
REPO = "ta_snapshot"
REPO_SETTINGS = {
"compress": "true",
"chunk_size": "1g",
"location": EnvironmentSettings.ES_SNAPSHOT_DIR,
}
POLICY = "ta_daily"
def __init__(self):
self.all_indices = self._get_all_indices()
def _get_all_indices(self):
"""return all indices names managed by TA"""
mapping = get_mapping()
all_indices = [f"ta_{i['index_name']}" for i in mapping]
return all_indices
def setup(self):
"""setup the snapshot in ES, create or update if needed"""
print("snapshot: run setup")
repo_exists = self._check_repo_exists()
if not repo_exists:
self.create_repo()
policy_exists = self._check_policy_exists()
if not policy_exists:
self.create_policy()
is_outdated = self._needs_startup_snapshot()
if is_outdated:
_ = self.take_snapshot_now()
def _check_repo_exists(self):
"""check if expected repo already exists"""
path = f"_snapshot/{self.REPO}"
response, statuscode = ElasticWrap(path).get()
if statuscode == 200:
print(f"snapshot: repo {self.REPO} already created")
matching = response[self.REPO]["settings"] == self.REPO_SETTINGS
if not matching:
print(f"snapshot: update repo settings {self.REPO_SETTINGS}")
return matching
print(f"snapshot: setup repo {self.REPO} config {self.REPO_SETTINGS}")
return False
def create_repo(self):
"""create filesystem repo"""
path = f"_snapshot/{self.REPO}"
data = {
"type": "fs",
"settings": self.REPO_SETTINGS,
}
response, statuscode = ElasticWrap(path).post(data=data)
if statuscode == 200:
print(f"snapshot: repo setup correctly: {response}")
def _check_policy_exists(self):
"""check if snapshot policy is set correctly"""
policy = self._get_policy()
expected_policy = self._build_policy_data()
if not policy:
print(f"snapshot: create policy {self.POLICY} {expected_policy}")
return False
if policy["policy"] != expected_policy:
print(f"snapshot: update policy settings {expected_policy}")
return False
print("snapshot: policy is set.")
return True
def _get_policy(self):
"""get policy from es"""
path = f"_slm/policy/{self.POLICY}"
response, statuscode = ElasticWrap(path).get()
if statuscode != 200:
return False
return response[self.POLICY]
def create_policy(self):
"""create snapshot lifetime policy"""
path = f"_slm/policy/{self.POLICY}"
data = self._build_policy_data()
response, statuscode = ElasticWrap(path).put(data)
if statuscode == 200:
print(f"snapshot: policy setup correctly: {response}")
def _build_policy_data(self):
"""build policy dict from config"""
at_12 = datetime.now().replace(hour=12, minute=0, second=0)
hour = at_12.astimezone(ZoneInfo("UTC")).hour
return {
"schedule": f"0 0 {hour} * * ?",
"name": f"<{self.POLICY}_>",
"repository": self.REPO,
"config": {
"indices": self.all_indices,
"ignore_unavailable": True,
"include_global_state": True,
},
"retention": {
"expire_after": "30d",
"min_count": 5,
"max_count": 50,
},
}
def _needs_startup_snapshot(self):
"""check if last snapshot is expired"""
snap_dicts = self._get_all_snapshots()
if not snap_dicts:
print("snapshot: create initial snapshot")
return True
last_stamp = snap_dicts[0]["end_stamp"]
now = int(datetime.now().timestamp())
outdated = (now - last_stamp) / 60 / 60 > 24
        if outdated:
            print("snapshot: is outdated, create new now")
            return True
        print("snapshot: last snapshot is up-to-date")
        return False
def take_snapshot_now(self, wait=False):
"""execute daily snapshot now"""
path = f"_slm/policy/{self.POLICY}/_execute"
response, statuscode = ElasticWrap(path).post()
if statuscode == 200:
print(f"snapshot: executing now: {response}")
if wait and "snapshot_name" in response:
self._wait_for_snapshot(response["snapshot_name"])
return response
def _wait_for_snapshot(self, snapshot_name):
"""return after snapshot_name completes"""
path = f"_snapshot/{self.REPO}/{snapshot_name}"
while True:
# wait for task to be created
sleep(1)
_, statuscode = ElasticWrap(path).get()
if statuscode == 200:
break
while True:
# wait for snapshot success
response, statuscode = ElasticWrap(path).get()
snapshot_state = response["snapshots"][0]["state"]
if snapshot_state == "SUCCESS":
break
print(f"snapshot: {snapshot_name} in state {snapshot_state}")
print("snapshot: wait to complete")
sleep(5)
print(f"snapshot: completed - {response}")
def get_snapshot_stats(self):
"""get snapshot info for frontend"""
snapshot_info = self._build_policy_details()
if snapshot_info:
snapshot_info.update({"snapshots": self._get_all_snapshots()})
return snapshot_info
def get_single_snapshot(self, snapshot_id):
"""get single snapshot metadata"""
path = f"_snapshot/{self.REPO}/{snapshot_id}"
response, statuscode = ElasticWrap(path).get()
if statuscode == 404:
print(f"snapshots: not found: {snapshot_id}")
return False
snapshot = response["snapshots"][0]
return self._parse_single_snapshot(snapshot)
def _get_all_snapshots(self):
"""get a list of all registered snapshots"""
path = f"_snapshot/{self.REPO}/*?sort=start_time&order=desc"
response, statuscode = ElasticWrap(path).get()
if statuscode == 404:
print("snapshots: not configured")
return False
all_snapshots = response["snapshots"]
if not all_snapshots:
print("snapshots: no snapshots found")
return False
snap_dicts = []
for snapshot in all_snapshots:
snap_dict = self._parse_single_snapshot(snapshot)
snap_dicts.append(snap_dict)
return snap_dicts
def _parse_single_snapshot(self, snapshot):
"""extract relevant metadata from single snapshot"""
snap_dict = {
"id": snapshot["snapshot"],
"state": snapshot["state"],
"es_version": snapshot["version"],
"start_date": self._date_converter(snapshot["start_time"]),
"end_date": self._date_converter(snapshot["end_time"]),
"end_stamp": snapshot["end_time_in_millis"] // 1000,
"duration_s": snapshot["duration_in_millis"] // 1000,
}
return snap_dict
def _build_policy_details(self):
"""get additional policy details"""
policy = self._get_policy()
if not policy:
return False
next_exec = policy["next_execution_millis"] // 1000
next_exec_date = datetime.fromtimestamp(next_exec)
next_exec_str = next_exec_date.strftime("%Y-%m-%d %H:%M")
expire_after = policy["policy"]["retention"]["expire_after"]
policy_metadata = {
"next_exec": next_exec,
"next_exec_str": next_exec_str,
"expire_after": expire_after,
}
return policy_metadata
@staticmethod
def _date_converter(date_utc):
"""convert datetime string"""
date = datetime.strptime(date_utc, "%Y-%m-%dT%H:%M:%S.%fZ")
utc_date = date.replace(tzinfo=ZoneInfo("UTC"))
converted = utc_date.astimezone(ZoneInfo(EnvironmentSettings.TZ))
converted_str = converted.strftime("%Y-%m-%d %H:%M")
return converted_str
def restore_all(self, snapshot_name):
"""restore snapshot by name"""
for index in self.all_indices:
_, _ = ElasticWrap(index).delete()
path = f"_snapshot/{self.REPO}/{snapshot_name}/_restore"
data = {"indices": "*"}
response, statuscode = ElasticWrap(path).post(data=data)
if statuscode == 200:
print(f"snapshot: executing now: {response}")
return response
print(f"snapshot: failed to restore, {statuscode} {response}")
return False
def delete_single_snapshot(self, snapshot_id):
"""delete single snapshot from index"""
path = f"_snapshot/{self.REPO}/{snapshot_id}"
response, statuscode = ElasticWrap(path).delete()
if statuscode == 200:
print(f"snapshot: deleting {snapshot_id} {response}")
return response
print(f"snapshot: failed to delete, {statuscode} {response}")
return False
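# Hedged usage sketch, not part of the original module: the startup flow the
# class above is built for, followed by reading stats for the frontend.
# Assumes ES is reachable and the snapshot directory is mounted.
def ensure_snapshot_setup():
    """create repo and policy if missing, snapshot if outdated, return stats"""
    elastic = ElasticSnapshot()
    elastic.setup()
    return elastic.get_snapshot_stats()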

View File

@ -1,47 +0,0 @@
"""all app settings API urls"""
from appsettings import views
from django.urls import path
urlpatterns = [
path(
"config/",
views.AppConfigApiView.as_view(),
name="api-config",
),
path(
"snapshot/",
views.SnapshotApiListView.as_view(),
name="api-snapshot-list",
),
path(
"snapshot/<slug:snapshot_id>/",
views.SnapshotApiView.as_view(),
name="api-snapshot",
),
path(
"backup/",
views.BackupApiListView.as_view(),
name="api-backup-list",
),
path(
"backup/<str:filename>/",
views.BackupApiView.as_view(),
name="api-backup",
),
path(
"cookie/",
views.CookieView.as_view(),
name="api-cookie",
),
path(
"potoken/",
views.POTokenView.as_view(),
name="api-potoken",
),
path(
"token/",
views.TokenView.as_view(),
name="api-token",
),
]
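# Hedged sketch, not part of the original urls module: resolving one of the
# named routes above. Assumes this module is included under the
# "api/appsettings/" prefix, as the view docstrings indicate, without a URL
# namespace.
from django.urls import reverse

def snapshot_detail_url(snapshot_id: str) -> str:
    """build the API url for a single snapshot"""
    return reverse("api-snapshot", kwargs={"snapshot_id": snapshot_id})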

View File

@ -1,493 +0,0 @@
"""all app settings API views"""
from appsettings.serializers import (
AppConfigSerializer,
BackupFileSerializer,
CookieUpdateSerializer,
CookieValidationSerializer,
PoTokenSerializer,
SnapshotCreateResponseSerializer,
SnapshotItemSerializer,
SnapshotListSerializer,
SnapshotRestoreResponseSerializer,
TokenResponseSerializer,
)
from appsettings.src.backup import ElasticBackup
from appsettings.src.config import AppConfig
from appsettings.src.snapshot import ElasticSnapshot
from common.serializers import (
AsyncTaskResponseSerializer,
ErrorResponseSerializer,
)
from common.src.ta_redis import RedisArchivist
from common.views_base import AdminOnly, AdminWriteOnly, ApiBaseView
from django.conf import settings
from download.src.yt_dlp_base import CookieHandler, POTokenHandler
from drf_spectacular.utils import OpenApiResponse, extend_schema
from rest_framework.authtoken.models import Token
from rest_framework.response import Response
from task.src.task_manager import TaskCommand
from task.tasks import run_restore_backup
class BackupApiListView(ApiBaseView):
"""resolves to /api/appsettings/backup/
GET: returns list of available zip backups
POST: take zip backup now
"""
permission_classes = [AdminOnly]
task_name = "run_backup"
@staticmethod
@extend_schema(
responses={
200: OpenApiResponse(BackupFileSerializer(many=True)),
},
)
def get(request):
"""get list of available backup files"""
# pylint: disable=unused-argument
backup_files = ElasticBackup().get_all_backup_files()
serializer = BackupFileSerializer(backup_files, many=True)
return Response(serializer.data)
@extend_schema(
responses={
200: OpenApiResponse(AsyncTaskResponseSerializer()),
},
)
def post(self, request):
"""start new backup file task"""
# pylint: disable=unused-argument
response = TaskCommand().start(self.task_name)
message = {
"message": "backup task started",
"task_id": response["task_id"],
}
serializer = AsyncTaskResponseSerializer(message)
return Response(serializer.data)
class BackupApiView(ApiBaseView):
"""resolves to /api/appsettings/backup/<filename>/
GET: return a single backup
POST: restore backup
DELETE: delete backup
"""
permission_classes = [AdminOnly]
task_name = "restore_backup"
@staticmethod
@extend_schema(
responses={
200: OpenApiResponse(BackupFileSerializer()),
404: OpenApiResponse(
ErrorResponseSerializer(), description="file not found"
),
}
)
def get(request, filename):
"""get single backup"""
# pylint: disable=unused-argument
backup_file = ElasticBackup().build_backup_file_data(filename)
if not backup_file:
error = ErrorResponseSerializer({"error": "file not found"})
return Response(error.data, status=404)
serializer = BackupFileSerializer(backup_file)
return Response(serializer.data)
@extend_schema(
responses={
200: OpenApiResponse(AsyncTaskResponseSerializer()),
404: OpenApiResponse(
ErrorResponseSerializer(), description="file not found"
),
}
)
def post(self, request, filename):
"""start new task to restore backup file"""
# pylint: disable=unused-argument
backup_file = ElasticBackup().build_backup_file_data(filename)
if not backup_file:
error = ErrorResponseSerializer({"error": "file not found"})
return Response(error.data, status=404)
task = run_restore_backup.delay(filename)
message = {
"message": "backup restore task started",
"filename": filename,
"task_id": task.id,
}
return Response(message)
@staticmethod
@extend_schema(
responses={
204: OpenApiResponse(description="file deleted"),
404: OpenApiResponse(
ErrorResponseSerializer(), description="file not found"
),
}
)
def delete(request, filename):
"""delete backup file"""
# pylint: disable=unused-argument
backup_file = ElasticBackup().delete_file(filename)
if not backup_file:
error = ErrorResponseSerializer({"error": "file not found"})
return Response(error.data, status=404)
return Response(status=204)
class AppConfigApiView(ApiBaseView):
"""resolves to /api/appsettings/config/
GET: return app settings
POST: update app settings
"""
permission_classes = [AdminWriteOnly]
@staticmethod
@extend_schema(
responses={
200: OpenApiResponse(AppConfigSerializer()),
}
)
def get(request):
"""get app config"""
response = AppConfig().config
serializer = AppConfigSerializer(response)
return Response(serializer.data)
@staticmethod
@extend_schema(
request=AppConfigSerializer(),
responses={
200: OpenApiResponse(AppConfigSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
)
def post(request):
"""update config values, allows partial update"""
serializer = AppConfigSerializer(data=request.data, partial=True)
serializer.is_valid(raise_exception=True)
validated_data = serializer.validated_data
updated_config = AppConfig().update_config(validated_data)
updated_serializer = AppConfigSerializer(updated_config)
return Response(updated_serializer.data)
class CookieView(ApiBaseView):
"""resolves to /api/appsettings/cookie/
GET: check if cookie is enabled
POST: verify validity of cookie
PUT: import cookie
DELETE: revoke the cookie
"""
permission_classes = [AdminOnly]
@extend_schema(
responses={
200: OpenApiResponse(CookieValidationSerializer()),
}
)
def get(self, request):
"""get cookie validation status"""
# pylint: disable=unused-argument
validation = self._get_cookie_validation()
serializer = CookieValidationSerializer(validation)
return Response(serializer.data)
@extend_schema(
responses={
200: OpenApiResponse(CookieValidationSerializer()),
}
)
def post(self, request):
"""validate cookie"""
# pylint: disable=unused-argument
config = AppConfig().config
_ = CookieHandler(config).validate()
validation = self._get_cookie_validation()
serializer = CookieValidationSerializer(validation)
return Response(serializer.data)
@extend_schema(
request=CookieUpdateSerializer(),
responses={
200: OpenApiResponse(CookieValidationSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
)
def put(self, request):
"""handle put request"""
# pylint: disable=unused-argument
serializer = CookieUpdateSerializer(data=request.data)
serializer.is_valid(raise_exception=True)
validated_data = serializer.validated_data
cookie = validated_data.get("cookie")
if not cookie:
message = "missing cookie key in request data"
print(message)
error = ErrorResponseSerializer({"error": message})
return Response(error.data, status=400)
if settings.DEBUG:
print(f"[cookie] preview:\n\n{cookie[:300]}")
config = AppConfig().config
handler = CookieHandler(config)
handler.set_cookie(cookie)
validated = handler.validate()
if not validated:
message = "[cookie]: import failed, not valid"
print(message)
error = ErrorResponseSerializer({"error": message})
handler.revoke()
return Response(error.data, status=400)
validation = self._get_cookie_validation()
serializer = CookieValidationSerializer(validation)
return Response(serializer.data)
@extend_schema(
responses={
204: OpenApiResponse(description="Cookie revoked"),
},
)
def delete(self, request):
"""delete the cookie"""
config = AppConfig().config
handler = CookieHandler(config)
handler.revoke()
return Response(status=204)
@staticmethod
def _get_cookie_validation():
"""get current cookie validation"""
config = AppConfig().config
validation = RedisArchivist().get_message_dict("cookie:valid")
is_enabled = {"cookie_enabled": config["downloads"]["cookie_import"]}
validation.update(is_enabled)
return validation
class POTokenView(ApiBaseView):
"""handle PO token"""
permission_classes = [AdminOnly]
@extend_schema(
responses={
200: OpenApiResponse(PoTokenSerializer()),
404: OpenApiResponse(
ErrorResponseSerializer(), description="PO token not found"
),
}
)
def get(self, request):
"""get PO token"""
config = AppConfig().config
potoken = POTokenHandler(config).get()
if not potoken:
error = ErrorResponseSerializer({"error": "PO token not found"})
return Response(error.data, status=404)
serializer = PoTokenSerializer(data={"potoken": potoken})
serializer.is_valid(raise_exception=True)
return Response(serializer.data)
@extend_schema(
responses={
200: OpenApiResponse(PoTokenSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
}
)
def post(self, request):
"""Update PO token"""
serializer = PoTokenSerializer(data=request.data)
serializer.is_valid(raise_exception=True)
validated_data = serializer.validated_data
if not validated_data:
error = ErrorResponseSerializer(
{"error": "missing PO token key in request data"}
)
return Response(error.data, status=400)
config = AppConfig().config
new_token = validated_data["potoken"]
POTokenHandler(config).set_token(new_token)
return Response(serializer.data)
@extend_schema(
responses={
204: OpenApiResponse(description="PO token revoked"),
},
)
def delete(self, request):
"""delete PO token"""
config = AppConfig().config
POTokenHandler(config).revoke_token()
return Response(status=204)
class SnapshotApiListView(ApiBaseView):
"""resolves to /api/appsettings/snapshot/
GET: returns snapshot config plus list of existing snapshots
POST: take snapshot now
"""
permission_classes = [AdminOnly]
@staticmethod
@extend_schema(
responses={
200: OpenApiResponse(SnapshotListSerializer()),
}
)
def get(request):
"""get available snapshots with metadata"""
# pylint: disable=unused-argument
snapshots = ElasticSnapshot().get_snapshot_stats()
serializer = SnapshotListSerializer(snapshots)
return Response(serializer.data)
@staticmethod
@extend_schema(
responses={
200: OpenApiResponse(SnapshotCreateResponseSerializer()),
}
)
def post(request):
"""take snapshot now"""
# pylint: disable=unused-argument
response = ElasticSnapshot().take_snapshot_now()
serializer = SnapshotCreateResponseSerializer(response)
return Response(serializer.data)
class SnapshotApiView(ApiBaseView):
"""resolves to /api/appsettings/snapshot/<snapshot-id>/
GET: return a single snapshot
POST: restore snapshot
DELETE: delete a snapshot
"""
permission_classes = [AdminOnly]
@staticmethod
@extend_schema(
responses={
200: OpenApiResponse(SnapshotItemSerializer()),
404: OpenApiResponse(
ErrorResponseSerializer(), description="snapshot not found"
),
}
)
def get(request, snapshot_id):
"""handle get request"""
# pylint: disable=unused-argument
snapshot = ElasticSnapshot().get_single_snapshot(snapshot_id)
if not snapshot:
error = ErrorResponseSerializer({"error": "snapshot not found"})
return Response(error.data, status=404)
serializer = SnapshotItemSerializer(snapshot)
return Response(serializer.data)
@staticmethod
@extend_schema(
responses={
200: OpenApiResponse(SnapshotRestoreResponseSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="bad request"
),
}
)
def post(request, snapshot_id):
"""restore snapshot"""
# pylint: disable=unused-argument
response = ElasticSnapshot().restore_all(snapshot_id)
if not response:
error = ErrorResponseSerializer(
{"error": "failed to restore snapshot"}
)
return Response(error.data, status=400)
serializer = SnapshotRestoreResponseSerializer(response)
return Response(serializer.data)
@staticmethod
@extend_schema(
responses={
204: OpenApiResponse(description="delete snapshot from index"),
}
)
def delete(request, snapshot_id):
"""delete snapshot from index"""
# pylint: disable=unused-argument
response = ElasticSnapshot().delete_single_snapshot(snapshot_id)
if not response:
error = ErrorResponseSerializer(
{"error": "failed to delete snapshot"}
)
return Response(error.data, status=400)
return Response(status=204)
class TokenView(ApiBaseView):
"""resolves to /api/appsettings/token/
GET: get API token
DELETE: revoke the token
"""
permission_classes = [AdminOnly]
@staticmethod
@extend_schema(
responses={
200: OpenApiResponse(TokenResponseSerializer()),
}
)
def get(request):
"""get your API token"""
token, _ = Token.objects.get_or_create(user=request.user)
serializer = TokenResponseSerializer({"token": token.key})
return Response(serializer.data)
@staticmethod
@extend_schema(
responses={
204: OpenApiResponse(description="delete token"),
}
)
def delete(request):
"""delete your API token, new will get created on next get"""
print("revoke API token")
request.user.auth_token.delete()
return Response(status=204)
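# Hedged client-side sketch, not part of the original views module: calling
# the config endpoint defined above with DRF token auth. The base URL and
# token value are placeholders.
import requests

def fetch_app_config(base_url: str, api_token: str) -> dict:
    """GET /api/appsettings/config/ and return the parsed config"""
    response = requests.get(
        f"{base_url}/api/appsettings/config/",
        headers={"Authorization": f"Token {api_token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()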

View File

@ -1,103 +0,0 @@
"""channel serializers"""
# pylint: disable=abstract-method
from common.serializers import PaginationSerializer, ValidateUnknownFieldsMixin
from rest_framework import serializers
class ChannelOverwriteSerializer(
ValidateUnknownFieldsMixin, serializers.Serializer
):
"""serialize channel overwrites"""
download_format = serializers.CharField(required=False, allow_null=True)
autodelete_days = serializers.IntegerField(required=False, allow_null=True)
index_playlists = serializers.BooleanField(required=False, allow_null=True)
integrate_sponsorblock = serializers.BooleanField(
required=False, allow_null=True
)
subscriptions_channel_size = serializers.IntegerField(
required=False, allow_null=True
)
subscriptions_live_channel_size = serializers.IntegerField(
required=False, allow_null=True
)
subscriptions_shorts_channel_size = serializers.IntegerField(
required=False, allow_null=True
)
class ChannelSerializer(serializers.Serializer):
"""serialize channel"""
channel_id = serializers.CharField()
channel_active = serializers.BooleanField()
channel_banner_url = serializers.CharField()
channel_thumb_url = serializers.CharField()
channel_tvart_url = serializers.CharField()
channel_description = serializers.CharField()
channel_last_refresh = serializers.CharField()
channel_name = serializers.CharField()
channel_overwrites = ChannelOverwriteSerializer(required=False)
channel_subs = serializers.IntegerField()
channel_subscribed = serializers.BooleanField()
channel_tags = serializers.ListField(
child=serializers.CharField(), required=False
)
channel_views = serializers.IntegerField()
_index = serializers.CharField(required=False)
_score = serializers.IntegerField(required=False)
class ChannelListSerializer(serializers.Serializer):
"""serialize channel list"""
data = ChannelSerializer(many=True)
paginate = PaginationSerializer()
class ChannelListQuerySerializer(serializers.Serializer):
"""serialize list query"""
filter = serializers.ChoiceField(choices=["subscribed"], required=False)
page = serializers.IntegerField(required=False)
class ChannelUpdateSerializer(serializers.Serializer):
"""update channel"""
channel_subscribed = serializers.BooleanField(required=False)
channel_overwrites = ChannelOverwriteSerializer(required=False)
class ChannelAggBucketSerializer(serializers.Serializer):
"""serialize channel agg bucket"""
value = serializers.IntegerField()
value_str = serializers.CharField(required=False)
class ChannelAggSerializer(serializers.Serializer):
"""serialize channel aggregation"""
total_items = ChannelAggBucketSerializer()
total_size = ChannelAggBucketSerializer()
total_duration = ChannelAggBucketSerializer()
class ChannelNavSerializer(serializers.Serializer):
"""serialize channel navigation"""
has_pending = serializers.BooleanField()
has_ignored = serializers.BooleanField()
has_playlists = serializers.BooleanField()
has_videos = serializers.BooleanField()
has_streams = serializers.BooleanField()
has_shorts = serializers.BooleanField()
class ChannelSearchQuerySerializer(serializers.Serializer):
"""serialize query parameters for searching"""
q = serializers.CharField()
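# Hedged sketch, not part of the original serializers module: the payload
# shape ChannelApiView accepts, validated with ChannelUpdateSerializer above.
# Unknown keys inside channel_overwrites are rejected by
# ValidateUnknownFieldsMixin.
def validate_channel_update(payload: dict) -> dict:
    """return validated update data or raise ValidationError"""
    serializer = ChannelUpdateSerializer(data=payload)
    serializer.is_valid(raise_exception=True)
    return serializer.validated_data

# validate_channel_update(
#     {"channel_subscribed": True, "channel_overwrites": {"index_playlists": True}}
# )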

View File

@ -1,363 +0,0 @@
"""
functionality:
- get metadata from youtube for a channel
- index and update in es
"""
import json
import os
from datetime import datetime
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import ElasticWrap, IndexPaginate
from common.src.helper import rand_sleep
from common.src.index_generic import YouTubeItem
from download.src.thumbnails import ThumbManager
from download.src.yt_dlp_base import YtWrap
class YoutubeChannel(YouTubeItem):
"""represents a single youtube channel"""
es_path = False
index_name = "ta_channel"
yt_base = "https://www.youtube.com/channel/"
yt_obs = {
"playlist_items": "0,0",
"skip_download": True,
}
def __init__(self, youtube_id, task=False):
super().__init__(youtube_id)
self.all_playlists = False
self.task = task
def build_json(self, upload=False, fallback=False):
"""get from es or from youtube"""
self.get_from_es()
if self.json_data:
return
self.get_from_youtube()
if not self.youtube_meta and fallback:
self._video_fallback(fallback)
else:
if not self.youtube_meta:
message = f"{self.youtube_id}: Failed to get metadata"
raise ValueError(message)
self.process_youtube_meta()
self.get_channel_art()
if upload:
self.upload_to_es()
def process_youtube_meta(self):
"""extract relevant fields"""
self.youtube_meta["thumbnails"].reverse()
channel_name = self.youtube_meta["uploader"] or self.youtube_meta["id"]
self.json_data = {
"channel_active": True,
"channel_description": self.youtube_meta.get("description", ""),
"channel_id": self.youtube_id,
"channel_last_refresh": int(datetime.now().timestamp()),
"channel_name": channel_name,
"channel_subs": self.youtube_meta.get("channel_follower_count", 0),
"channel_subscribed": False,
"channel_tags": self.youtube_meta.get("tags", []),
"channel_banner_url": self._get_banner_art(),
"channel_thumb_url": self._get_thumb_art(),
"channel_tvart_url": self._get_tv_art(),
"channel_views": self.youtube_meta.get("view_count") or 0,
}
def _get_thumb_art(self):
"""extract thumb art"""
for i in self.youtube_meta["thumbnails"]:
if not i.get("width"):
continue
if i.get("width") == i.get("height"):
return i["url"]
return False
def _get_tv_art(self):
"""extract tv artwork"""
for i in self.youtube_meta["thumbnails"]:
if i.get("id") == "banner_uncropped":
return i["url"]
for i in self.youtube_meta["thumbnails"]:
if not i.get("width"):
continue
if i["width"] // i["height"] < 2 and not i["width"] == i["height"]:
return i["url"]
return False
def _get_banner_art(self):
"""extract banner artwork"""
for i in self.youtube_meta["thumbnails"]:
if not i.get("width"):
continue
if i["width"] // i["height"] > 5:
return i["url"]
return False
def _video_fallback(self, fallback):
"""use video metadata as fallback"""
print(f"{self.youtube_id}: fallback to video metadata")
self.json_data = {
"channel_active": False,
"channel_last_refresh": int(datetime.now().timestamp()),
"channel_subs": fallback.get("channel_follower_count", 0),
"channel_name": fallback["uploader"],
"channel_banner_url": False,
"channel_tvart_url": False,
"channel_id": self.youtube_id,
"channel_subscribed": False,
"channel_tags": [],
"channel_description": "",
"channel_thumb_url": False,
"channel_views": 0,
}
self._info_json_fallback()
def _info_json_fallback(self):
"""read channel info.json for additional metadata"""
info_json = os.path.join(
EnvironmentSettings.CACHE_DIR,
"import",
f"{self.youtube_id}.info.json",
)
if os.path.exists(info_json):
print(f"{self.youtube_id}: read info.json file")
with open(info_json, "r", encoding="utf-8") as f:
content = json.loads(f.read())
self.json_data.update(
{
"channel_subs": content.get("channel_follower_count", 0),
"channel_description": content.get("description", False),
}
)
os.remove(info_json)
def get_channel_art(self):
"""download channel art for new channels"""
urls = (
self.json_data["channel_thumb_url"],
self.json_data["channel_banner_url"],
self.json_data["channel_tvart_url"],
)
ThumbManager(self.youtube_id, item_type="channel").download(urls)
def sync_to_videos(self):
"""sync new channel_dict to all videos of channel"""
# add ingest pipeline
processors = []
for field, value in self.json_data.items():
if value is None:
line = {
"script": {
"lang": "painless",
"source": f"ctx['{field}'] = null;",
}
}
else:
line = {"set": {"field": "channel." + field, "value": value}}
processors.append(line)
data = {"description": self.youtube_id, "processors": processors}
ingest_path = f"_ingest/pipeline/{self.youtube_id}"
_, _ = ElasticWrap(ingest_path).put(data)
# apply pipeline
data = {"query": {"match": {"channel.channel_id": self.youtube_id}}}
update_path = f"ta_video/_update_by_query?pipeline={self.youtube_id}"
_, _ = ElasticWrap(update_path).post(data)
def get_folder_path(self):
"""get folder where media files get stored"""
folder_path = os.path.join(
EnvironmentSettings.MEDIA_DIR,
self.json_data["channel_id"],
)
return folder_path
def delete_es_videos(self):
"""delete all channel documents from elasticsearch"""
data = {
"query": {
"term": {"channel.channel_id": {"value": self.youtube_id}}
}
}
_, _ = ElasticWrap("ta_video/_delete_by_query").post(data)
def delete_es_comments(self):
"""delete all comments from this channel"""
data = {
"query": {
"term": {"comment_channel_id": {"value": self.youtube_id}}
}
}
_, _ = ElasticWrap("ta_comment/_delete_by_query").post(data)
def delete_es_subtitles(self):
"""delete all subtitles from this channel"""
data = {
"query": {
"term": {"subtitle_channel_id": {"value": self.youtube_id}}
}
}
_, _ = ElasticWrap("ta_subtitle/_delete_by_query").post(data)
def delete_playlists(self):
"""delete all indexed playlist from es"""
from playlist.src.index import YoutubePlaylist
all_playlists = self.get_indexed_playlists()
for playlist in all_playlists:
YoutubePlaylist(playlist["playlist_id"]).delete_metadata()
def delete_channel(self):
"""delete channel and all videos"""
print(f"{self.youtube_id}: delete channel")
self.get_from_es()
if not self.json_data:
raise FileNotFoundError
folder_path = self.get_folder_path()
print(f"{self.youtube_id}: delete all media files")
try:
all_videos = os.listdir(folder_path)
for video in all_videos:
video_path = os.path.join(folder_path, video)
os.remove(video_path)
os.rmdir(folder_path)
except FileNotFoundError:
print(f"no videos found for {folder_path}")
print(f"{self.youtube_id}: delete indexed playlists")
self.delete_playlists()
print(f"{self.youtube_id}: delete indexed videos")
self.delete_es_videos()
self.delete_es_comments()
self.delete_es_subtitles()
self.del_in_es()
def index_channel_playlists(self):
"""add all playlists of channel to index"""
print(f"{self.youtube_id}: index all playlists")
self.get_from_es()
channel_name = self.json_data["channel_name"]
self.task.send_progress([f"{channel_name}: Looking for Playlists"])
self.get_all_playlists()
if not self.all_playlists:
print(f"{self.youtube_id}: no playlists found.")
return
total = len(self.all_playlists)
for idx, playlist in enumerate(self.all_playlists):
if self.task:
self._notify_single_playlist(idx, total)
self._index_single_playlist(playlist)
print("add playlist: " + playlist[1])
rand_sleep(self.config)
def _notify_single_playlist(self, idx, total):
"""send notification"""
channel_name = self.json_data["channel_name"]
message = [
f"{channel_name}: Scanning channel for playlists",
f"Progress: {idx + 1}/{total}",
]
self.task.send_progress(message, progress=(idx + 1) / total)
@staticmethod
def _index_single_playlist(playlist):
"""add single playlist if needed"""
from playlist.src.index import YoutubePlaylist
playlist = YoutubePlaylist(playlist[0])
playlist.update_playlist(skip_on_empty=True)
def get_channel_videos(self):
"""get all videos from channel"""
data = {
"query": {
"term": {"channel.channel_id": {"value": self.youtube_id}}
},
"_source": ["youtube_id", "vid_type"],
}
all_videos = IndexPaginate("ta_video", data).get_results()
return all_videos
def get_all_playlists(self):
"""get all playlists owned by this channel"""
url = (
f"https://www.youtube.com/channel/{self.youtube_id}"
+ "/playlists?view=1&sort=dd&shelf_id=0"
)
obs = {"skip_download": True, "extract_flat": True}
playlists = YtWrap(obs, self.config).extract(url)
if not playlists:
self.all_playlists = []
return
all_entries = [(i["id"], i["title"]) for i in playlists["entries"]]
self.all_playlists = all_entries
def get_indexed_playlists(self, active_only=False):
"""get all indexed playlists from channel"""
must_list = [
{"term": {"playlist_channel_id": {"value": self.youtube_id}}}
]
if active_only:
must_list.append({"term": {"playlist_active": {"value": True}}})
data = {"query": {"bool": {"must": must_list}}}
all_playlists = IndexPaginate("ta_playlist", data).get_results()
return all_playlists
def get_overwrites(self) -> dict:
"""get all per channel overwrites"""
return self.json_data.get("channel_overwrites", {})
def set_overwrites(self, overwrites):
"""set per channel overwrites"""
valid_keys = [
"download_format",
"autodelete_days",
"index_playlists",
"integrate_sponsorblock",
"subscriptions_channel_size",
"subscriptions_live_channel_size",
"subscriptions_shorts_channel_size",
]
to_write = self.json_data.get("channel_overwrites", {})
for key, value in overwrites.items():
if key not in valid_keys:
raise ValueError(f"invalid overwrite key: {key}")
if value is None and key in to_write:
to_write.pop(key)
continue
to_write.update({key: value})
self.json_data["channel_overwrites"] = to_write
def channel_overwrites(channel_id, overwrites):
"""collection to overwrite settings per channel"""
channel = YoutubeChannel(channel_id)
channel.build_json()
channel.set_overwrites(overwrites)
channel.upload_to_es()
channel.sync_to_videos()
return channel.json_data
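# Hedged usage sketch, not part of the original module: applying a single per
# channel overwrite through the helper above; the channel id is a placeholder.
def disable_sponsorblock(channel_id: str) -> dict:
    """switch off sponsorblock integration for one channel"""
    return channel_overwrites(channel_id, {"integrate_sponsorblock": False})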

View File

@ -1,97 +0,0 @@
"""build channel nav"""
from common.src.es_connect import ElasticWrap
class ChannelNav:
"""get all nav items"""
def __init__(self, channel_id):
self.channel_id = channel_id
def get_nav(self):
"""build nav items"""
nav = {
"has_pending": self._get_has_pending(),
"has_ignored": self._get_has_ignored(),
"has_playlists": self._get_has_playlists(),
}
nav.update(self._get_vid_types())
return nav
def _get_vid_types(self):
"""get available vid_types in given channel"""
data = {
"size": 0,
"query": {
"term": {"channel.channel_id": {"value": self.channel_id}}
},
"aggs": {"unique_values": {"terms": {"field": "vid_type"}}},
}
response, _ = ElasticWrap("ta_video/_search").get(data)
buckets = response["aggregations"]["unique_values"]["buckets"]
type_nav = {
"has_videos": False,
"has_streams": False,
"has_shorts": False,
}
for bucket in buckets:
if bucket["key"] == "videos":
type_nav["has_videos"] = True
if bucket["key"] == "streams":
type_nav["has_streams"] = True
if bucket["key"] == "shorts":
type_nav["has_shorts"] = True
return type_nav
def _get_has_pending(self):
"""check if has pending videos in download queue"""
data = {
"size": 1,
"query": {
"bool": {
"must": [
{"term": {"status": {"value": "pending"}}},
{"term": {"channel_id": {"value": self.channel_id}}},
]
}
},
"_source": False,
}
response, _ = ElasticWrap("ta_download/_search").get(data=data)
return bool(response["hits"]["hits"])
def _get_has_ignored(self):
"""Check if there are ignored videos in the download queue"""
data = {
"size": 1,
"query": {
"bool": {
"must": [
{"term": {"status": {"value": "ignore"}}},
{"term": {"channel_id": {"value": self.channel_id}}},
]
}
},
"_source": False,
}
response, _ = ElasticWrap("ta_download/_search").get(data=data)
return bool(response["hits"]["hits"])
def _get_has_playlists(self):
"""check if channel has playlists"""
path = "ta_playlist/_search"
data = {
"size": 1,
"query": {
"term": {"playlist_channel_id": {"value": self.channel_id}}
},
"_source": False,
}
response, _ = ElasticWrap(path).get(data=data)
return bool(response["hits"]["hits"])

View File

@ -1,32 +0,0 @@
"""all channel API urls"""
from channel import views
from django.urls import path
urlpatterns = [
path(
"",
views.ChannelApiListView.as_view(),
name="api-channel-list",
),
path(
"search/",
views.ChannelApiSearchView.as_view(),
name="api-channel-search",
),
path(
"<slug:channel_id>/",
views.ChannelApiView.as_view(),
name="api-channel",
),
path(
"<slug:channel_id>/aggs/",
views.ChannelAggsApiView.as_view(),
name="api-channel-aggs",
),
path(
"<slug:channel_id>/nav/",
views.ChannelNavApiView.as_view(),
name="api-channel-nav",
),
]

View File

@ -1,281 +0,0 @@
"""all channel API views"""
from channel.serializers import (
ChannelAggSerializer,
ChannelListQuerySerializer,
ChannelListSerializer,
ChannelNavSerializer,
ChannelSearchQuerySerializer,
ChannelSerializer,
ChannelUpdateSerializer,
)
from channel.src.index import YoutubeChannel, channel_overwrites
from channel.src.nav import ChannelNav
from common.serializers import ErrorResponseSerializer
from common.src.urlparser import Parser
from common.views_base import AdminWriteOnly, ApiBaseView
from download.src.subscriptions import ChannelSubscription
from drf_spectacular.utils import (
OpenApiParameter,
OpenApiResponse,
extend_schema,
)
from rest_framework.response import Response
from task.tasks import index_channel_playlists, subscribe_to
class ChannelApiListView(ApiBaseView):
"""resolves to /api/channel/
GET: returns list of channels
POST: edit a list of channels
"""
search_base = "ta_channel/_search/"
valid_filter = ["subscribed"]
permission_classes = [AdminWriteOnly]
@extend_schema(
responses={
200: OpenApiResponse(ChannelListSerializer()),
},
parameters=[ChannelListQuerySerializer()],
)
def get(self, request):
"""get request"""
self.data.update(
{"sort": [{"channel_name.keyword": {"order": "asc"}}]}
)
serializer = ChannelListQuerySerializer(data=request.query_params)
serializer.is_valid(raise_exception=True)
validated_data = serializer.validated_data
must_list = []
query_filter = validated_data.get("filter")
if query_filter:
must_list.append({"term": {"channel_subscribed": {"value": True}}})
self.data["query"] = {"bool": {"must": must_list}}
self.get_document_list(request)
serializer = ChannelListSerializer(self.response)
return Response(serializer.data)
def post(self, request):
"""subscribe/unsubscribe to list of channels"""
data = request.data
try:
to_add = data["data"]
except KeyError:
message = "missing expected data key"
print(message)
return Response({"message": message}, status=400)
pending = []
for channel_item in to_add:
channel_id = channel_item["channel_id"]
if channel_item["channel_subscribed"]:
pending.append(channel_id)
else:
self._unsubscribe(channel_id)
if pending:
url_str = " ".join(pending)
subscribe_to.delay(url_str, expected_type="channel")
return Response(data)
@staticmethod
def _unsubscribe(channel_id: str):
"""unsubscribe"""
print(f"[{channel_id}] unsubscribe from channel")
ChannelSubscription().change_subscribe(
channel_id, channel_subscribed=False
)
class ChannelApiView(ApiBaseView):
"""resolves to /api/channel/<channel_id>/
GET: returns metadata dict of channel
"""
search_base = "ta_channel/_doc/"
permission_classes = [AdminWriteOnly]
@extend_schema(
responses={
200: OpenApiResponse(ChannelSerializer()),
404: OpenApiResponse(
ErrorResponseSerializer(), description="Channel not found"
),
}
)
def get(self, request, channel_id):
# pylint: disable=unused-argument
"""get channel detail"""
self.get_document(channel_id)
if not self.response:
error = ErrorResponseSerializer({"error": "channel not found"})
return Response(error.data, status=404)
response_serializer = ChannelSerializer(self.response)
return Response(response_serializer.data, status=self.status_code)
@extend_schema(
request=ChannelUpdateSerializer(),
responses={
200: OpenApiResponse(ChannelUpdateSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
404: OpenApiResponse(
ErrorResponseSerializer(), description="Channel not found"
),
},
)
def post(self, request, channel_id):
"""modify channel"""
self.get_document(channel_id)
if not self.response:
error = ErrorResponseSerializer({"error": "channel not found"})
return Response(error.data, status=404)
serializer = ChannelUpdateSerializer(data=request.data)
serializer.is_valid(raise_exception=True)
validated_data = serializer.validated_data
subscribed = validated_data.get("channel_subscribed")
if subscribed is not None:
ChannelSubscription().change_subscribe(channel_id, subscribed)
overwrites = validated_data.get("channel_overwrites")
if overwrites:
channel_overwrites(channel_id, overwrites)
if overwrites.get("index_playlists"):
index_channel_playlists.delay(channel_id)
return Response(serializer.data, status=200)
@extend_schema(
responses={
204: OpenApiResponse(description="Channel deleted"),
404: OpenApiResponse(
ErrorResponseSerializer(), description="Channel not found"
),
},
)
def delete(self, request, channel_id):
# pylint: disable=unused-argument
"""delete channel"""
try:
YoutubeChannel(channel_id).delete_channel()
return Response(status=204)
except FileNotFoundError:
pass
error = ErrorResponseSerializer({"error": "channel not found"})
return Response(error.data, status=404)
class ChannelAggsApiView(ApiBaseView):
"""resolves to /api/channel/<channel_id>/aggs/
GET: get channel aggregations
"""
search_base = "ta_video/_search"
@extend_schema(
responses={
200: OpenApiResponse(ChannelAggSerializer()),
},
)
def get(self, request, channel_id):
"""get channel aggregations"""
self.data.update(
{
"query": {
"term": {"channel.channel_id": {"value": channel_id}}
},
"aggs": {
"total_items": {"value_count": {"field": "youtube_id"}},
"total_size": {"sum": {"field": "media_size"}},
"total_duration": {"sum": {"field": "player.duration"}},
},
}
)
self.get_aggs()
serializer = ChannelAggSerializer(self.response)
return Response(serializer.data)
class ChannelNavApiView(ApiBaseView):
"""resolves to /api/channel/<channel_id>/nav/
GET: get channel nav
"""
@extend_schema(
responses={
200: OpenApiResponse(ChannelNavSerializer()),
},
)
def get(self, request, channel_id):
"""get navigation"""
nav = ChannelNav(channel_id).get_nav()
serializer = ChannelNavSerializer(nav)
return Response(serializer.data)
class ChannelApiSearchView(ApiBaseView):
"""resolves to /api/channel/search/
search for channel
"""
search_base = "ta_channel/_doc/"
@extend_schema(
responses={
200: OpenApiResponse(ChannelSerializer()),
400: OpenApiResponse(description="Bad Request"),
404: OpenApiResponse(
ErrorResponseSerializer(), description="Channel not found"
),
},
parameters=[
OpenApiParameter(
name="q",
description="Search query string",
required=True,
type=str,
),
],
)
def get(self, request):
"""search for local channel ID"""
serializer = ChannelSearchQuerySerializer(data=request.query_params)
serializer.is_valid(raise_exception=True)
validated_data = serializer.validated_data
query = validated_data.get("q")
if not query:
message = "missing expected q parameter"
return Response({"message": message, "data": False}, status=400)
try:
parsed = Parser(query).parse()[0]
except (ValueError, IndexError, AttributeError):
error = ErrorResponseSerializer(
{"error": f"channel not found: {query}"}
)
return Response(error.data, status=404)
if not parsed["type"] == "channel":
error = ErrorResponseSerializer({"error": "expected channel data"})
return Response(error.data, status=400)
self.get_document(parsed["url"])
serializer = ChannelSerializer(self.response)
return Response(serializer.data, status=self.status_code)
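# Hedged client-side sketch, not part of the original views module: resolving
# a channel URL or id against the search endpoint above. The base URL and
# token value are placeholders.
import requests

def search_local_channel(base_url: str, api_token: str, query: str) -> dict:
    """GET /api/channel/search/?q=<query> and return the channel document"""
    response = requests.get(
        f"{base_url}/api/channel/search/",
        params={"q": query},
        headers={"Authorization": f"Token {api_token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()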

View File

@ -1,143 +0,0 @@
"""common serializers"""
# pylint: disable=abstract-method
from rest_framework import serializers
class ValidateUnknownFieldsMixin:
"""
Mixin to validate and reject unknown fields in a serializer.
"""
def to_internal_value(self, data):
"""check expected keys"""
allowed_fields = set(self.fields.keys())
input_fields = set(data.keys())
# Find unknown fields
unknown_fields = input_fields - allowed_fields
if unknown_fields:
raise serializers.ValidationError(
{"error": f"Unknown fields: {', '.join(unknown_fields)}"}
)
return super().to_internal_value(data)
class ErrorResponseSerializer(serializers.Serializer):
"""error message"""
error = serializers.CharField()
class PaginationSerializer(serializers.Serializer):
"""serialize paginate response"""
page_size = serializers.IntegerField()
page_from = serializers.IntegerField()
prev_pages = serializers.ListField(
child=serializers.IntegerField(), allow_null=True
)
current_page = serializers.IntegerField()
max_hits = serializers.BooleanField()
params = serializers.CharField()
last_page = serializers.IntegerField()
next_pages = serializers.ListField(
child=serializers.IntegerField(), allow_null=True
)
total_hits = serializers.IntegerField()
class AsyncTaskResponseSerializer(serializers.Serializer):
"""serialize new async task"""
message = serializers.CharField(required=False)
task_id = serializers.CharField()
status = serializers.CharField(required=False)
filename = serializers.CharField(required=False)
class NotificationSerializer(serializers.Serializer):
"""serialize notification messages"""
id = serializers.CharField()
title = serializers.CharField()
group = serializers.CharField()
api_start = serializers.BooleanField()
api_stop = serializers.BooleanField()
level = serializers.ChoiceField(choices=["info", "error"])
messages = serializers.ListField(child=serializers.CharField())
progress = serializers.FloatField(required=False)
command = serializers.ChoiceField(choices=["STOP", "KILL"], required=False)
class NotificationQueryFilterSerializer(serializers.Serializer):
"""serialize notification query filter"""
filter = serializers.ChoiceField(
choices=["download", "settings", "channel"], required=False
)
class PingUpdateSerializer(serializers.Serializer):
"""serialize update notification"""
status = serializers.BooleanField()
version = serializers.CharField()
is_breaking = serializers.BooleanField()
class PingSerializer(serializers.Serializer):
"""serialize ping response"""
response = serializers.ChoiceField(choices=["pong"])
user = serializers.IntegerField()
version = serializers.CharField()
ta_update = PingUpdateSerializer(required=False)
class WatchedDataSerializer(serializers.Serializer):
"""mark as watched serializer"""
id = serializers.CharField()
is_watched = serializers.BooleanField()
class RefreshQuerySerializer(serializers.Serializer):
"""refresh query filtering"""
type = serializers.ChoiceField(
choices=["video", "channel", "playlist"], required=False
)
id = serializers.CharField(required=False)
class RefreshResponseSerializer(serializers.Serializer):
"""serialize refresh response"""
state = serializers.ChoiceField(
choices=["running", "queued", "empty", False]
)
total_queued = serializers.IntegerField()
in_queue_name = serializers.CharField(required=False)
class RefreshAddQuerySerializer(serializers.Serializer):
"""serialize add to refresh queue"""
extract_videos = serializers.BooleanField(required=False)
class RefreshAddDataSerializer(serializers.Serializer):
"""add to refresh queue serializer"""
video = serializers.ListField(
child=serializers.CharField(), required=False
)
channel = serializers.ListField(
child=serializers.CharField(), required=False
)
playlist = serializers.ListField(
child=serializers.CharField(), required=False
)
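# Hedged sketch, not part of the original module: composing the mixin above
# with a plain serializer; "demo_field" is a made-up name for illustration.
class DemoStrictSerializer(ValidateUnknownFieldsMixin, serializers.Serializer):
    """accept demo_field only, reject any other key"""
    demo_field = serializers.CharField()

# DemoStrictSerializer(data={"demo_field": "ok"}).is_valid()  -> True
# DemoStrictSerializer(data={"other": "no"}).is_valid()       -> False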

View File

@ -1,115 +0,0 @@
"""
Functionality:
- read and write application config backed by ES
- encapsulate persistence of application properties
"""
from os import environ
try:
from dotenv import load_dotenv
print("loading local dotenv")
load_dotenv(".env")
except ModuleNotFoundError:
pass
class EnvironmentSettings:
"""
Handle settings for the application that are driven from the environment.
These will not change when the user is using the application.
    These settings are only provided on startup.
"""
HOST_UID: int = int(environ.get("HOST_UID", False))
HOST_GID: int = int(environ.get("HOST_GID", False))
DISABLE_STATIC_AUTH: bool = bool(environ.get("DISABLE_STATIC_AUTH"))
TZ: str = str(environ.get("TZ", "UTC"))
TA_PORT: int = int(environ.get("TA_PORT", False))
TA_BACKEND_PORT: int = int(environ.get("TA_BACKEND_PORT", False))
TA_USERNAME: str = str(environ.get("TA_USERNAME"))
TA_PASSWORD: str = str(environ.get("TA_PASSWORD"))
# Application Paths
MEDIA_DIR: str = str(environ.get("TA_MEDIA_DIR", "/youtube"))
APP_DIR: str = str(environ.get("TA_APP_DIR", "/app"))
CACHE_DIR: str = str(environ.get("TA_CACHE_DIR", "/cache"))
# Redis
REDIS_CON: str = str(environ.get("REDIS_CON"))
REDIS_NAME_SPACE: str = str(environ.get("REDIS_NAME_SPACE", "ta:"))
# ElasticSearch
ES_URL: str = str(environ.get("ES_URL"))
ES_PASS: str = str(environ.get("ELASTIC_PASSWORD"))
ES_USER: str = str(environ.get("ELASTIC_USER", "elastic"))
ES_SNAPSHOT_DIR: str = str(
environ.get(
"ES_SNAPSHOT_DIR", "/usr/share/elasticsearch/data/snapshot"
)
)
ES_DISABLE_VERIFY_SSL: bool = bool(environ.get("ES_DISABLE_VERIFY_SSL"))
def get_cache_root(self):
"""get root for web server"""
if self.CACHE_DIR.startswith("/"):
return self.CACHE_DIR
return f"/{self.CACHE_DIR}"
def get_media_root(self):
"""get root for media folder"""
if self.MEDIA_DIR.startswith("/"):
return self.MEDIA_DIR
return f"/{self.MEDIA_DIR}"
def print_generic(self):
"""print generic env vars"""
print(
f"""
HOST_UID: {self.HOST_UID}
HOST_GID: {self.HOST_GID}
TZ: {self.TZ}
DISABLE_STATIC_AUTH: {self.DISABLE_STATIC_AUTH}
TA_PORT: {self.TA_PORT}
TA_BACKEND_PORT: {self.TA_BACKEND_PORT}
TA_USERNAME: {self.TA_USERNAME}
TA_PASSWORD: *****"""
)
def print_paths(self):
"""debug paths set"""
print(
f"""
MEDIA_DIR: {self.MEDIA_DIR}
APP_DIR: {self.APP_DIR}
CACHE_DIR: {self.CACHE_DIR}"""
)
def print_redis_conf(self):
"""debug redis conf paths"""
print(
f"""
REDIS_CON: {self.REDIS_CON}
REDIS_NAME_SPACE: {self.REDIS_NAME_SPACE}"""
)
def print_es_paths(self):
"""debug es conf"""
print(
f"""
ES_URL: {self.ES_URL}
ES_PASS: *****
ES_USER: {self.ES_USER}
ES_SNAPSHOT_DIR: {self.ES_SNAPSHOT_DIR}
ES_DISABLE_VERIFY_SSL: {self.ES_DISABLE_VERIFY_SSL}"""
)
def print_all(self):
"""print all"""
self.print_generic()
self.print_paths()
self.print_redis_conf()
self.print_es_paths()
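# Hedged usage sketch, not part of the original module: the constants above
# are read as class attributes, the *_root helpers need an instance.
def print_media_location() -> None:
    """show where media files are expected"""
    print(EnvironmentSettings.MEDIA_DIR)
    print(EnvironmentSettings().get_media_root())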

View File

@ -1,231 +0,0 @@
"""
functionality:
- wrapper around requests to call elastic search
- reusable search_after to extract total index
"""
# pylint: disable=missing-timeout
import json
from typing import Any
import requests
import urllib3
from common.src.env_settings import EnvironmentSettings
class ElasticWrap:
"""makes all calls to elastic search
returns response json and status code tuple
"""
def __init__(self, path: str):
self.url: str = f"{EnvironmentSettings.ES_URL}/{path}"
self.auth: tuple[str, str] = (
EnvironmentSettings.ES_USER,
EnvironmentSettings.ES_PASS,
)
if EnvironmentSettings.ES_DISABLE_VERIFY_SSL:
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
def get(
self,
data: bool | dict = False,
timeout: int = 10,
print_error: bool = True,
) -> tuple[dict, int]:
"""get data from es"""
kwargs: dict[str, Any] = {
"auth": self.auth,
"timeout": timeout,
}
if EnvironmentSettings.ES_DISABLE_VERIFY_SSL:
kwargs["verify"] = False
if data:
kwargs["json"] = data
response = requests.get(self.url, **kwargs)
if print_error and not response.ok:
print(response.text)
return response.json(), response.status_code
def post(
self, data: bool | dict = False, ndjson: bool = False
) -> tuple[dict, int]:
"""post data to es"""
kwargs: dict[str, Any] = {"auth": self.auth}
if ndjson and data:
kwargs.update(
{
"headers": {"Content-type": "application/x-ndjson"},
"data": data,
}
)
elif data:
kwargs.update(
{
"headers": {"Content-type": "application/json"},
"data": json.dumps(data),
}
)
if EnvironmentSettings.ES_DISABLE_VERIFY_SSL:
kwargs["verify"] = False
response = requests.post(self.url, **kwargs)
if not response.ok:
print(response.text)
return response.json(), response.status_code
def put(
self,
data: bool | dict = False,
refresh: bool = False,
) -> tuple[dict, Any]:
"""put data to es"""
if refresh:
self.url = f"{self.url}/?refresh=true"
kwargs: dict[str, Any] = {
"json": data,
"auth": self.auth,
}
if EnvironmentSettings.ES_DISABLE_VERIFY_SSL:
kwargs["verify"] = False
response = requests.put(self.url, **kwargs)
if not response.ok:
print(response.text)
print(data)
raise ValueError("failed to add item to index")
return response.json(), response.status_code
def delete(
self,
data: bool | dict = False,
refresh: bool = False,
) -> tuple[dict, Any]:
"""delete document from es"""
if refresh:
self.url = f"{self.url}/?refresh=true"
kwargs: dict[str, Any] = {"auth": self.auth}
if data:
kwargs["json"] = data
if EnvironmentSettings.ES_DISABLE_VERIFY_SSL:
kwargs["verify"] = False
response = requests.delete(self.url, **kwargs)
if not response.ok:
print(response.text)
return response.json(), response.status_code
class IndexPaginate:
"""use search_after to go through whole index
kwargs:
- size: int, overwrite DEFAULT_SIZE
- keep_source: bool, keep _source key from es results
- callback: obj, Class implementing run method callback for every loop
- task: task object to send notification
- total: int, total items in index for progress message
"""
DEFAULT_SIZE = 500
def __init__(self, index_name, data, **kwargs):
self.index_name = index_name
self.data = data
self.pit_id = False
self.kwargs = kwargs
def get_results(self):
"""get all results, add task and total for notifications"""
self.get_pit()
self.validate_data()
all_results = self.run_loop()
self.clean_pit()
return all_results
def get_pit(self):
"""get pit for index"""
path = f"{self.index_name}/_pit?keep_alive=10m"
response, _ = ElasticWrap(path).post()
self.pit_id = response["id"]
def validate_data(self):
"""add pit and size to data"""
if not self.data:
self.data = {}
if "query" not in self.data.keys():
self.data.update({"query": {"match_all": {}}})
if "sort" not in self.data.keys():
self.data.update({"sort": [{"_doc": {"order": "desc"}}]})
self.data["size"] = self.kwargs.get("size") or self.DEFAULT_SIZE
self.data["pit"] = {"id": self.pit_id, "keep_alive": "10m"}
def run_loop(self):
"""loop through results until last hit"""
all_results = []
counter = 0
while True:
response, _ = ElasticWrap("_search").get(data=self.data)
all_hits = response["hits"]["hits"]
if not all_hits:
break
for hit in all_hits:
if self.kwargs.get("keep_source"):
all_results.append(hit)
else:
all_results.append(hit["_source"])
if self.kwargs.get("callback"):
self.kwargs.get("callback")(
all_hits, self.index_name, counter=counter
).run()
if self.kwargs.get("task"):
print(f"{self.index_name}: processing page {counter}")
self._notify(len(all_results))
counter += 1
# update search_after with last hit data
self.data["search_after"] = all_hits[-1]["sort"]
return all_results
def _notify(self, processed):
"""send notification on task"""
total = self.kwargs.get("total")
progress = processed / total
index_clean = self.index_name.lstrip("ta_").title()
message = [f"Processing {index_clean}s {processed}/{total}"]
self.kwargs.get("task").send_progress(message, progress=progress)
def clean_pit(self):
"""delete pit from elastic search"""
ElasticWrap("_pit").delete(data={"id": self.pit_id})

View File

@ -1,300 +0,0 @@
"""
Loose collection of helper functions
- don't import AppConfig class here to avoid circular imports
"""
import json
import os
import random
import string
import subprocess
from datetime import datetime, timezone
from time import sleep
from typing import Any
from urllib.parse import urlparse
import requests
from common.src.es_connect import IndexPaginate
def ignore_filelist(filelist: list[str]) -> list[str]:
"""ignore temp files for os.listdir sanitizer"""
to_ignore = [
"@eaDir",
"Icon\r\r",
"Network Trash Folder",
"Temporary Items",
]
cleaned: list[str] = []
for file_name in filelist:
if file_name.startswith(".") or file_name in to_ignore:
continue
cleaned.append(file_name)
return cleaned
def randomizor(length: int) -> str:
"""generate random alpha numeric string"""
pool: str = string.digits + string.ascii_letters
return "".join(random.choice(pool) for i in range(length))
def rand_sleep(config) -> None:
"""randomized sleep based on config"""
sleep_config = config["downloads"].get("sleep_interval")
if not sleep_config:
return
secs = random.randrange(int(sleep_config * 0.5), int(sleep_config * 1.5))
sleep(secs)
def requests_headers() -> dict[str, str]:
"""build header with random user agent for requests outside of yt-dlp"""
chrome_versions = (
"90.0.4430.212",
"90.0.4430.24",
"90.0.4430.70",
"90.0.4430.72",
"90.0.4430.85",
"90.0.4430.93",
"91.0.4472.101",
"91.0.4472.106",
"91.0.4472.114",
"91.0.4472.124",
"91.0.4472.164",
"91.0.4472.19",
"91.0.4472.77",
"92.0.4515.107",
"92.0.4515.115",
"92.0.4515.131",
"92.0.4515.159",
"92.0.4515.43",
"93.0.4556.0",
"93.0.4577.15",
"93.0.4577.63",
"93.0.4577.82",
"94.0.4606.41",
"94.0.4606.54",
"94.0.4606.61",
"94.0.4606.71",
"94.0.4606.81",
"94.0.4606.85",
"95.0.4638.17",
"95.0.4638.50",
"95.0.4638.54",
"95.0.4638.69",
"95.0.4638.74",
"96.0.4664.18",
"96.0.4664.45",
"96.0.4664.55",
"96.0.4664.93",
"97.0.4692.20",
)
template = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+ "AppleWebKit/537.36 (KHTML, like Gecko) "
+ f"Chrome/{random.choice(chrome_versions)} Safari/537.36"
)
return {"User-Agent": template}
def date_parser(timestamp: int | str) -> str:
"""return formatted date string"""
if isinstance(timestamp, int):
date_obj = datetime.fromtimestamp(timestamp, tz=timezone.utc)
elif isinstance(timestamp, str):
date_obj = datetime.strptime(timestamp, "%Y-%m-%d")
date_obj = date_obj.replace(tzinfo=timezone.utc)
else:
raise TypeError(f"invalid timestamp: {timestamp}")
return date_obj.isoformat()
def time_parser(timestamp: str) -> float:
"""return seconds from timestamp, false on empty"""
if not timestamp:
return False
if timestamp.isnumeric():
return int(timestamp)
hours, minutes, seconds = timestamp.split(":", maxsplit=3)
return int(hours) * 60 * 60 + int(minutes) * 60 + float(seconds)
def clear_dl_cache(cache_dir: str) -> int:
"""clear leftover files from dl cache"""
print("clear download cache")
download_cache_dir = os.path.join(cache_dir, "download")
leftover_files = ignore_filelist(os.listdir(download_cache_dir))
for cached in leftover_files:
to_delete = os.path.join(download_cache_dir, cached)
os.remove(to_delete)
return len(leftover_files)
def get_mapping() -> list[dict]:
    """read index_mapping.json and get expected mapping and settings"""
    with open("appsettings/index_mapping.json", "r", encoding="utf-8") as f:
        index_config: list[dict] = json.load(f).get("index_config")
    return index_config
def is_shorts(youtube_id: str) -> bool:
"""check if youtube_id is a shorts video, bot not it it's not a shorts"""
shorts_url = f"https://www.youtube.com/shorts/{youtube_id}"
cookies = {"SOCS": "CAI"}
response = requests.head(
shorts_url, cookies=cookies, headers=requests_headers(), timeout=10
)
return response.status_code == 200
def get_duration_sec(file_path: str) -> int:
"""get duration of media file from file path"""
duration = subprocess.run(
[
"ffprobe",
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
file_path,
],
capture_output=True,
check=True,
)
duration_raw = duration.stdout.decode().strip()
if duration_raw == "N/A":
return 0
duration_sec = int(float(duration_raw))
return duration_sec
def get_duration_str(seconds: int) -> str:
"""Return a human-readable duration string from seconds."""
if not seconds:
return "NA"
units = [("y", 31536000), ("d", 86400), ("h", 3600), ("m", 60), ("s", 1)]
duration_parts = []
for unit_label, unit_seconds in units:
if seconds >= unit_seconds:
unit_count, seconds = divmod(seconds, unit_seconds)
duration_parts.append(f"{unit_count:02}{unit_label}")
duration_parts[0] = duration_parts[0].lstrip("0")
return " ".join(duration_parts)
def ta_host_parser(ta_host: str) -> tuple[list[str], list[str]]:
"""parse ta_host env var for ALLOWED_HOSTS and CSRF_TRUSTED_ORIGINS"""
allowed_hosts: list[str] = [
"localhost",
"tubearchivist",
]
csrf_trusted_origins: list[str] = [
"http://localhost",
"http://tubearchivist",
]
for host in ta_host.split():
host_clean = host.strip()
if not host_clean.startswith("http"):
host_clean = f"http://{host_clean}"
parsed = urlparse(host_clean)
allowed_hosts.append(f"{parsed.hostname}")
cors_url = f"{parsed.scheme}://{parsed.hostname}"
if parsed.port:
cors_url = f"{cors_url}:{parsed.port}"
csrf_trusted_origins.append(cors_url)
return allowed_hosts, csrf_trusted_origins
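
Worked example for ta_host_parser above, with two illustrative hosts; schemes default to http and ports only end up in the CSRF origins:

allowed, csrf = ta_host_parser("tube.example.com http://localhost:8000")
# allowed == ["localhost", "tubearchivist", "tube.example.com", "localhost"]
# csrf == ["http://localhost", "http://tubearchivist",
#          "http://tube.example.com", "http://localhost:8000"]
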
def get_stylesheets() -> list:
"""Get all valid stylesheets from /static/css"""
stylesheets = [
"dark.css",
"light.css",
"matrix.css",
"midnight.css",
"custom.css",
]
return stylesheets
def check_stylesheet(stylesheet: str):
"""Check if a stylesheet exists. Return dark.css as a fallback"""
if stylesheet in get_stylesheets():
return stylesheet
return "dark.css"
def is_missing(
to_check: str | list[str],
index_name: str = "ta_video,ta_download",
on_key: str = "youtube_id",
) -> list[str]:
"""id or list of ids that are missing from index_name"""
if isinstance(to_check, str):
to_check = [to_check]
data = {
"query": {"terms": {on_key: to_check}},
"_source": [on_key],
}
result = IndexPaginate(index_name, data=data).get_results()
existing_ids = [i[on_key] for i in result]
dl = [i for i in to_check if i not in existing_ids]
return dl
def get_channel_overwrites() -> dict[str, dict[str, Any]]:
"""get overwrites indexed my channel_id"""
data = {
"query": {
"bool": {"must": [{"exists": {"field": "channel_overwrites"}}]}
},
"_source": ["channel_id", "channel_overwrites"],
}
result = IndexPaginate("ta_channel", data).get_results()
overwrites = {i["channel_id"]: i["channel_overwrites"] for i in result}
return overwrites
def calc_is_watched(duration: float, position: float) -> bool:
"""considered watched based on duration position"""
if not duration or duration <= 0:
return False
if duration < 60:
threshold = 0.5
elif duration > 900:
threshold = 1 - (180 / duration)
else:
threshold = 0.9
return position >= duration * threshold
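
To make the thresholds in calc_is_watched concrete, a few worked values: clips under a minute count at 50 percent, mid-length videos at 90 percent, and long videos once the position is within 180 seconds of the end.

assert calc_is_watched(30, 20)        # short clip, 20s of 30s is past 50%
assert not calc_is_watched(600, 500)  # 10 min video needs 540s (90%)
assert calc_is_watched(1800, 1700)    # 30 min video, within 180s of the end
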

View File

@ -1,219 +0,0 @@
"""
Functionality:
- processing search results for frontend
- this is duplicated code from home.src.frontend.searching.SearchHandler
"""
import urllib.parse
from common.src.env_settings import EnvironmentSettings
from common.src.helper import date_parser, get_duration_str
from common.src.ta_redis import RedisArchivist
from download.src.thumbnails import ThumbManager
class SearchProcess:
"""process search results"""
def __init__(self, response, match_video_user_progress: None | int = None):
self.response = response
self.processed = False
self.position_index = self.get_user_progress(match_video_user_progress)
def process(self):
"""detect type and process"""
if "_source" in self.response.keys():
# single
self.processed = self._process_result(self.response)
elif "hits" in self.response.keys():
# multiple
self.processed = []
all_sources = self.response["hits"]["hits"]
for result in all_sources:
self.processed.append(self._process_result(result))
return self.processed
def get_user_progress(self, match_video_user_progress) -> dict | None:
"""get user video watch progress"""
if not match_video_user_progress:
return None
query = f"{match_video_user_progress}:progress:*"
all_positions = RedisArchivist().list_items(query)
if not all_positions:
return None
pos_index = {
i["youtube_id"]: i["position"]
for i in all_positions
if not i.get("watched")
}
return pos_index
def _process_result(self, result):
"""detect which type of data to process"""
index = result["_index"]
processed = False
if index == "ta_video":
processed = self._process_video(result["_source"])
if index == "ta_channel":
processed = self._process_channel(result["_source"])
if index == "ta_playlist":
processed = self._process_playlist(result["_source"])
if index == "ta_download":
processed = self._process_download(result["_source"])
if index == "ta_comment":
processed = self._process_comment(result["_source"])
if index == "ta_subtitle":
processed = self._process_subtitle(result)
if isinstance(processed, dict):
processed.update(
{
"_index": index,
"_score": round(result.get("_score") or 0, 2),
}
)
return processed
@staticmethod
def _process_channel(channel_dict):
"""run on single channel"""
channel_id = channel_dict["channel_id"]
cache_root = EnvironmentSettings().get_cache_root()
art_base = f"{cache_root}/channels/{channel_id}"
date_str = date_parser(channel_dict["channel_last_refresh"])
channel_dict.update(
{
"channel_last_refresh": date_str,
"channel_banner_url": f"{art_base}_banner.jpg",
"channel_thumb_url": f"{art_base}_thumb.jpg",
"channel_tvart_url": f"{art_base}_tvart.jpg",
}
)
return dict(sorted(channel_dict.items()))
def _process_video(self, video_dict):
"""run on single video dict"""
video_id = video_dict["youtube_id"]
media_url = urllib.parse.quote(video_dict["media_url"])
vid_last_refresh = date_parser(video_dict["vid_last_refresh"])
published = date_parser(video_dict["published"])
vid_thumb_url = ThumbManager(video_id).vid_thumb_path()
channel = self._process_channel(video_dict["channel"])
cache_root = EnvironmentSettings().get_cache_root()
media_root = EnvironmentSettings().get_media_root()
if "subtitles" in video_dict:
for idx, _ in enumerate(video_dict["subtitles"]):
url = video_dict["subtitles"][idx]["media_url"]
video_dict["subtitles"][idx][
"media_url"
] = f"{media_root}/{url}"
else:
video_dict["subtitles"] = []
video_dict.update(
{
"channel": channel,
"media_url": f"{media_root}/{media_url}",
"vid_last_refresh": vid_last_refresh,
"published": published,
"vid_thumb_url": f"{cache_root}/{vid_thumb_url}",
}
)
if self.position_index:
player_position = self.position_index.get(video_id)
total = video_dict["player"].get("duration")
if player_position and total:
progress = 100 * (player_position / total)
video_dict["player"].update(
{
"progress": progress,
"position": player_position,
}
)
if "playlist" not in video_dict:
video_dict["playlist"] = []
return dict(sorted(video_dict.items()))
@staticmethod
def _process_playlist(playlist_dict):
"""run on single playlist dict"""
playlist_id = playlist_dict["playlist_id"]
playlist_last_refresh = date_parser(
playlist_dict["playlist_last_refresh"]
)
cache_root = EnvironmentSettings().get_cache_root()
playlist_thumbnail = f"{cache_root}/playlists/{playlist_id}.jpg"
playlist_dict.update(
{
"playlist_thumbnail": playlist_thumbnail,
"playlist_last_refresh": playlist_last_refresh,
}
)
return dict(sorted(playlist_dict.items()))
def _process_download(self, download_dict):
"""run on single download item"""
video_id = download_dict["youtube_id"]
cache_root = EnvironmentSettings().get_cache_root()
vid_thumb_url = ThumbManager(video_id).vid_thumb_path()
published = date_parser(download_dict["published"])
download_dict.update(
{
"vid_thumb_url": f"{cache_root}/{vid_thumb_url}",
"published": published,
}
)
return dict(sorted(download_dict.items()))
def _process_comment(self, comment_dict):
"""run on all comments, create reply thread"""
all_comments = comment_dict["comment_comments"]
processed_comments = []
for comment in all_comments:
if comment["comment_parent"] == "root":
comment.update({"comment_replies": []})
processed_comments.append(comment)
else:
processed_comments[-1]["comment_replies"].append(comment)
return processed_comments
def _process_subtitle(self, result):
"""take complete result dict to extract highlight"""
subtitle_dict = result["_source"]
highlight = result.get("highlight")
if highlight:
# replace lines with the highlighted markdown
subtitle_line = highlight.get("subtitle_line")[0]
subtitle_dict.update({"subtitle_line": subtitle_line})
thumb_path = ThumbManager(subtitle_dict["youtube_id"]).vid_thumb_path()
subtitle_dict.update({"vid_thumb_url": f"/cache/{thumb_path}"})
return subtitle_dict
def process_aggs(response):
"""convert aggs duration to str"""
if response.get("aggregations"):
aggs = response["aggregations"]
if "total_duration" in aggs:
duration_sec = int(aggs["total_duration"]["value"])
aggs["total_duration"].update(
{"value_str": get_duration_str(duration_sec)}
)
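
A small illustration of process_aggs with a hand-built aggregation response; the expected string matches the get_duration_str examples in the test suite further down:

response = {"aggregations": {"total_duration": {"value": 5000.0}}}
process_aggs(response)
assert response["aggregations"]["total_duration"]["value_str"] == "1h 23m 20s"
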

View File

@ -1,258 +0,0 @@
"""
functionality:
- interact with redis
- hold temporary download queue in redis
- interact with celery tasks results
"""
import json
import redis
from common.src.env_settings import EnvironmentSettings
class RedisBase:
"""connection base for redis"""
NAME_SPACE: str = EnvironmentSettings.REDIS_NAME_SPACE
def __init__(self):
self.conn = redis.from_url(
url=EnvironmentSettings.REDIS_CON, decode_responses=True
)
class RedisArchivist(RedisBase):
"""collection of methods to interact with redis"""
CHANNELS: list[str] = [
"download",
"add",
"rescan",
"subchannel",
"subplaylist",
"playlistscan",
"setting",
]
def set_message(
self,
key: str,
message: dict | str,
expire: bool | int = False,
save: bool = False,
) -> None:
"""write new message to redis"""
to_write = (
json.dumps(message) if isinstance(message, dict) else message
)
self.conn.execute_command("SET", self.NAME_SPACE + key, to_write)
if expire:
if isinstance(expire, bool):
secs: int = 20
else:
secs = expire
self.conn.execute_command("EXPIRE", self.NAME_SPACE + key, secs)
if save:
self.bg_save()
def bg_save(self) -> None:
"""save to aof"""
try:
self.conn.bgsave()
except redis.exceptions.ResponseError:
pass
def get_message_str(self, key: str) -> str | None:
"""get message string"""
reply = self.conn.execute_command("GET", self.NAME_SPACE + key)
return reply
def get_message_dict(self, key: str) -> dict:
"""get message dict"""
reply = self.conn.execute_command("GET", self.NAME_SPACE + key)
if not reply:
return {}
return json.loads(reply)
def get_message(self, key: str) -> dict | None:
"""
get message dict from redis
old json get message, only used for migration, to be removed later
"""
reply = self.conn.execute_command("JSON.GET", self.NAME_SPACE + key)
if reply:
return json.loads(reply)
return {"status": False}
def list_keys(self, query: str) -> list:
"""return all key matches"""
reply = self.conn.execute_command(
"KEYS", self.NAME_SPACE + query + "*"
)
if not reply:
return []
return [i.lstrip(self.NAME_SPACE) for i in reply]
def list_items(self, query: str) -> list:
"""list all matches"""
all_matches = self.list_keys(query)
if not all_matches:
return []
return [self.get_message_dict(i) for i in all_matches]
def del_message(self, key: str, save: bool = False) -> bool:
"""delete key from redis"""
response = self.conn.execute_command("DEL", self.NAME_SPACE + key)
if save:
self.bg_save()
return response
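
A minimal sketch of RedisArchivist round-tripping a message, assuming a running Redis reachable through REDIS_CON; the key is a placeholder in the message namespace:

archivist = RedisArchivist()
archivist.set_message("message:setting", {"status": "ok"}, expire=20)
assert archivist.get_message_dict("message:setting") == {"status": "ok"}
archivist.del_message("message:setting")
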
class RedisQueue(RedisBase):
"""
dynamically interact with queues in redis using sorted set
- low score number is first in queue
- add new items with high score number
queue names in use:
download:channel channels during download
download:playlist:full playlists during dl for full refresh
download:playlist:quick playlists during dl for quick refresh
download:video videos during downloads
index:comment videos needing comment indexing
reindex:ta_video reindex videos
reindex:ta_channel reindex channels
reindex:ta_playlist reindex playlists
"""
def __init__(self, queue_name: str):
super().__init__()
self.key = f"{self.NAME_SPACE}{queue_name}"
def get_all(self) -> list[str]:
"""return all elements in list"""
result = self.conn.zrange(self.key, 0, -1)
return result
def length(self) -> int:
"""return total elements in list"""
return self.conn.zcard(self.key)
def in_queue(self, element) -> str | bool:
"""check if element is in list"""
result = self.conn.zrank(self.key, element)
if result is not None:
return "in_queue"
return False
def add(self, to_add: str) -> None:
"""add single item to queue"""
if not to_add:
return
next_score = self._get_next_score()
self.conn.zadd(self.key, {to_add: next_score})
def add_list(self, to_add: list) -> None:
"""add list to queue"""
if not to_add:
return
next_score = self._get_next_score()
mapping = {i[1]: next_score + i[0] for i in enumerate(to_add)}
self.conn.zadd(self.key, mapping)
def max_score(self) -> int | None:
"""get max score"""
last = self.conn.zrange(self.key, -1, -1, withscores=True)
if not last:
return None
return int(last[0][1])
def _get_next_score(self) -> float:
"""get next score in queue to append"""
last = self.conn.zrange(self.key, -1, -1, withscores=True)
if not last:
return 1.0
return last[0][1] + 1
def get_next(self) -> tuple[str | None, int | None]:
"""return next element in the queue, if available"""
result = self.conn.zpopmin(self.key)
if not result:
return None, None
item, idx = result[0][0], int(result[0][1])
return item, idx
def clear(self) -> None:
"""delete list from redis"""
self.conn.delete(self.key)
class TaskRedis(RedisBase):
"""interact with redis tasks"""
BASE: str = "celery-task-meta-"
EXPIRE: int = 60 * 60 * 24
COMMANDS: list[str] = ["STOP", "KILL"]
def get_all(self) -> list:
"""return all tasks"""
all_keys = self.conn.execute_command("KEYS", f"{self.BASE}*")
return [i.replace(self.BASE, "") for i in all_keys]
def get_single(self, task_id: str) -> dict:
"""return content of single task"""
result = self.conn.execute_command("GET", self.BASE + task_id)
if not result:
return {}
return json.loads(result)
def set_key(
self, task_id: str, message: dict, expire: bool | int = False
) -> None:
"""set value for lock, initial or update"""
key: str = f"{self.BASE}{task_id}"
self.conn.execute_command("SET", key, json.dumps(message))
if expire:
self.conn.execute_command("EXPIRE", key, self.EXPIRE)
def set_command(self, task_id: str, command: str) -> None:
"""set task command"""
if command not in self.COMMANDS:
print(f"{command} not in valid commands {self.COMMANDS}")
raise ValueError
message = self.get_single(task_id)
if not message:
print(f"{task_id} not found")
raise KeyError
message.update({"command": command})
self.set_key(task_id, message)
def del_task(self, task_id: str) -> None:
"""delete task result by id"""
self.conn.execute_command("DEL", f"{self.BASE}{task_id}")
def del_all(self) -> None:
"""delete all task results"""
all_tasks = self.get_all()
for task_id in all_tasks:
self.del_task(task_id)
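
And a short sketch of the RedisQueue semantics described in its docstring, again assuming a running Redis; the lowest score is popped first:

queue = RedisQueue("download:video")
queue.add_list(["id_a", "id_b"])  # scored 1.0 and 2.0
queue.add("id_c")                 # appended with score 3.0
item, score = queue.get_next()    # returns ("id_a", 1)
queue.clear()
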

View File

@ -1,192 +0,0 @@
"""
Functionality:
- detect valid youtube ids and links from multi line string
- identify vid_type if possible
"""
from urllib.parse import parse_qs, urlparse
from common.src.ta_redis import RedisArchivist
from download.src.yt_dlp_base import YtWrap
from video.src.constants import VideoTypeEnum
class Parser:
"""
take a multi line string and detect valid youtube ids
channel handle lookup is cached, can be disabled for unittests
"""
def __init__(self, url_str, use_cache=True):
self.url_list = [i.strip() for i in url_str.split()]
self.use_cache = use_cache
def parse(self):
"""parse the list"""
ids = []
for url in self.url_list:
parsed = urlparse(url)
if parsed.netloc:
# is url
identified = self.process_url(parsed)
else:
# is not url
identified = self._find_valid_id(url)
if "vid_type" not in identified:
identified.update(self._detect_vid_type(parsed.path))
ids.append(identified)
return ids
def process_url(self, parsed):
"""process as url"""
if parsed.netloc == "youtu.be":
# shortened
youtube_id = parsed.path.strip("/")
return self._validate_expected(youtube_id, "video")
if "youtube.com" not in parsed.netloc:
message = f"invalid domain: {parsed.netloc}"
raise ValueError(message)
query_parsed = parse_qs(parsed.query)
if "v" in query_parsed:
# video from v query str
youtube_id = query_parsed["v"][0]
return self._validate_expected(youtube_id, "video")
if "list" in query_parsed:
# playlist from list query str
youtube_id = query_parsed["list"][0]
return self._validate_expected(youtube_id, "playlist")
all_paths = parsed.path.strip("/").split("/")
if all_paths[0] == "shorts":
# is shorts video
item = self._validate_expected(all_paths[1], "video")
item.update({"vid_type": VideoTypeEnum.SHORTS.value})
return item
if all_paths[0] == "channel":
return self._validate_expected(all_paths[1], "channel")
if all_paths[0] == "live":
return self._validate_expected(all_paths[1], "video")
# detect channel
channel_id = self._extract_channel_name(parsed.geturl())
return {"type": "channel", "url": channel_id}
def _validate_expected(self, youtube_id, expected_type):
"""raise value error if not matching"""
matched = self._find_valid_id(youtube_id)
if matched["type"] != expected_type:
raise ValueError(
f"{youtube_id} not of expected type {expected_type}"
)
return {"type": expected_type, "url": youtube_id}
def _find_valid_id(self, id_str):
"""detect valid id from length of string"""
if id_str in ("LL", "WL"):
return {"type": "playlist", "url": id_str}
if id_str.startswith("@"):
url = f"https://www.youtube.com/{id_str}"
channel_id = self._extract_channel_name(url)
return {"type": "channel", "url": channel_id}
len_id_str = len(id_str)
if len_id_str == 11:
item_type = "video"
elif len_id_str == 24:
item_type = "channel"
elif len_id_str in (34, 26, 18) or id_str.startswith("TA_playlist_"):
item_type = "playlist"
else:
raise ValueError(f"not a valid id_str: {id_str}")
return {"type": item_type, "url": id_str}
def _extract_channel_name(self, url):
"""find channel id from channel name with yt-dlp help, cache result"""
if self.use_cache:
cached = self._get_cached(url)
if cached:
return cached
obs_request = {
"check_formats": None,
"skip_download": True,
"extract_flat": True,
"playlistend": 0,
}
url_info = YtWrap(obs_request).extract(url)
if not url_info:
raise ValueError(f"failed to retrieve content from URL: {url}")
channel_id = url_info.get("channel_id", False)
if channel_id:
if self.use_cache:
self._set_cache(url, channel_id)
return channel_id
url = url_info.get("url", False)
if url:
# handle old channel name redirect with url path split
channel_id = urlparse(url).path.strip("/").split("/")[1]
return channel_id
print(f"failed to extract channel id from {url}")
raise ValueError
@staticmethod
def _get_cached(url) -> str | None:
"""get cached channel ID, if available"""
path = urlparse(url).path.lstrip("/")
if not path.startswith("@"):
return None
handle = path.split("/")[0]
if not handle:
return None
cache_key = f"channel:handlesearch:{handle.lower()}"
cached = RedisArchivist().get_message_dict(cache_key)
if cached:
return cached["channel_id"]
return None
@staticmethod
def _set_cache(url, channel_id) -> None:
"""set cache"""
path = urlparse(url).path.lstrip("/")
if not path.startswith("@"):
return
handle = path.split("/")[0]
if not handle:
return
cache_key = f"channel:handlesearch:{handle.lower()}"
message = {
"channel_id": channel_id,
"handle": handle,
}
RedisArchivist().set_message(cache_key, message, expire=3600 * 24 * 7)
def _detect_vid_type(self, path):
"""try to match enum from path, needs to be serializable"""
last = path.strip("/").split("/")[-1]
try:
vid_type = VideoTypeEnum(last).value
except ValueError:
vid_type = VideoTypeEnum.UNKNOWN.value
return {"vid_type": vid_type}

View File

@ -1,104 +0,0 @@
"""
functionality:
- handle watched state for videos, channels and playlists
"""
from datetime import datetime
from common.src.es_connect import ElasticWrap
from common.src.ta_redis import RedisArchivist
from common.src.urlparser import Parser
class WatchState:
"""handle watched checkbox for videos and channels"""
def __init__(self, youtube_id: str, is_watched: bool, user_id: int):
self.youtube_id = youtube_id
self.is_watched = is_watched
self.user_id = user_id
self.stamp = int(datetime.now().timestamp())
self.pipeline = f"_ingest/pipeline/watch_{youtube_id}"
def change(self):
"""change watched state of item(s)"""
print(f"{self.youtube_id}: change watched state to {self.is_watched}")
        url_type = self._detect_type()
if url_type == "video":
self.change_vid_state()
return
self._add_pipeline()
path = f"ta_video/_update_by_query?pipeline=watch_{self.youtube_id}"
data = self._build_update_data(url_type)
_, _ = ElasticWrap(path).post(data)
self._delete_pipeline()
    def _detect_type(self):
"""find youtube id type"""
url_process = Parser(self.youtube_id).parse()
url_type = url_process[0]["type"]
return url_type
def change_vid_state(self):
"""change watched state of video"""
path = f"ta_video/_update/{self.youtube_id}"
data = {"doc": {"player": {"watched": self.is_watched}}}
if self.is_watched:
data["doc"]["player"]["watched_date"] = self.stamp
response, status_code = ElasticWrap(path).post(data=data)
key = f"{self.user_id}:progress:{self.youtube_id}"
RedisArchivist().del_message(key)
if status_code != 200:
print(response)
raise ValueError("failed to mark video as watched")
def _build_update_data(self, url_type):
"""build update by query data based on url_type"""
term_key_map = {
"channel": "channel.channel_id",
"playlist": "playlist.keyword",
}
term_key = term_key_map.get(url_type)
return {
"query": {
"bool": {
"must": [
{"term": {term_key: {"value": self.youtube_id}}},
{
"term": {
"player.watched": {
"value": not self.is_watched
}
}
},
],
}
}
}
def _add_pipeline(self):
"""add ingest pipeline"""
data = {
"description": f"{self.youtube_id}: watched {self.is_watched}",
"processors": [
{
"set": {
"field": "player.watched",
"value": self.is_watched,
}
},
{
"set": {
"field": "player.watched_date",
"value": self.stamp,
}
},
],
}
_, _ = ElasticWrap(self.pipeline).put(data)
def _delete_pipeline(self):
"""delete pipeline"""
ElasticWrap(self.pipeline).delete()
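
A usage sketch for WatchState, assuming the ids exist in the index; the video and channel ids are the same illustrative values used in the parser tests below:

# mark a single video as watched for user id 1
WatchState("7DKv5H5Frt0", is_watched=True, user_id=1).change()
# mark a whole channel as unwatched via the update-by-query pipeline
WatchState("UCBa659QWEk1AI4Tg--mrJ2A", is_watched=False, user_id=1).change()
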

View File

@ -1,11 +0,0 @@
"""test configs"""
import os
import pytest
@pytest.fixture(scope="session", autouse=True)
def change_test_dir(request):
"""change directory to project folder"""
os.chdir(request.config.rootdir / "backend")

View File

@ -1,113 +0,0 @@
"""tests for helper functions"""
import pytest
from common.src.helper import (
date_parser,
get_duration_str,
get_mapping,
is_shorts,
randomizor,
time_parser,
)
def test_randomizor_with_positive_length():
"""test randomizer"""
length = 10
result = randomizor(length)
assert len(result) == length
assert result.isalnum()
def test_date_parser_with_int():
"""unix timestamp"""
timestamp = 1621539600
expected_date = "2021-05-20T19:40:00+00:00"
assert date_parser(timestamp) == expected_date
def test_date_parser_with_str():
"""iso timestamp"""
date_str = "2021-05-21"
expected_date = "2021-05-21T00:00:00+00:00"
assert date_parser(date_str) == expected_date
def test_date_parser_with_invalid_input():
"""invalid type"""
invalid_input = [1621539600]
with pytest.raises(TypeError):
date_parser(invalid_input)
def test_date_parser_with_invalid_string_format():
"""invalid date string"""
invalid_date_str = "21/05/2021"
with pytest.raises(ValueError):
date_parser(invalid_date_str)
def test_time_parser_with_numeric_string():
"""as number"""
timestamp = "100"
expected_seconds = 100
assert time_parser(timestamp) == expected_seconds
def test_time_parser_with_hh_mm_ss_format():
"""to seconds"""
timestamp = "01:00:00"
expected_seconds = 3600.0
assert time_parser(timestamp) == expected_seconds
def test_time_parser_with_empty_string():
"""handle empty"""
timestamp = ""
assert time_parser(timestamp) is False
def test_time_parser_with_invalid_format():
"""not enough to unpack"""
timestamp = "01:00"
with pytest.raises(ValueError):
time_parser(timestamp)
def test_time_parser_with_non_numeric_input():
"""non numeric"""
timestamp = "1a:00:00"
with pytest.raises(ValueError):
time_parser(timestamp)
def test_get_mapping():
"""test mappint"""
index_config = get_mapping()
assert isinstance(index_config, list)
assert all(isinstance(i, dict) for i in index_config)
def test_is_shorts():
"""is shorts id"""
youtube_id = "YG3-Pw3rixU"
assert is_shorts(youtube_id)
def test_is_not_shorts():
"""is not shorts id"""
youtube_id = "Ogr9kbypSNg"
assert is_shorts(youtube_id) is False
def test_get_duration_str():
"""only seconds"""
assert get_duration_str(None) == "NA"
assert get_duration_str(5) == "5s"
assert get_duration_str(10) == "10s"
assert get_duration_str(500) == "8m 20s"
assert get_duration_str(1000) == "16m 40s"
assert get_duration_str(5000) == "1h 23m 20s"
assert get_duration_str(500000) == "5d 18h 53m 20s"
assert get_duration_str(5000000) == "57d 20h 53m 20s"
assert get_duration_str(50000000) == "1y 213d 16h 53m 20s"

View File

@ -1,145 +0,0 @@
"""tests for url parser"""
import pytest
from common.src.urlparser import Parser
# video id parsing
VIDEO_URL_IN = [
"7DKv5H5Frt0",
"https://www.youtube.com/watch?v=7DKv5H5Frt0",
"https://www.youtube.com/watch?v=7DKv5H5Frt0&t=113&feature=shared",
"https://www.youtube.com/watch?v=7DKv5H5Frt0&list=PL96C35uN7xGJu6skU4TBYrIWxggkZBrF5&index=1&pp=iAQB" # noqa: E501
"https://youtu.be/7DKv5H5Frt0",
"https://www.youtube.com/live/7DKv5H5Frt0",
]
VIDEO_OUT = [{"type": "video", "url": "7DKv5H5Frt0", "vid_type": "unknown"}]
VIDEO_TEST_CASES = [(i, VIDEO_OUT) for i in VIDEO_URL_IN]
# shorts id parsing
SHORTS_URL_IN = [
"https://www.youtube.com/shorts/YG3-Pw3rixU",
"https://youtube.com/shorts/YG3-Pw3rixU?feature=shared",
]
SHORTS_OUT = [{"type": "video", "url": "YG3-Pw3rixU", "vid_type": "shorts"}]
SHORTS_TEST_CASES = [(i, SHORTS_OUT) for i in SHORTS_URL_IN]
# channel id parsing
CHANNEL_URL_IN = [
"UCBa659QWEk1AI4Tg--mrJ2A",
"@TomScottGo",
"https://www.youtube.com/channel/UCBa659QWEk1AI4Tg--mrJ2A",
"https://www.youtube.com/@TomScottGo",
]
CHANNEL_OUT = [
{
"type": "channel",
"url": "UCBa659QWEk1AI4Tg--mrJ2A",
"vid_type": "unknown",
}
]
CHANNEL_TEST_CASES = [(i, CHANNEL_OUT) for i in CHANNEL_URL_IN]
# channel vid type parsing
CHANNEL_VID_TYPES = [
(
"https://www.youtube.com/@IBRACORP/videos",
[
{
"type": "channel",
"url": "UC7aW7chIafJG6ECYAd3N5uQ",
"vid_type": "videos",
}
],
),
(
"https://www.youtube.com/@IBRACORP/shorts",
[
{
"type": "channel",
"url": "UC7aW7chIafJG6ECYAd3N5uQ",
"vid_type": "shorts",
}
],
),
(
"https://www.youtube.com/@IBRACORP/streams",
[
{
"type": "channel",
"url": "UC7aW7chIafJG6ECYAd3N5uQ",
"vid_type": "streams",
}
],
),
]
# playlist id parsing
PLAYLIST_URL_IN = [
"PL96C35uN7xGJu6skU4TBYrIWxggkZBrF5",
"https://www.youtube.com/playlist?list=PL96C35uN7xGJu6skU4TBYrIWxggkZBrF5",
]
PLAYLIST_OUT = [
{
"type": "playlist",
"url": "PL96C35uN7xGJu6skU4TBYrIWxggkZBrF5",
"vid_type": "unknown",
}
]
PLAYLIST_TEST_CASES = [(i, PLAYLIST_OUT) for i in PLAYLIST_URL_IN]
# personal playlists
EXPECTED_WL = [{"type": "playlist", "url": "WL", "vid_type": "unknown"}]
EXPECTED_LL = [{"type": "playlist", "url": "LL", "vid_type": "unknown"}]
PERSONAL_PLAYLISTS_TEST_CASES = [
("WL", EXPECTED_WL),
("https://www.youtube.com/playlist?list=WL", EXPECTED_WL),
("LL", EXPECTED_LL),
("https://www.youtube.com/playlist?list=LL", EXPECTED_LL),
]
# collect tests expected to pass
PASSING_TESTS = []
PASSING_TESTS.extend(VIDEO_TEST_CASES)
PASSING_TESTS.extend(SHORTS_TEST_CASES)
PASSING_TESTS.extend(CHANNEL_TEST_CASES)
PASSING_TESTS.extend(CHANNEL_VID_TYPES)
PASSING_TESTS.extend(PLAYLIST_TEST_CASES)
PASSING_TESTS.extend(PERSONAL_PLAYLISTS_TEST_CASES)
@pytest.mark.parametrize("url_str, expected_result", PASSING_TESTS)
def test_passing_parse(url_str, expected_result):
"""test parser"""
parser = Parser(url_str, use_cache=False)
parsed = parser.parse()
assert parsed == expected_result
INVALID_IDS_ERRORS = [
"aaaaa",
"https://www.youtube.com/playlist?list=AAAA",
"https://www.youtube.com/channel/UC9-y-6csu5WGm29I7Jiwpn",
"https://www.youtube.com/watch?v=CK3_zarXkw",
]
@pytest.mark.parametrize("invalid_value", INVALID_IDS_ERRORS)
def test_invalid_ids(invalid_value):
"""test for invalid IDs"""
with pytest.raises(ValueError, match="not a valid id_str"):
parser = Parser(invalid_value, use_cache=False)
parser.parse()
INVALID_DOMAINS = [
"https://vimeo.com/32001208",
"https://peertube.tv/w/8RiJE2j2nw569FVgPNjDt7",
]
@pytest.mark.parametrize("invalid_value", INVALID_DOMAINS)
def test_invalid_domains(invalid_value):
"""raise error on none YT domains"""
parser = Parser(invalid_value, use_cache=False)
with pytest.raises(ValueError, match="invalid domain"):
parser.parse()

View File

@ -1,33 +0,0 @@
"""all api urls"""
from common import views
from django.urls import path
urlpatterns = [
path("ping/", views.PingView.as_view(), name="ping"),
path(
"refresh/",
views.RefreshView.as_view(),
name="api-refresh",
),
path(
"watched/",
views.WatchedView.as_view(),
name="api-watched",
),
path(
"search/",
views.SearchView.as_view(),
name="api-search",
),
path(
"notification/",
views.NotificationView.as_view(),
name="api-notification",
),
path(
"health/",
views.HealthCheck.as_view(),
name="api-health",
),
]

View File

@ -1,210 +0,0 @@
"""all API views"""
from appsettings.src.config import ReleaseVersion
from appsettings.src.reindex import ReindexProgress
from common.serializers import (
AsyncTaskResponseSerializer,
ErrorResponseSerializer,
NotificationQueryFilterSerializer,
NotificationSerializer,
PingSerializer,
RefreshAddDataSerializer,
RefreshAddQuerySerializer,
RefreshQuerySerializer,
RefreshResponseSerializer,
WatchedDataSerializer,
)
from common.src.searching import SearchForm
from common.src.ta_redis import RedisArchivist
from common.src.watched import WatchState
from common.views_base import AdminOnly, ApiBaseView
from drf_spectacular.utils import OpenApiResponse, extend_schema
from rest_framework.response import Response
from rest_framework.views import APIView
from task.tasks import check_reindex
class PingView(ApiBaseView):
"""resolves to /api/ping/
GET: test your connection
"""
@staticmethod
@extend_schema(
responses={200: OpenApiResponse(PingSerializer())},
)
def get(request):
"""get pong"""
data = {
"response": "pong",
"user": request.user.id,
"version": ReleaseVersion().get_local_version(),
"ta_update": ReleaseVersion().get_update(),
}
serializer = PingSerializer(data)
return Response(serializer.data)
class RefreshView(ApiBaseView):
"""resolves to /api/refresh/
GET: get refresh progress
POST: start a manual refresh task
"""
permission_classes = [AdminOnly]
@extend_schema(
responses={
200: OpenApiResponse(RefreshResponseSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
parameters=[RefreshQuerySerializer()],
)
def get(self, request):
"""get refresh status"""
query_serializer = RefreshQuerySerializer(data=request.query_params)
query_serializer.is_valid(raise_exception=True)
validated_query = query_serializer.validated_data
request_type = validated_query.get("type")
request_id = validated_query.get("id")
if request_id and not request_type:
error = ErrorResponseSerializer(
{"error": "specified id also needs type"}
)
return Response(error.data, status=400)
try:
progress = ReindexProgress(
request_type=request_type, request_id=request_id
).get_progress()
except ValueError:
error = ErrorResponseSerializer({"error": "bad request"})
return Response(error.data, status=400)
response_serializer = RefreshResponseSerializer(progress)
return Response(response_serializer.data)
@extend_schema(
request=RefreshAddDataSerializer(),
responses={
200: OpenApiResponse(AsyncTaskResponseSerializer()),
},
parameters=[RefreshAddQuerySerializer()],
)
def post(self, request):
"""add to reindex queue"""
query_serializer = RefreshAddQuerySerializer(data=request.query_params)
query_serializer.is_valid(raise_exception=True)
validated_query = query_serializer.validated_data
data_serializer = RefreshAddDataSerializer(data=request.data)
data_serializer.is_valid(raise_exception=True)
validated_data = data_serializer.validated_data
extract_videos = validated_query.get("extract_videos")
task = check_reindex.delay(
data=validated_data, extract_videos=extract_videos
)
message = {
"message": "reindex task started",
"task_id": task.id,
}
serializer = AsyncTaskResponseSerializer(message)
return Response(serializer.data)
class WatchedView(ApiBaseView):
"""resolves to /api/watched/
POST: change watched state of video, channel or playlist
"""
@extend_schema(
request=WatchedDataSerializer(),
responses={
200: OpenApiResponse(WatchedDataSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
)
def post(self, request):
"""change watched state"""
data_serializer = WatchedDataSerializer(data=request.data)
data_serializer.is_valid(raise_exception=True)
validated_data = data_serializer.validated_data
youtube_id = validated_data.get("id")
is_watched = validated_data.get("is_watched")
if not youtube_id or is_watched is None:
error = ErrorResponseSerializer(
{"error": "missing id or is_watched"}
)
return Response(error.data, status=400)
WatchState(youtube_id, is_watched, request.user.id).change()
return Response(data_serializer.data)
class SearchView(ApiBaseView):
"""resolves to /api/search/
GET: run a search with the string in the ?query parameter
"""
@staticmethod
def get(request):
"""handle get request
search through all indexes"""
search_query = request.GET.get("query", None)
if search_query is None:
return Response(
{"message": "no search query specified"}, status=400
)
search_results = SearchForm().multi_search(search_query)
return Response(search_results)
class NotificationView(ApiBaseView):
"""resolves to /api/notification/
GET: returns a list of notifications
filter query to filter messages by group
"""
valid_filters = ["download", "settings", "channel"]
@extend_schema(
responses={
200: OpenApiResponse(NotificationSerializer(many=True)),
},
parameters=[NotificationQueryFilterSerializer],
)
def get(self, request):
"""get all notifications"""
query_serializer = NotificationQueryFilterSerializer(
data=request.query_params
)
query_serializer.is_valid(raise_exception=True)
validated_query = query_serializer.validated_data
filter_by = validated_query.get("filter")
query = "message"
if filter_by in self.valid_filters:
query = f"{query}:{filter_by}"
notifications = RedisArchivist().list_items(query)
response_serializer = NotificationSerializer(notifications, many=True)
return Response(response_serializer.data)
class HealthCheck(APIView):
"""health check view, no auth needed"""
def get(self, request):
"""health check, no auth needed"""
return Response("OK", status=200)

View File

@ -1,102 +0,0 @@
"""base classes to inherit from"""
from common.src.es_connect import ElasticWrap
from common.src.index_generic import Pagination
from common.src.search_processor import SearchProcess, process_aggs
from rest_framework import permissions
from rest_framework.authentication import (
SessionAuthentication,
TokenAuthentication,
)
from rest_framework.views import APIView
def check_admin(user):
"""check for admin permission for restricted views"""
return user.is_staff or user.groups.filter(name="admin").exists()
class AdminOnly(permissions.BasePermission):
"""allow only admin"""
def has_permission(self, request, view):
return check_admin(request.user)
class AdminWriteOnly(permissions.BasePermission):
"""allow only admin writes"""
def has_permission(self, request, view):
if request.method in permissions.SAFE_METHODS:
return permissions.IsAuthenticated().has_permission(request, view)
return check_admin(request.user)
class ApiBaseView(APIView):
"""base view to inherit from"""
authentication_classes = [SessionAuthentication, TokenAuthentication]
permission_classes = [permissions.IsAuthenticated]
search_base = ""
data = ""
def __init__(self):
super().__init__()
self.response = {}
self.data = {"query": {"match_all": {}}}
self.status_code = False
self.context = False
self.pagination_handler = False
def get_document(self, document_id, progress_match=None):
"""get single document from es"""
path = f"{self.search_base}{document_id}"
response, status_code = ElasticWrap(path).get()
try:
self.response = SearchProcess(
response, match_video_user_progress=progress_match
).process()
except KeyError:
print(f"item not found: {document_id}")
self.status_code = status_code
def initiate_pagination(self, request):
"""set initial pagination values"""
self.pagination_handler = Pagination(request)
self.data.update(
{
"size": self.pagination_handler.pagination["page_size"],
"from": self.pagination_handler.pagination["page_from"],
}
)
def get_document_list(self, request, pagination=True, progress_match=None):
"""get a list of results"""
if pagination:
self.initiate_pagination(request)
es_handler = ElasticWrap(self.search_base)
response, status_code = es_handler.get(data=self.data)
self.response["data"] = SearchProcess(
response, match_video_user_progress=progress_match
).process()
if self.response["data"]:
self.status_code = status_code
else:
self.status_code = 404
if pagination and response.get("hits"):
self.pagination_handler.validate(
response["hits"]["total"]["value"]
)
self.response["paginate"] = self.pagination_handler.pagination
def get_aggs(self):
"""get aggs alone"""
self.data["size"] = 0
response, _ = ElasticWrap(self.search_base).get(data=self.data)
process_aggs(response)
self.response = response.get("aggregations")
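
A hypothetical subclass sketch showing how these helpers are meant to be combined; VideoListView and its search_base are illustrative only and not part of this module:

from rest_framework.response import Response


class VideoListView(ApiBaseView):
    """hypothetical list endpoint built on ApiBaseView"""

    search_base = "ta_video/_search/"

    def get(self, request):
        """return a paginated list of videos"""
        self.get_document_list(request)
        return Response(self.response, status=self.status_code)
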

View File

@ -1,36 +0,0 @@
"""change user password"""
from django.contrib.auth import get_user_model
from django.core.management.base import BaseCommand, CommandError
User = get_user_model()
class Command(BaseCommand):
"""change password"""
help = "Change Password of user"
def add_arguments(self, parser):
parser.add_argument("username", type=str)
parser.add_argument("password", type=str)
def handle(self, *args, **kwargs):
"""entry point"""
username = kwargs["username"]
new_password = kwargs["password"]
self.stdout.write(f"Changing password for user '{username}'")
try:
user = User.objects.get(name=username)
except User.DoesNotExist as err:
message = f"Username '{username}' does not exist. "
message += "Available username(s) are:\n"
message += ", ".join([i.name for i in User.objects.all()])
raise CommandError(message) from err
user.set_password(new_password)
user.save()
self.stdout.write(
self.style.SUCCESS(f" ✓ updated password for user '{username}'")
)

View File

@ -1,76 +0,0 @@
"""backup config for sqlite reset and restore"""
import json
from pathlib import Path
from django.contrib.auth import get_user_model
from django.core.management.base import BaseCommand
from home.models import CustomPeriodicTask
from home.src.ta.settings import EnvironmentSettings
from rest_framework.authtoken.models import Token
User = get_user_model()
class Command(BaseCommand):
"""export"""
help = "Exports all users and their auth tokens to a JSON file"
FILE = Path(EnvironmentSettings.CACHE_DIR) / "backup" / "migration.json"
def handle(self, *args, **kwargs):
"""entry point"""
data = {
"user_data": self.get_users(),
"schedule_data": self.get_schedules(),
}
with open(self.FILE, "w", encoding="utf-8") as json_file:
json_file.write(json.dumps(data))
def get_users(self):
"""get users"""
users = User.objects.all()
user_data = []
for user in users:
user_info = {
"username": user.name,
"is_staff": user.is_staff,
"is_superuser": user.is_superuser,
"password": user.password,
"tokens": [],
}
try:
token = Token.objects.get(user=user)
user_info["tokens"] = [token.key]
except Token.DoesNotExist:
user_info["tokens"] = []
user_data.append(user_info)
return user_data
def get_schedules(self):
"""get schedules"""
all_schedules = CustomPeriodicTask.objects.all()
schedule_data = []
for schedule in all_schedules:
schedule_info = {
"name": schedule.name,
"crontab": {
"minute": schedule.crontab.minute,
"hour": schedule.crontab.hour,
"day_of_week": schedule.crontab.day_of_week,
},
}
schedule_data.append(schedule_info)
return schedule_data

View File

@ -1,89 +0,0 @@
"""restore config from backup"""
import json
from pathlib import Path
from common.src.env_settings import EnvironmentSettings
from django.core.management.base import BaseCommand
from django_celery_beat.models import CrontabSchedule
from rest_framework.authtoken.models import Token
from task.models import CustomPeriodicTask
from task.src.task_config import TASK_CONFIG
from user.models import Account
class Command(BaseCommand):
"""export"""
help = "Exports all users and their auth tokens to a JSON file"
FILE = Path(EnvironmentSettings.CACHE_DIR) / "backup" / "migration.json"
def handle(self, *args, **options):
"""handle"""
self.stdout.write("restore users and schedules")
data = self.get_config()
self.restore_users(data["user_data"])
self.restore_schedules(data["schedule_data"])
self.stdout.write(
self.style.SUCCESS(
" ✓ restore completed. Please restart the container."
)
)
def get_config(self) -> dict:
"""get config from backup"""
with open(self.FILE, "r", encoding="utf-8") as json_file:
data = json.loads(json_file.read())
self.stdout.write(
self.style.SUCCESS(f" ✓ json file found: {self.FILE}")
)
return data
def restore_users(self, user_data: list[dict]) -> None:
"""restore users from config"""
self.stdout.write("delete existing users")
Account.objects.all().delete()
self.stdout.write("recreate users")
for user_info in user_data:
user = Account.objects.create(
name=user_info["username"],
is_staff=user_info["is_staff"],
is_superuser=user_info["is_superuser"],
password=user_info["password"],
)
for token in user_info["tokens"]:
Token.objects.create(user=user, key=token)
self.stdout.write(
self.style.SUCCESS(
f" ✓ recreated user with name: {user_info['username']}"
)
)
def restore_schedules(self, schedule_data: list[dict]) -> None:
"""restore schedules"""
self.stdout.write("delete existing schedules")
CustomPeriodicTask.objects.all().delete()
self.stdout.write("recreate schedules")
for schedule in schedule_data:
task_name = schedule["name"]
description = TASK_CONFIG[task_name].get("title")
crontab, _ = CrontabSchedule.objects.get_or_create(
minute=schedule["crontab"]["minute"],
hour=schedule["crontab"]["hour"],
day_of_week=schedule["crontab"]["day_of_week"],
timezone=EnvironmentSettings.TZ,
)
task = CustomPeriodicTask.objects.create(
name=task_name,
task=task_name,
description=description,
crontab=crontab,
)
self.stdout.write(
self.style.SUCCESS(f" ✓ recreated schedule: {task}")
)

View File

@ -1,177 +0,0 @@
"""
Functionality:
- check that all connections are working
"""
from time import sleep
import requests
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import ElasticWrap
from common.src.ta_redis import RedisArchivist
from django.core.management.base import BaseCommand, CommandError
TOPIC = """
#######################
# Connection check #
#######################
"""
class Command(BaseCommand):
"""command framework"""
TIMEOUT = 120
MIN_MAJOR, MAX_MAJOR = 8, 8
MIN_MINOR = 0
# pylint: disable=no-member
help = "Check connections"
def handle(self, *args, **options):
"""run all commands"""
self.stdout.write(TOPIC)
self._redis_connection_check()
self._redis_config_set()
self._es_connection_check()
self._es_version_check()
self._es_path_check()
def _redis_connection_check(self):
"""check ir redis connection is established"""
self.stdout.write("[1] connect to Redis")
redis_conn = RedisArchivist().conn
for _ in range(5):
try:
pong = redis_conn.execute_command("PING")
if pong:
self.stdout.write(
self.style.SUCCESS(" ✓ Redis connection verified")
)
return
except Exception: # pylint: disable=broad-except
self.stdout.write(" ... retry Redis connection")
sleep(2)
message = " 🗙 Redis connection failed"
self.stdout.write(self.style.ERROR(f"{message}"))
try:
redis_conn.execute_command("PING")
except Exception as err: # pylint: disable=broad-except
message = f" 🗙 {type(err).__name__}: {err}"
self.stdout.write(self.style.ERROR(f"{message}"))
sleep(60)
raise CommandError(message)
def _redis_config_set(self):
"""set config for redis if not set already"""
self.stdout.write("[2] set Redis config")
redis_conn = RedisArchivist().conn
timeout_is = int(redis_conn.config_get("timeout").get("timeout"))
if not timeout_is:
redis_conn.config_set("timeout", 3600)
self.stdout.write(self.style.SUCCESS(" ✓ Redis config set"))
def _es_connection_check(self):
"""wait for elasticsearch connection"""
self.stdout.write("[3] connect to Elastic Search")
total = self.TIMEOUT // 5
for i in range(total):
self.stdout.write(f" ... waiting for ES [{i}/{total}]")
try:
_, status_code = ElasticWrap("/").get(
timeout=1, print_error=False
)
except (
requests.exceptions.ConnectionError,
requests.exceptions.Timeout,
):
sleep(5)
continue
if status_code and status_code == 401:
sleep(5)
continue
if status_code and status_code == 200:
path = (
"_cluster/health?"
"wait_for_status=yellow&"
"timeout=60s&"
"wait_for_active_shards=1"
)
_, _ = ElasticWrap(path).get(timeout=60)
self.stdout.write(
self.style.SUCCESS(" ✓ ES connection established")
)
return
response, status_code = ElasticWrap("/").get(
timeout=1, print_error=False
)
message = " 🗙 ES connection failed"
self.stdout.write(self.style.ERROR(f"{message}"))
self.stdout.write(f" error message: {response}")
self.stdout.write(f" status code: {status_code}")
sleep(60)
raise CommandError(message)
def _es_version_check(self):
"""check for minimal elasticsearch version"""
self.stdout.write("[4] Elastic Search version check")
response, _ = ElasticWrap("/").get()
version = response["version"]["number"]
major = int(version.split(".")[0])
if self.MIN_MAJOR <= major <= self.MAX_MAJOR:
self.stdout.write(
self.style.SUCCESS(" ✓ ES version check passed")
)
return
message = (
" 🗙 ES version check failed. "
+ f"Expected {self.MIN_MAJOR}.{self.MIN_MINOR} but got {version}"
)
self.stdout.write(self.style.ERROR(f"{message}"))
sleep(60)
raise CommandError(message)
def _es_path_check(self):
"""check that path.repo var is set"""
self.stdout.write("[5] check ES path.repo env var")
response, _ = ElasticWrap("_nodes/_all/settings").get()
        snapshot_roles = [
"data",
"data_cold",
"data_content",
"data_frozen",
"data_hot",
"data_warm",
"master",
]
for node in response["nodes"].values():
            if not (set(node["roles"]) & set(snapshot_roles)):
continue
if node["settings"]["path"].get("repo"):
self.stdout.write(
self.style.SUCCESS(" ✓ path.repo env var is set")
)
return
message = (
" 🗙 path.repo env var not found. "
+ "set the following env var to the ES container:\n"
+ " path.repo="
+ EnvironmentSettings.ES_SNAPSHOT_DIR
)
self.stdout.write(self.style.ERROR(message))
sleep(60)
raise CommandError(message)

View File

@ -1,218 +0,0 @@
"""
Functionality:
- Check environment at startup
- Process config file overwrites from env var
- Stop startup on error
- python manage.py ta_envcheck
"""
import os
import re
from time import sleep
from common.src.env_settings import EnvironmentSettings
from django.core.management.base import BaseCommand, CommandError
from user.models import Account
LOGO = """
.... .....
...'',;:cc,. .;::;;,'...
..,;:cccllclc, .:ccllllcc;,..
..,:cllcc:;,'.',. ....'',;ccllc:,..
..;cllc:,'.. ...,:cccc:'.
.;cccc;.. ..,:ccc:'.
.ckkkOkxollllllllllllc. .,:::;. .,cclc;
.:0MMMMMMMMMMMMMMMMMMMX: .cNMMMWx. .;clc:
.;lOXK0000KNMMMMX00000KO; ;KMMMMMNl. .;ccl:,.
.;:c:'.....kMMMNo........ 'OMMMWMMMK: '::;;'.
....... .xMMMNl .dWMMXdOMMMO' ........
.:cc:;. .xMMMNc .lNMMNo.:XMMWx. .:cl:.
.:llc,. .:xxxd, ;KMMMk. .oWMMNl. .:llc'
.cll:. .;:;;:::,. 'OMMMK:';''kWMMK: .;llc,
.cll:. .,;;;;;;,. .,xWMMNl.:l:.;KMMMO' .;llc'
.:llc. .cOOOk; .lKNMMWx..:l:..lNMMWx. .:llc'
.;lcc,. .xMMMNc :KMMMM0, .:lc. .xWMMNl.'ccl:.
.cllc. .xMMMNc 'OMMMMXc...:lc...,0MMMKl:lcc,.
.,ccl:. .xMMMNc .xWMMMWo.,;;:lc;;;.cXMMMXdcc;.
.,clc:. .xMMMNc .lNMMMWk. .':clc:,. .dWMMW0o;.
.,clcc,. .ckkkx; .okkkOx, .';,. 'kKKK0l.
.':lcc:'..... . .. ..,;cllc,.
.,cclc,.... ....;clc;..
..,:,..,c:'.. ...';:,..,:,.
....:lcccc:;,'''.....'',;;:clllc,....
.'',;:cllllllccccclllllcc:,'..
...'',,;;;;;;;;;,''...
.....
"""
TOPIC = """
#######################
# Environment Setup #
#######################
"""
EXPECTED_ENV_VARS = [
"TA_USERNAME",
"TA_PASSWORD",
"ELASTIC_PASSWORD",
"ES_URL",
"TA_HOST",
]
UNEXPECTED_ENV_VARS = {
"TA_UWSGI_PORT": "Has been replaced with 'TA_BACKEND_PORT'",
"REDIS_HOST": "Has been replaced with 'REDIS_CON' connection string",
"REDIS_PORT": "Has been consolidated in 'REDIS_CON' connection string",
"ENABLE_CAST": "That is now a toggle in setting and DISABLE_STATIC_AUTH",
}
INST = "https://github.com/tubearchivist/tubearchivist#installing-and-updating"
NGINX = "/etc/nginx/sites-available/default"
class Command(BaseCommand):
"""command framework"""
# pylint: disable=no-member
help = "Check environment before startup"
def handle(self, *args, **options):
"""run all commands"""
self.stdout.write(LOGO)
self.stdout.write(TOPIC)
self._expected_vars()
self._unexpected_vars()
self._elastic_user_overwrite()
self._ta_port_overwrite()
self._ta_backend_port_overwrite()
self._disable_static_auth()
self._create_superuser()
def _expected_vars(self):
"""check if expected env vars are set"""
self.stdout.write("[1] checking expected env vars")
env = os.environ
for var in EXPECTED_ENV_VARS:
if not env.get(var):
message = f" 🗙 expected env var {var} not set\n {INST}"
self.stdout.write(self.style.ERROR(message))
sleep(60)
raise CommandError(message)
message = " ✓ all expected env vars are set"
self.stdout.write(self.style.SUCCESS(message))
def _unexpected_vars(self):
"""check for unexpected env vars"""
self.stdout.write("[2] checking for unexpected env vars")
for var, message in UNEXPECTED_ENV_VARS.items():
if not os.environ.get(var):
continue
message = (
f" 🗙 unexpected env var {var} found\n"
f" {message} \n"
" see release notes for a list of all changes."
)
self.stdout.write(self.style.ERROR(message))
sleep(60)
raise CommandError(message)
message = " ✓ no unexpected env vars found"
self.stdout.write(self.style.SUCCESS(message))
def _elastic_user_overwrite(self):
"""check for ELASTIC_USER overwrite"""
self.stdout.write("[3] check ES user overwrite")
env = EnvironmentSettings.ES_USER
self.stdout.write(self.style.SUCCESS(f" ✓ ES user is set to {env}"))
def _ta_port_overwrite(self):
"""set TA_PORT overwrite for nginx"""
self.stdout.write("[4] check TA_PORT overwrite")
overwrite = EnvironmentSettings.TA_PORT
if not overwrite:
self.stdout.write(self.style.SUCCESS(" TA_PORT is not set"))
return
regex = re.compile(r"listen [0-9]{1,5}")
to_overwrite = f"listen {overwrite}"
changed = file_overwrite(NGINX, regex, to_overwrite)
if changed:
message = f" ✓ TA_PORT changed to {overwrite}"
else:
message = f" ✓ TA_PORT already set to {overwrite}"
self.stdout.write(self.style.SUCCESS(message))
def _ta_backend_port_overwrite(self):
"""set TA_BACKEND_PORT overwrite"""
self.stdout.write("[5] check TA_BACKEND_PORT overwrite")
overwrite = EnvironmentSettings.TA_BACKEND_PORT
if not overwrite:
message = " TA_BACKEND_PORT is not set"
self.stdout.write(self.style.SUCCESS(message))
return
# modify nginx conf
regex = re.compile(r"proxy_pass http://localhost:[0-9]{1,5}")
to_overwrite = f"proxy_pass http://localhost:{overwrite}"
changed = file_overwrite(NGINX, regex, to_overwrite)
if changed:
message = f" ✓ TA_BACKEND_PORT changed to {overwrite}"
else:
message = f" ✓ TA_BACKEND_PORT already set to {overwrite}"
self.stdout.write(self.style.SUCCESS(message))
def _disable_static_auth(self):
"""cast workaround, remove auth for static files in nginx"""
self.stdout.write("[7] check DISABLE_STATIC_AUTH overwrite")
overwrite = EnvironmentSettings.DISABLE_STATIC_AUTH
if not overwrite:
self.stdout.write(
self.style.SUCCESS(" DISABLE_STATIC_AUTH is not set")
)
return
regex = re.compile(r"[^\S\r\n]*auth_request /api/ping/;\n")
changed = file_overwrite(NGINX, regex, "")
if changed:
message = " ✓ process nginx to disable static auth"
else:
message = " ✓ static auth is already disabled in nginx"
self.stdout.write(self.style.SUCCESS(message))
def _create_superuser(self):
"""create superuser if not exist"""
self.stdout.write("[8] create superuser")
is_created = Account.objects.filter(is_superuser=True)
if is_created:
message = " superuser already created"
self.stdout.write(self.style.SUCCESS(message))
return
name = EnvironmentSettings.TA_USERNAME
password = EnvironmentSettings.TA_PASSWORD
Account.objects.create_superuser(name, password)
message = f" ✓ new superuser with name {name} created"
self.stdout.write(self.style.SUCCESS(message))
def file_overwrite(file_path, regex, overwrite):
"""change file content from old to overwrite, return true when changed"""
with open(file_path, "r", encoding="utf-8") as f:
file_content = f.read()
changed = re.sub(regex, overwrite, file_content)
if changed == file_content:
return False
with open(file_path, "w", encoding="utf-8") as f:
f.write(changed)
return True
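
A small, self-contained illustration of file_overwrite using a throwaway temp file instead of the real nginx config:

import re
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as tmp:
    tmp.write("listen 8000;\n")

pattern = re.compile(r"listen [0-9]{1,5}")
assert file_overwrite(tmp.name, pattern, "listen 9000")      # changed
assert not file_overwrite(tmp.name, pattern, "listen 9000")  # already set
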

View File

@ -1,391 +0,0 @@
"""
Functionality:
- Application startup
- Apply migrations
"""
import os
from datetime import datetime
from random import randint
from time import sleep
from appsettings.src.config import AppConfig, ReleaseVersion
from appsettings.src.index_setup import ElasitIndexWrap
from appsettings.src.snapshot import ElasticSnapshot
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import ElasticWrap
from common.src.helper import clear_dl_cache
from common.src.ta_redis import RedisArchivist
from django.core.management.base import BaseCommand, CommandError
from django.utils import dateformat
from django_celery_beat.models import CrontabSchedule, PeriodicTasks
from redis.exceptions import ResponseError
from task.models import CustomPeriodicTask
from task.src.config_schedule import ScheduleBuilder
from task.src.task_manager import TaskManager
from task.tasks import version_check
TOPIC = """
#######################
# Application Start #
#######################
"""
class Command(BaseCommand):
"""command framework"""
# pylint: disable=no-member
def handle(self, *args, **options):
"""run all commands"""
self.stdout.write(TOPIC)
self._make_folders()
self._clear_redis_keys()
self._clear_tasks()
self._clear_dl_cache()
self._version_check()
self._index_setup()
self._snapshot_check()
self._mig_app_settings()
self._create_default_schedules()
self._update_schedule_tz()
self._init_app_config()
self._mig_channel_tags()
self._mig_video_channel_tags()
self._mig_fix_download_channel_indexed()
def _make_folders(self):
"""make expected cache folders"""
self.stdout.write("[1] create expected cache folders")
folders = [
"backup",
"channels",
"download",
"import",
"playlists",
"videos",
]
cache_dir = EnvironmentSettings.CACHE_DIR
for folder in folders:
folder_path = os.path.join(cache_dir, folder)
os.makedirs(folder_path, exist_ok=True)
self.stdout.write(self.style.SUCCESS(" ✓ expected folders created"))
def _clear_redis_keys(self):
"""make sure there are no leftover locks or keys set in redis"""
self.stdout.write("[2] clear leftover keys in redis")
all_keys = [
"dl_queue_id",
"dl_queue",
"downloading",
"manual_import",
"reindex",
"rescan",
"run_backup",
"startup_check",
"reindex:ta_video",
"reindex:ta_channel",
"reindex:ta_playlist",
]
redis_con = RedisArchivist()
has_changed = False
for key in all_keys:
if redis_con.del_message(key):
self.stdout.write(
self.style.SUCCESS(f" ✓ cleared key {key}")
)
has_changed = True
if not has_changed:
self.stdout.write(self.style.SUCCESS(" no keys found"))
def _clear_tasks(self):
"""clear tasks and messages"""
self.stdout.write("[3] clear task leftovers")
TaskManager().fail_pending()
redis_con = RedisArchivist()
to_delete = redis_con.list_keys("message:")
if to_delete:
for key in to_delete:
redis_con.del_message(key)
self.stdout.write(
self.style.SUCCESS(f" ✓ cleared {len(to_delete)} messages")
)
def _clear_dl_cache(self):
"""clear leftover files from dl cache"""
self.stdout.write("[4] clear leftover files from dl cache")
leftover_files = clear_dl_cache(EnvironmentSettings.CACHE_DIR)
if leftover_files:
self.stdout.write(
self.style.SUCCESS(f" ✓ cleared {leftover_files} files")
)
else:
self.stdout.write(self.style.SUCCESS(" no files found"))
def _version_check(self):
"""remove new release key if updated now"""
self.stdout.write("[5] check for first run after update")
new_version = ReleaseVersion().is_updated()
if new_version:
self.stdout.write(
self.style.SUCCESS(f" ✓ update to {new_version} completed")
)
else:
self.stdout.write(self.style.SUCCESS(" no new update found"))
version_task = CustomPeriodicTask.objects.filter(name="version_check")
if not version_task.exists():
return
if not version_task.first().last_run_at:
self.style.SUCCESS(" ✓ send initial version check task")
version_check.delay()
def _index_setup(self):
"""migration: validate index mappings"""
self.stdout.write("[6] validate index mappings")
ElasitIndexWrap().setup()
def _snapshot_check(self):
"""migration setup snapshots"""
self.stdout.write("[7] setup snapshots")
ElasticSnapshot().setup()
def _mig_app_settings(self) -> None:
"""update from v0.4.13 to v0.5.0, migrate application settings"""
self.stdout.write("[MIGRATION] move appconfig to ES")
try:
config = RedisArchivist().get_message("config")
except ResponseError:
self.stdout.write(
self.style.SUCCESS(" Redis does not support JSON decoding")
)
return
if not config or config == {"status": False}:
self.stdout.write(
self.style.SUCCESS(" no config values to migrate")
)
return
path = "ta_config/_doc/appsettings"
response, status_code = ElasticWrap(path).post(config)
if status_code in [200, 201]:
self.stdout.write(
self.style.SUCCESS(" ✓ migrated appconfig to ES")
)
RedisArchivist().del_message("config", save=True)
return
message = " 🗙 failed to migrate app config"
self.stdout.write(self.style.ERROR(message))
self.stdout.write(response)
sleep(60)
raise CommandError(message)
def _create_default_schedules(self) -> None:
"""create default schedules for new installations"""
self.stdout.write("[8] create initial schedules")
init_has_run = CustomPeriodicTask.objects.filter(
name="version_check"
).exists()
if init_has_run:
self.stdout.write(
self.style.SUCCESS(
" schedule init already done, skipping..."
)
)
return
builder = ScheduleBuilder()
check_reindex = builder.get_set_task(
"check_reindex", schedule=builder.SCHEDULES["check_reindex"]
)
check_reindex.task_config.update({"days": 90})
check_reindex.last_run_at = dateformat.make_aware(datetime.now())
check_reindex.save()
self.stdout.write(
self.style.SUCCESS(
f" ✓ created new default schedule: {check_reindex}"
)
)
thumbnail_check = builder.get_set_task(
"thumbnail_check", schedule=builder.SCHEDULES["thumbnail_check"]
)
thumbnail_check.last_run_at = dateformat.make_aware(datetime.now())
thumbnail_check.save()
self.stdout.write(
self.style.SUCCESS(
f" ✓ created new default schedule: {thumbnail_check}"
)
)
daily_random = f"{randint(0, 59)} {randint(0, 23)} *"
version_check_task = builder.get_set_task(
"version_check", schedule=daily_random
)
self.stdout.write(
self.style.SUCCESS(
f" ✓ created new default schedule: {version_check_task}"
)
)
self.stdout.write(
self.style.SUCCESS(" ✓ all default schedules created")
)
def _update_schedule_tz(self) -> None:
"""update timezone for Schedule instances"""
self.stdout.write("[9] validate schedules TZ")
tz = EnvironmentSettings.TZ
to_update = CrontabSchedule.objects.exclude(timezone=tz)
if not to_update.exists():
self.stdout.write(
self.style.SUCCESS(" all schedules have correct TZ")
)
return
updated = to_update.update(timezone=tz)
self.stdout.write(
self.style.SUCCESS(f" ✓ updated {updated} schedules to {tz}.")
)
PeriodicTasks.update_changed()
def _init_app_config(self) -> None:
"""init default app config to ES"""
self.stdout.write("[10] Check AppConfig")
response, status_code = ElasticWrap("ta_config/_doc/appsettings").get()
if status_code in [200, 201]:
self.stdout.write(
self.style.SUCCESS(" skip completed appsettings init")
)
updated_defaults = AppConfig().add_new_defaults()
for new_default in updated_defaults:
self.stdout.write(
self.style.SUCCESS(f" added new default: {new_default}")
)
return
if status_code != 404:
message = " 🗙 ta_config index lookup failed"
self.stdout.write(self.style.ERROR(message))
self.stdout.write(response)
sleep(60)
raise CommandError(message)
handler = AppConfig.__new__(AppConfig)
_, status_code = handler.sync_defaults()
self.stdout.write(
self.style.SUCCESS(" ✓ Created default appsettings.")
)
self.stdout.write(
self.style.SUCCESS(f" Status code: {status_code}")
)
def _mig_channel_tags(self) -> None:
"""update from v0.4.13 to v0.5.0, migrate incorrect data types"""
self.stdout.write("[MIGRATION] fix incorrect channel tags types")
path = "ta_channel/_update_by_query"
data = {
"query": {"match": {"channel_tags": False}},
"script": {
"source": "ctx._source.channel_tags = []",
"lang": "painless",
},
}
response, status_code = ElasticWrap(path).post(data)
if status_code in [200, 201]:
updated = response.get("updated")
if updated:
self.stdout.write(
self.style.SUCCESS(f" ✓ fixed {updated} channel tags")
)
else:
self.stdout.write(
self.style.SUCCESS(" no channel tags needed fixing")
)
return
message = " 🗙 failed to fix channel tags"
self.stdout.write(self.style.ERROR(message))
self.stdout.write(response)
sleep(60)
raise CommandError(message)
def _mig_video_channel_tags(self) -> None:
"""update from v0.4.13 to v0.5.0, migrate incorrect data types"""
self.stdout.write("[MIGRATION] fix incorrect video channel tags types")
path = "ta_video/_update_by_query"
data = {
"query": {"match": {"channel.channel_tags": False}},
"script": {
"source": "ctx._source.channel.channel_tags = []",
"lang": "painless",
},
}
response, status_code = ElasticWrap(path).post(data)
if status_code in [200, 201]:
updated = response.get("updated")
if updated:
self.stdout.write(
self.style.SUCCESS(
f" ✓ fixed {updated} video channel tags"
)
)
else:
self.stdout.write(
self.style.SUCCESS(
" no video channel tags needed fixing"
)
)
return
message = " 🗙 failed to fix video channel tags"
self.stdout.write(self.style.ERROR(message))
self.stdout.write(response)
sleep(60)
raise CommandError(message)
def _mig_fix_download_channel_indexed(self) -> None:
"""migrate from v0.5.2 to 0.5.3, fix missing channel_indexed"""
self.stdout.write("[MIGRATION] fix incorrect video channel tags types")
path = "ta_download/_update_by_query"
data = {
"query": {
"bool": {
"must_not": [{"exists": {"field": "channel_indexed"}}]
}
},
"script": {
"source": "ctx._source.channel_indexed = false",
"lang": "painless",
},
}
response, status_code = ElasticWrap(path).post(data)
if status_code in [200, 201]:
updated = response.get("updated")
if updated:
self.stdout.write(
self.style.SUCCESS(f" ✓ fixed {updated} queued videos")
)
else:
self.stdout.write(
self.style.SUCCESS(" no queued videos to fix")
)
return
message = " 🗙 failed to fix video channel tags"
self.stdout.write(self.style.ERROR(message))
self.stdout.write(response)
sleep(60)
raise CommandError(message)
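
# The three _mig_* helpers above share one _update_by_query shape: select the
# broken documents, patch them with a small painless script, then read the
# "updated" count from the response. A hedged sketch of that shared pattern;
# the commented call below just restates the channel-tags fix, it is not a
# new migration.
def _run_painless_migration(index: str, query: dict, source: str) -> int:
    """run a single _update_by_query migration, return updated doc count"""
    data = {
        "query": query,
        "script": {"source": source, "lang": "painless"},
    }
    response, status_code = ElasticWrap(f"{index}/_update_by_query").post(data)
    if status_code not in [200, 201]:
        raise CommandError(f"{index}: migration failed: {response}")

    return response.get("updated", 0)


# e.g. the channel tags fix is equivalent to:
# _run_painless_migration(
#     "ta_channel",
#     {"match": {"channel_tags": False}},
#     "ctx._source.channel_tags = []",
# )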

View File

@ -1,40 +0,0 @@
"""stop on unexpected table"""
from time import sleep
from django.core.management.base import BaseCommand, CommandError
from django.db import connection
ERROR_MESSAGE = """
🗙 Database is incompatible, see latest release notes for instructions:
🗙 https://github.com/tubearchivist/tubearchivist/releases/tag/v0.5.0
"""
class Command(BaseCommand):
"""command framework"""
# pylint: disable=no-member
def handle(self, *args, **options):
"""handle"""
self.stdout.write("[MIGRATION] Confirming v0.5.0 table layout")
all_tables = self.list_tables()
for table in all_tables:
if table == "home_account":
self.stdout.write(self.style.ERROR(ERROR_MESSAGE))
sleep(60)
raise CommandError(ERROR_MESSAGE)
self.stdout.write(self.style.SUCCESS(" ✓ local DB is up-to-date."))
def list_tables(self):
"""raw list all tables"""
with connection.cursor() as cursor:
cursor.execute(
"SELECT name FROM sqlite_master WHERE type='table';"
)
tables = cursor.fetchall()
return [table[0] for table in tables]
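
# Design note: list_tables() queries sqlite_master directly, so this check is
# tied to SQLite. Django's introspection API would be the backend-agnostic way
# to get the same list; a minimal sketch, assuming the default database
# connection:
def list_tables_portable():
    """list table names via Django introspection instead of raw SQL"""
    return connection.introspection.table_names()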

View File

@ -1,95 +0,0 @@
"""download serializers"""
# pylint: disable=abstract-method
from common.serializers import PaginationSerializer, ValidateUnknownFieldsMixin
from rest_framework import serializers
from video.src.constants import VideoTypeEnum
class DownloadItemSerializer(serializers.Serializer):
"""serialize download item"""
auto_start = serializers.BooleanField()
channel_id = serializers.CharField()
channel_indexed = serializers.BooleanField()
channel_name = serializers.CharField()
duration = serializers.CharField()
published = serializers.CharField()
status = serializers.ChoiceField(choices=["pending", "ignore"])
timestamp = serializers.IntegerField()
title = serializers.CharField()
vid_thumb_url = serializers.CharField()
vid_type = serializers.ChoiceField(choices=VideoTypeEnum.values())
youtube_id = serializers.CharField()
message = serializers.CharField(required=False)
_index = serializers.CharField(required=False)
_score = serializers.IntegerField(required=False)
class DownloadListSerializer(serializers.Serializer):
"""serialize download list"""
data = DownloadItemSerializer(many=True)
paginate = PaginationSerializer()
class DownloadListQuerySerializer(
ValidateUnknownFieldsMixin, serializers.Serializer
):
"""serialize query params for download list"""
filter = serializers.ChoiceField(
choices=["pending", "ignore"], required=False
)
channel = serializers.CharField(required=False, help_text="channel ID")
page = serializers.IntegerField(required=False)
class DownloadListQueueDeleteQuerySerializer(serializers.Serializer):
"""serialize bulk delete download queue query string"""
filter = serializers.ChoiceField(choices=["pending", "ignore"])
class AddDownloadItemSerializer(serializers.Serializer):
"""serialize single item to add"""
youtube_id = serializers.CharField()
status = serializers.ChoiceField(choices=["pending", "ignore-force"])
class AddToDownloadListSerializer(serializers.Serializer):
"""serialize add to download queue data"""
data = AddDownloadItemSerializer(many=True)
class AddToDownloadQuerySerializer(serializers.Serializer):
"""add to queue query serializer"""
autostart = serializers.BooleanField(required=False)
class DownloadQueueItemUpdateSerializer(serializers.Serializer):
"""update single download queue item"""
status = serializers.ChoiceField(
choices=["pending", "ignore", "ignore-force", "priority"]
)
class DownloadAggBucketSerializer(serializers.Serializer):
"""serialize bucket"""
key = serializers.ListField(child=serializers.CharField())
key_as_string = serializers.CharField()
doc_count = serializers.IntegerField()
class DownloadAggsSerializer(serializers.Serializer):
"""serialize download channel bucket aggregations"""
doc_count_error_upper_bound = serializers.IntegerField()
sum_other_doc_count = serializers.IntegerField()
buckets = DownloadAggBucketSerializer(many=True)
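
# A minimal usage sketch: the list view validates request.GET with
# DownloadListQuerySerializer before building its ES query, and
# ValidateUnknownFieldsMixin rejects unexpected parameters. The channel id and
# page values below are illustrative only.
def _demo_query_validation():
    """validate a typical download list query string"""
    serializer = DownloadListQuerySerializer(
        data={"filter": "pending", "channel": "UC-example", "page": 2}
    )
    serializer.is_valid(raise_exception=True)
    return serializer.validated_data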

View File

@ -1,367 +0,0 @@
"""
Functionality:
- handle download queue
- linked with ta_download index
"""
from datetime import datetime
from appsettings.src.config import AppConfig
from common.src.es_connect import ElasticWrap, IndexPaginate
from common.src.helper import get_duration_str, is_shorts, rand_sleep
from download.src.subscriptions import ChannelSubscription
from download.src.thumbnails import ThumbManager
from download.src.yt_dlp_base import YtWrap
from playlist.src.index import YoutubePlaylist
from video.src.constants import VideoTypeEnum
class PendingIndex:
"""base class holding all export methods"""
def __init__(self):
self.all_pending = False
self.all_ignored = False
self.all_videos = False
self.all_channels = False
self.channel_overwrites = False
self.video_overwrites = False
self.to_skip = False
def get_download(self):
"""get a list of all pending videos in ta_download"""
data = {
"query": {"match_all": {}},
"sort": [{"timestamp": {"order": "asc"}}],
}
all_results = IndexPaginate("ta_download", data).get_results()
self.all_pending = []
self.all_ignored = []
self.to_skip = []
for result in all_results:
self.to_skip.append(result["youtube_id"])
if result["status"] == "pending":
self.all_pending.append(result)
elif result["status"] == "ignore":
self.all_ignored.append(result)
def get_indexed(self):
"""get a list of all videos indexed"""
data = {
"query": {"match_all": {}},
"sort": [{"published": {"order": "desc"}}],
}
self.all_videos = IndexPaginate("ta_video", data).get_results()
for video in self.all_videos:
self.to_skip.append(video["youtube_id"])
def get_channels(self):
"""get a list of all channels indexed"""
self.all_channels = []
self.channel_overwrites = {}
data = {
"query": {"match_all": {}},
"sort": [{"channel_id": {"order": "asc"}}],
}
channels = IndexPaginate("ta_channel", data).get_results()
for channel in channels:
channel_id = channel["channel_id"]
self.all_channels.append(channel_id)
if channel.get("channel_overwrites"):
self.channel_overwrites.update(
{channel_id: channel.get("channel_overwrites")}
)
self._map_overwrites()
def _map_overwrites(self):
"""map video ids to channel ids overwrites"""
self.video_overwrites = {}
for video in self.all_pending:
video_id = video["youtube_id"]
channel_id = video["channel_id"]
overwrites = self.channel_overwrites.get(channel_id, False)
if overwrites:
self.video_overwrites.update({video_id: overwrites})
class PendingInteract:
"""interact with items in download queue"""
def __init__(self, youtube_id=False, status=False):
self.youtube_id = youtube_id
self.status = status
def delete_item(self):
"""delete single item from pending"""
path = f"ta_download/_doc/{self.youtube_id}"
_, _ = ElasticWrap(path).delete(refresh=True)
def delete_by_status(self):
"""delete all matching item by status"""
data = {"query": {"term": {"status": {"value": self.status}}}}
path = "ta_download/_delete_by_query"
_, _ = ElasticWrap(path).post(data=data)
def update_status(self):
"""update status of pending item"""
if self.status == "priority":
data = {
"doc": {
"status": "pending",
"auto_start": True,
"message": None,
}
}
else:
data = {"doc": {"status": self.status}}
path = f"ta_download/_update/{self.youtube_id}/?refresh=true"
_, _ = ElasticWrap(path).post(data=data)
def get_item(self):
"""return pending item dict"""
path = f"ta_download/_doc/{self.youtube_id}"
response, status_code = ElasticWrap(path).get()
return response["_source"], status_code
def get_channel(self):
"""
get channel metadata from queue to not depend on channel to be indexed
"""
data = {
"size": 1,
"query": {"term": {"channel_id": {"value": self.youtube_id}}},
}
response, _ = ElasticWrap("ta_download/_search").get(data=data)
hits = response["hits"]["hits"]
if not hits:
channel_name = "NA"
else:
channel_name = hits[0]["_source"].get("channel_name", "NA")
return {
"channel_id": self.youtube_id,
"channel_name": channel_name,
}
class PendingList(PendingIndex):
"""manage the pending videos list"""
yt_obs = {
"noplaylist": True,
"writethumbnail": True,
"simulate": True,
"check_formats": None,
}
def __init__(self, youtube_ids=False, task=False):
super().__init__()
self.config = AppConfig().config
self.youtube_ids = youtube_ids
self.task = task
self.to_skip = False
self.missing_videos = False
def parse_url_list(self, auto_start=False):
"""extract youtube ids from list"""
self.missing_videos = []
self.get_download()
self.get_indexed()
total = len(self.youtube_ids)
for idx, entry in enumerate(self.youtube_ids):
self._process_entry(entry, auto_start=auto_start)
if not self.task:
continue
self.task.send_progress(
message_lines=[f"Extracting items {idx + 1}/{total}"],
progress=(idx + 1) / total,
)
def _process_entry(self, entry, auto_start=False):
"""process single entry from url list"""
vid_type = self._get_vid_type(entry)
if entry["type"] == "video":
self._add_video(entry["url"], vid_type, auto_start=auto_start)
elif entry["type"] == "channel":
self._parse_channel(entry["url"], vid_type)
elif entry["type"] == "playlist":
self._parse_playlist(entry["url"])
else:
raise ValueError(f"invalid url_type: {entry}")
@staticmethod
def _get_vid_type(entry):
"""add vid type enum if available"""
vid_type_str = entry.get("vid_type")
if not vid_type_str:
return VideoTypeEnum.UNKNOWN
return VideoTypeEnum(vid_type_str)
def _add_video(self, url, vid_type, auto_start=False):
"""add video to list"""
if auto_start and url in set(
i["youtube_id"] for i in self.all_pending
):
PendingInteract(youtube_id=url, status="priority").update_status()
return
if url not in self.missing_videos and url not in self.to_skip:
self.missing_videos.append((url, vid_type))
else:
print(f"{url}: skipped adding already indexed video to download.")
def _parse_channel(self, url, vid_type):
"""add all videos of channel to list"""
video_results = ChannelSubscription().get_last_youtube_videos(
url, limit=False, query_filter=vid_type
)
for video_id, _, vid_type in video_results:
self._add_video(video_id, vid_type)
def _parse_playlist(self, url):
"""add all videos of playlist to list"""
playlist = YoutubePlaylist(url)
is_active = playlist.update_playlist()
if not is_active:
message = f"{playlist.youtube_id}: failed to extract metadata"
print(message)
raise ValueError(message)
entries = playlist.json_data["playlist_entries"]
to_add = [i["youtube_id"] for i in entries if not i["downloaded"]]
if not to_add:
return
for video_id in to_add:
# match vid_type later
self._add_video(video_id, VideoTypeEnum.UNKNOWN)
def add_to_pending(self, status="pending", auto_start=False):
"""add missing videos to pending list"""
self.get_channels()
total = len(self.missing_videos)
videos_added = []
for idx, (youtube_id, vid_type) in enumerate(self.missing_videos):
if self.task and self.task.is_stopped():
break
print(f"{youtube_id}: [{idx + 1}/{total}]: add to queue")
self._notify_add(idx, total)
video_details = self.get_youtube_details(youtube_id, vid_type)
if not video_details:
rand_sleep(self.config)
continue
video_details.update(
{
"status": status,
"auto_start": auto_start,
}
)
url = video_details["vid_thumb_url"]
ThumbManager(youtube_id).download_video_thumb(url)
es_url = f"ta_download/_doc/{youtube_id}"
_, _ = ElasticWrap(es_url).put(video_details)
videos_added.append(youtube_id)
if idx != total:
rand_sleep(self.config)
return videos_added
def _notify_add(self, idx, total):
"""send notification for adding videos to download queue"""
if not self.task:
return
self.task.send_progress(
message_lines=[
"Adding new videos to download queue.",
f"Extracting items {idx + 1}/{total}",
],
progress=(idx + 1) / total,
)
def get_youtube_details(self, youtube_id, vid_type=VideoTypeEnum.VIDEOS):
"""get details from youtubedl for single pending video"""
vid = YtWrap(self.yt_obs, self.config).extract(youtube_id)
if not vid:
return False
if vid.get("id") != youtube_id:
# skip premium videos with different id
print(f"{youtube_id}: skipping premium video, id not matching")
return False
# stop if video is streaming live now
if vid["live_status"] in ["is_upcoming", "is_live"]:
print(f"{youtube_id}: skip is_upcoming or is_live")
return False
if vid["live_status"] == "was_live":
vid_type = VideoTypeEnum.STREAMS
else:
if self._check_shorts(vid):
vid_type = VideoTypeEnum.SHORTS
else:
vid_type = VideoTypeEnum.VIDEOS
if not vid.get("channel"):
print(f"{youtube_id}: skip video not part of channel")
return False
return self._parse_youtube_details(vid, vid_type)
@staticmethod
def _check_shorts(vid):
"""check if vid is shorts video"""
if vid["width"] > vid["height"]:
return False
duration = vid.get("duration")
if duration and isinstance(duration, int):
if duration > 3 * 60:
return False
return is_shorts(vid["id"])
def _parse_youtube_details(self, vid, vid_type=VideoTypeEnum.VIDEOS):
"""parse response"""
vid_id = vid.get("id")
# build dict
youtube_details = {
"youtube_id": vid_id,
"channel_name": vid["channel"],
"vid_thumb_url": vid["thumbnail"],
"title": vid["title"],
"channel_id": vid["channel_id"],
"duration": get_duration_str(vid["duration"]),
"published": self._build_published(vid),
"timestamp": int(datetime.now().timestamp()),
"vid_type": vid_type.value,
"channel_indexed": vid["channel_id"] in self.all_channels,
}
return youtube_details
@staticmethod
def _build_published(vid):
"""build published date or timestamp"""
timestamp = vid["timestamp"]
if timestamp:
return timestamp
upload_date = vid["upload_date"]
upload_date_time = datetime.strptime(upload_date, "%Y%m%d")
published = upload_date_time.strftime("%Y-%m-%d")
return published
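
# A minimal usage sketch of the queue flow, assuming an illustrative video id
# and no task attached: parse_url_list() resolves the already-parsed URL dicts
# against what is queued or indexed, add_to_pending() then extracts metadata
# via yt-dlp and writes the ta_download documents.
def _demo_add_to_queue():
    """queue a single video as pending"""
    pending = PendingList(
        youtube_ids=[
            {"type": "video", "url": "dQw4w9WgXcQ", "vid_type": "videos"}
        ]
    )
    pending.parse_url_list()
    return pending.add_to_pending(status="pending", auto_start=False)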

View File

@ -1,441 +0,0 @@
"""
Functionality:
- handle channel subscriptions
- handle playlist subscriptions
"""
from appsettings.src.config import AppConfig
from channel.src.index import YoutubeChannel
from common.src.es_connect import IndexPaginate
from common.src.helper import is_missing, rand_sleep
from common.src.urlparser import Parser
from download.src.thumbnails import ThumbManager
from download.src.yt_dlp_base import YtWrap
from playlist.src.index import YoutubePlaylist
from video.src.constants import VideoTypeEnum
from video.src.index import YoutubeVideo
class ChannelSubscription:
"""manage the list of channels subscribed"""
def __init__(self, task=False):
self.config = AppConfig().config
self.task = task
@staticmethod
def get_channels(subscribed_only=True):
"""get a list of all channels subscribed to"""
data = {
"sort": [{"channel_name.keyword": {"order": "asc"}}],
}
if subscribed_only:
data["query"] = {"term": {"channel_subscribed": {"value": True}}}
else:
data["query"] = {"match_all": {}}
all_channels = IndexPaginate("ta_channel", data).get_results()
return all_channels
def get_last_youtube_videos(
self,
channel_id,
limit=True,
query_filter=None,
channel_overwrites=None,
):
"""get a list of last videos from channel"""
query_handler = VideoQueryBuilder(self.config, channel_overwrites)
queries = query_handler.build_queries(query_filter)
last_videos = []
for vid_type_enum, limit_amount in queries:
obs = {
"skip_download": True,
"extract_flat": True,
}
vid_type = vid_type_enum.value
if limit:
obs["playlistend"] = limit_amount
url = f"https://www.youtube.com/channel/{channel_id}/{vid_type}"
channel_query = YtWrap(obs, self.config).extract(url)
if not channel_query:
continue
last_videos.extend(
[
(i["id"], i["title"], vid_type)
for i in channel_query["entries"]
]
)
return last_videos
def find_missing(self):
"""add missing videos from subscribed channels to pending"""
all_channels = self.get_channels()
if not all_channels:
return False
missing_videos = []
total = len(all_channels)
for idx, channel in enumerate(all_channels):
channel_id = channel["channel_id"]
print(f"{channel_id}: find missing videos.")
last_videos = self.get_last_youtube_videos(
channel_id,
channel_overwrites=channel.get("channel_overwrites"),
)
if last_videos:
ids_to_add = is_missing([i[0] for i in last_videos])
for video_id, _, vid_type in last_videos:
if video_id in ids_to_add:
missing_videos.append((video_id, vid_type))
if not self.task:
continue
if self.task.is_stopped():
self.task.send_progress(["Received Stop signal."])
break
self.task.send_progress(
message_lines=[f"Scanning Channel {idx + 1}/{total}"],
progress=(idx + 1) / total,
)
rand_sleep(self.config)
return missing_videos
@staticmethod
def change_subscribe(channel_id, channel_subscribed):
"""subscribe or unsubscribe from channel and update"""
channel = YoutubeChannel(channel_id)
channel.build_json()
channel.json_data["channel_subscribed"] = channel_subscribed
channel.upload_to_es()
channel.sync_to_videos()
return channel.json_data
class VideoQueryBuilder:
"""Build queries for yt-dlp."""
def __init__(self, config: dict, channel_overwrites: dict | None = None):
self.config = config
self.channel_overwrites = channel_overwrites or {}
def build_queries(
self, video_type: VideoTypeEnum | None, limit: bool = True
) -> list[tuple[VideoTypeEnum, int | None]]:
"""Build queries for all or specific video type."""
query_methods = {
VideoTypeEnum.VIDEOS: self.videos_query,
VideoTypeEnum.STREAMS: self.streams_query,
VideoTypeEnum.SHORTS: self.shorts_query,
}
if video_type:
# build query for specific type
query_method = query_methods.get(video_type)
if query_method:
query = query_method(limit)
if query[1] != 0:
return [query]
return []
# Build and return queries for all video types
queries = []
for build_query in query_methods.values():
query = build_query(limit)
if query[1] != 0:
queries.append(query)
return queries
def videos_query(self, limit: bool) -> tuple[VideoTypeEnum, int | None]:
"""Build query for videos."""
return self._build_generic_query(
video_type=VideoTypeEnum.VIDEOS,
overwrite_key="subscriptions_channel_size",
config_key="channel_size",
limit=limit,
)
def streams_query(self, limit: bool) -> tuple[VideoTypeEnum, int | None]:
"""Build query for streams."""
return self._build_generic_query(
video_type=VideoTypeEnum.STREAMS,
overwrite_key="subscriptions_live_channel_size",
config_key="live_channel_size",
limit=limit,
)
def shorts_query(self, limit: bool) -> tuple[VideoTypeEnum, int | None]:
"""Build query for shorts."""
return self._build_generic_query(
video_type=VideoTypeEnum.SHORTS,
overwrite_key="subscriptions_shorts_channel_size",
config_key="shorts_channel_size",
limit=limit,
)
def _build_generic_query(
self,
video_type: VideoTypeEnum,
overwrite_key: str,
config_key: str,
limit: bool,
) -> tuple[VideoTypeEnum, int | None]:
"""Generic query for video page scraping."""
if not limit:
return (video_type, None)
if (
overwrite_key in self.channel_overwrites
and self.channel_overwrites[overwrite_key] is not None
):
overwrite = self.channel_overwrites[overwrite_key]
return (video_type, overwrite)
if overwrite := self.config["subscriptions"].get(config_key):
return (video_type, overwrite)
return (video_type, 0)
class PlaylistSubscription:
"""manage the playlist download functionality"""
def __init__(self, task=False):
self.config = AppConfig().config
self.task = task
@staticmethod
def get_playlists(subscribed_only=True):
"""get a list of all active playlists"""
data = {
"sort": [{"playlist_channel.keyword": {"order": "desc"}}],
}
data["query"] = {
"bool": {"must": [{"term": {"playlist_active": {"value": True}}}]}
}
if subscribed_only:
data["query"]["bool"]["must"].append(
{"term": {"playlist_subscribed": {"value": True}}}
)
all_playlists = IndexPaginate("ta_playlist", data).get_results()
return all_playlists
def process_url_str(self, new_playlists, subscribed=True):
"""process playlist subscribe form url_str"""
for idx, playlist in enumerate(new_playlists):
playlist_id = playlist["url"]
if not playlist["type"] == "playlist":
print(f"{playlist_id} not a playlist, skipping...")
continue
playlist_h = YoutubePlaylist(playlist_id)
playlist_h.build_json()
if not playlist_h.json_data:
message = f"{playlist_h.youtube_id}: failed to extract data"
print(message)
raise ValueError(message)
playlist_h.json_data["playlist_subscribed"] = subscribed
playlist_h.upload_to_es()
playlist_h.add_vids_to_playlist()
self.channel_validate(playlist_h.json_data["playlist_channel_id"])
url = playlist_h.json_data["playlist_thumbnail"]
thumb = ThumbManager(playlist_id, item_type="playlist")
thumb.download_playlist_thumb(url)
if self.task:
self.task.send_progress(
message_lines=[
f"Processing {idx + 1} of {len(new_playlists)}"
],
progress=(idx + 1) / len(new_playlists),
)
@staticmethod
def channel_validate(channel_id):
"""make sure channel of playlist is there"""
channel = YoutubeChannel(channel_id)
channel.build_json(upload=True)
@staticmethod
def change_subscribe(playlist_id, subscribe_status):
"""change the subscribe status of a playlist"""
playlist = YoutubePlaylist(playlist_id)
playlist.build_json()
playlist.json_data["playlist_subscribed"] = subscribe_status
playlist.upload_to_es()
return playlist.json_data
def find_missing(self):
"""find videos in subscribed playlists not downloaded yet"""
all_playlists = [i["playlist_id"] for i in self.get_playlists()]
if not all_playlists:
return False
missing_videos = []
total = len(all_playlists)
for idx, playlist_id in enumerate(all_playlists):
playlist = YoutubePlaylist(playlist_id)
is_active = playlist.update_playlist()
if not is_active:
playlist.deactivate()
continue
playlist_entries = playlist.json_data["playlist_entries"]
size_limit = self.config["subscriptions"]["channel_size"]
if size_limit:
del playlist_entries[size_limit:]
to_check = [
i["youtube_id"]
for i in playlist_entries
if i["downloaded"] is False
]
needs_downloading = is_missing(to_check)
missing_videos.extend(needs_downloading)
if not self.task:
continue
if self.task.is_stopped():
self.task.send_progress(["Received Stop signal."])
break
self.task.send_progress(
message_lines=[f"Scanning Playlists {idx + 1}/{total}"],
progress=(idx + 1) / total,
)
rand_sleep(self.config)
return missing_videos
class SubscriptionScanner:
"""add missing videos to queue"""
def __init__(self, task=False):
self.task = task
self.missing_videos = False
self.auto_start = AppConfig().config["subscriptions"].get("auto_start")
def scan(self):
"""scan channels and playlists"""
if self.task:
self.task.send_progress(["Rescanning channels and playlists."])
self.missing_videos = []
self.scan_channels()
if self.task and not self.task.is_stopped():
self.scan_playlists()
return self.missing_videos
def scan_channels(self):
"""get missing from channels"""
channel_handler = ChannelSubscription(task=self.task)
missing = channel_handler.find_missing()
if not missing:
return
for vid_id, vid_type in missing:
self.missing_videos.append(
{"type": "video", "vid_type": vid_type, "url": vid_id}
)
def scan_playlists(self):
"""get missing from playlists"""
playlist_handler = PlaylistSubscription(task=self.task)
missing = playlist_handler.find_missing()
if not missing:
return
for i in missing:
self.missing_videos.append(
{
"type": "video",
"vid_type": VideoTypeEnum.VIDEOS.value,
"url": i,
}
)
class SubscriptionHandler:
"""subscribe to channels and playlists from url_str"""
def __init__(self, url_str, task=False):
self.url_str = url_str
self.task = task
self.to_subscribe = False
def subscribe(self, expected_type=False):
"""subscribe to url_str items"""
if self.task:
self.task.send_progress(["Processing form content."])
self.to_subscribe = Parser(self.url_str).parse()
total = len(self.to_subscribe)
for idx, item in enumerate(self.to_subscribe):
if self.task:
self._notify(idx, item, total)
self.subscribe_type(item, expected_type=expected_type)
def subscribe_type(self, item, expected_type):
"""process single item"""
if item["type"] == "playlist":
if expected_type and expected_type != "playlist":
raise TypeError(
f"expected {expected_type} url but got {item.get('type')}"
)
PlaylistSubscription().process_url_str([item])
return
if item["type"] == "video":
# extract channel id from video
video = YoutubeVideo(item["url"])
video.get_from_youtube()
video.process_youtube_meta()
channel_id = video.channel_id
elif item["type"] == "channel":
channel_id = item["url"]
else:
raise ValueError("failed to subscribe to: " + item["url"])
if expected_type and expected_type != "channel":
raise TypeError(
f"expected {expected_type} url but got {item.get('type')}"
)
self._subscribe(channel_id)
def _subscribe(self, channel_id):
"""subscribe to channel"""
_ = ChannelSubscription().change_subscribe(
channel_id, channel_subscribed=True
)
def _notify(self, idx, item, total):
"""send notification message to redis"""
subscribe_type = item["type"].title()
message_lines = [
f"Subscribe to {subscribe_type}",
f"Progress: {idx + 1}/{total}",
]
self.task.send_progress(message_lines, progress=(idx + 1) / total)
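
# A minimal usage sketch of VideoQueryBuilder: it turns the global
# subscription page sizes plus optional per-channel overwrites into
# (VideoTypeEnum, limit) pairs, where a limit of 0 drops that type entirely.
# The config values and the shorts overwrite below are illustrative only.
def _demo_build_queries():
    """build scrape queries for a channel that disables shorts"""
    config = {
        "subscriptions": {
            "channel_size": 50,
            "live_channel_size": 50,
            "shorts_channel_size": 50,
        }
    }
    overwrites = {"subscriptions_shorts_channel_size": 0}
    builder = VideoQueryBuilder(config, overwrites)
    # -> [(VideoTypeEnum.VIDEOS, 50), (VideoTypeEnum.STREAMS, 50)]
    return builder.build_queries(video_type=None)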

View File

@ -1,219 +0,0 @@
"""
functionality:
- base class to make all calls to yt-dlp
- handle yt-dlp errors
"""
from datetime import datetime
from http import cookiejar
from io import StringIO
import yt_dlp
from appsettings.src.config import AppConfig
from common.src.ta_redis import RedisArchivist
from django.conf import settings
class YtWrap:
"""wrap calls to yt"""
OBS_BASE = {
"default_search": "ytsearch",
"quiet": True,
"socket_timeout": 10,
"extractor_retries": 3,
"retries": 10,
}
def __init__(self, obs_request, config=False):
self.obs_request = obs_request
self.config = config
self.build_obs()
def build_obs(self):
"""build yt-dlp obs"""
self.obs = self.OBS_BASE.copy()
self.obs.update(self.obs_request)
if self.config:
self._add_cookie()
self._add_potoken()
if getattr(settings, "DEBUG", False):
del self.obs["quiet"]
print(self.obs)
def _add_cookie(self):
"""add cookie if enabled"""
if self.config["downloads"]["cookie_import"]:
cookie_io = CookieHandler(self.config).get()
self.obs["cookiefile"] = cookie_io
def _add_potoken(self):
"""add potoken if enabled"""
if self.config["downloads"].get("potoken"):
potoken = POTokenHandler(self.config).get()
self.obs.update(
{
"extractor_args": {
"youtube": {
"po_token": [potoken],
"player-client": ["web", "default"],
},
}
}
)
def download(self, url):
"""make download request"""
self.obs.update({"check_formats": "selected"})
with yt_dlp.YoutubeDL(self.obs) as ydl:
try:
ydl.download([url])
except yt_dlp.utils.DownloadError as err:
print(f"{url}: failed to download with message {err}")
if "Temporary failure in name resolution" in str(err):
raise ConnectionError("lost the internet, abort!") from err
return False, str(err)
self._validate_cookie()
return True, True
def extract(self, url):
"""make extract request"""
with yt_dlp.YoutubeDL(self.obs) as ydl:
try:
response = ydl.extract_info(url)
except cookiejar.LoadError as err:
print(f"cookie file is invalid: {err}")
return False
except yt_dlp.utils.ExtractorError as err:
print(f"{url}: failed to extract: {err}, continue...")
return False
except yt_dlp.utils.DownloadError as err:
if "This channel does not have a" in str(err):
return False
print(f"{url}: failed to get info from youtube: {err}")
if "Temporary failure in name resolution" in str(err):
raise ConnectionError("lost the internet, abort!") from err
return False
self._validate_cookie()
return response
def _validate_cookie(self):
"""check cookie and write it back for next use"""
if not self.obs.get("cookiefile"):
return
new_cookie = self.obs["cookiefile"].read()
old_cookie = RedisArchivist().get_message_str("cookie")
if new_cookie and old_cookie != new_cookie:
print("refreshed stored cookie")
RedisArchivist().set_message("cookie", new_cookie, save=True)
class CookieHandler:
"""handle youtube cookie for yt-dlp"""
def __init__(self, config):
self.cookie_io = False
self.config = config
def get(self):
"""get cookie io stream"""
cookie = RedisArchivist().get_message_str("cookie")
self.cookie_io = StringIO(cookie)
return self.cookie_io
def set_cookie(self, cookie):
"""set cookie str and activate in config"""
cookie_clean = cookie.strip("\x00")
RedisArchivist().set_message("cookie", cookie_clean, save=True)
AppConfig().update_config({"downloads": {"cookie_import": True}})
self.config["downloads"]["cookie_import"] = True
print("[cookie]: activated and stored in Redis")
@staticmethod
def revoke():
"""revoke cookie"""
RedisArchivist().del_message("cookie")
RedisArchivist().del_message("cookie:valid")
AppConfig().update_config({"downloads": {"cookie_import": False}})
print("[cookie]: revoked")
def validate(self):
"""validate cookie using the liked videos playlist"""
validation = RedisArchivist().get_message_dict("cookie:valid")
if validation:
print("[cookie]: used cached cookie validation")
return True
print("[cookie] validating cookie")
obs_request = {
"skip_download": True,
"extract_flat": True,
}
validator = YtWrap(obs_request, self.config)
response = bool(validator.extract("LL"))
self.store_validation(response)
# update in redis to avoid expiring
modified = validator.obs["cookiefile"].getvalue().strip("\x00")
if modified:
cookie_clean = modified.strip("\x00")
RedisArchivist().set_message("cookie", cookie_clean)
if not response:
mess_dict = {
"status": "message:download",
"level": "error",
"title": "Cookie validation failed, exiting...",
"message": "",
}
RedisArchivist().set_message(
"message:download", mess_dict, expire=4
)
print("[cookie]: validation failed, exiting...")
print(f"[cookie]: validation success: {response}")
return response
@staticmethod
def store_validation(response):
"""remember last validation"""
now = datetime.now()
message = {
"status": response,
"validated": int(now.timestamp()),
"validated_str": now.strftime("%Y-%m-%d %H:%M"),
}
RedisArchivist().set_message("cookie:valid", message, expire=3600)
class POTokenHandler:
"""handle po token"""
REDIS_KEY = "potoken"
def __init__(self, config):
self.config = config
def get(self) -> str | None:
"""get PO token"""
potoken = RedisArchivist().get_message_str(self.REDIS_KEY)
return potoken
def set_token(self, new_token: str) -> None:
"""set new PO token"""
RedisArchivist().set_message(self.REDIS_KEY, new_token)
AppConfig().update_config({"downloads": {"potoken": True}})
def revoke_token(self) -> None:
"""revoke token"""
RedisArchivist().del_message(self.REDIS_KEY)
AppConfig().update_config({"downloads": {"potoken": False}})
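
# A minimal usage sketch of YtWrap, the same flat-extract call the channel
# scanner uses: OBS_BASE is merged with the per-call options, and passing a
# config injects cookie / PO token handling. The channel id is illustrative.
def _demo_flat_extract(config):
    """flat-extract the first five entries of a channel's videos tab"""
    obs = {"skip_download": True, "extract_flat": True, "playlistend": 5}
    url = "https://www.youtube.com/channel/UC-example/videos"
    response = YtWrap(obs, config).extract(url)  # False on failure
    return response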

View File

@ -1,471 +0,0 @@
"""
functionality:
- handle yt_dlp
- build options and post processor
- download video files
- move to archive
"""
import os
import shutil
from datetime import datetime
from appsettings.src.config import AppConfig
from channel.src.index import YoutubeChannel
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import ElasticWrap, IndexPaginate
from common.src.helper import (
get_channel_overwrites,
ignore_filelist,
rand_sleep,
)
from common.src.ta_redis import RedisQueue
from download.src.queue import PendingList
from download.src.subscriptions import PlaylistSubscription
from download.src.yt_dlp_base import YtWrap
from playlist.src.index import YoutubePlaylist
from video.src.comments import CommentList
from video.src.constants import VideoTypeEnum
from video.src.index import YoutubeVideo, index_new_video
class DownloaderBase:
"""base class for shared config"""
CACHE_DIR = EnvironmentSettings.CACHE_DIR
MEDIA_DIR = EnvironmentSettings.MEDIA_DIR
CHANNEL_QUEUE = "download:channel"
PLAYLIST_QUEUE = "download:playlist:full"
PLAYLIST_QUICK = "download:playlist:quick"
VIDEO_QUEUE = "download:video"
def __init__(self, task):
self.task = task
self.config = AppConfig().config
self.channel_overwrites = get_channel_overwrites()
self.now = int(datetime.now().timestamp())
class VideoDownloader(DownloaderBase):
"""handle the video download functionality"""
def __init__(self, task=False):
super().__init__(task)
self.obs = False
self._build_obs()
def run_queue(self, auto_only=False) -> tuple[int, int]:
"""setup download queue in redis loop until no more items"""
downloaded = 0
failed = 0
while True:
video_data = self._get_next(auto_only)
if self.task.is_stopped() or not video_data:
self._reset_auto()
break
if downloaded > 0:
rand_sleep(self.config)
youtube_id = video_data["youtube_id"]
channel_id = video_data["channel_id"]
print(f"{youtube_id}: Downloading video")
self._notify(video_data, "Validate download format")
success = self._dl_single_vid(youtube_id, channel_id)
if not success:
failed += 1
continue
self._notify(video_data, "Add video metadata to index", progress=1)
video_type = VideoTypeEnum(video_data["vid_type"])
vid_dict = index_new_video(youtube_id, video_type=video_type)
RedisQueue(self.CHANNEL_QUEUE).add(channel_id)
RedisQueue(self.VIDEO_QUEUE).add(youtube_id)
self._notify(video_data, "Move downloaded file to archive")
self.move_to_archive(vid_dict)
self._delete_from_pending(youtube_id)
downloaded += 1
# post processing
DownloadPostProcess(self.task).run()
return downloaded, failed
def _notify(self, video_data, message, progress=False):
"""send progress notification to task"""
if not self.task:
return
typ = VideoTypeEnum(video_data["vid_type"]).value.rstrip("s").title()
title = video_data.get("title")
self.task.send_progress(
[f"Processing {typ}: {title}", message], progress=progress
)
def _get_next(self, auto_only):
"""get next item in queue"""
must_list = [{"term": {"status": {"value": "pending"}}}]
must_not_list = [{"exists": {"field": "message"}}]
if auto_only:
must_list.append({"term": {"auto_start": {"value": True}}})
data = {
"size": 1,
"query": {"bool": {"must": must_list, "must_not": must_not_list}},
"sort": [
{"auto_start": {"order": "desc"}},
{"timestamp": {"order": "asc"}},
],
}
path = "ta_download/_search"
response, _ = ElasticWrap(path).get(data=data)
if not response["hits"]["hits"]:
return False
return response["hits"]["hits"][0]["_source"]
def _progress_hook(self, response):
"""process the progress_hooks from yt_dlp"""
progress = False
try:
size = response.get("_total_bytes_str")
if size.strip() == "N/A":
size = response.get("_total_bytes_estimate_str", "N/A")
percent = response["_percent_str"]
progress = float(percent.strip("%")) / 100
speed = response["_speed_str"]
eta = response["_eta_str"]
message = f"{percent} of {size} at {speed} - time left: {eta}"
except KeyError:
message = "processing"
if self.task:
title = response["info_dict"]["title"]
self.task.send_progress([title, message], progress=progress)
def _build_obs(self):
"""collection to build all obs passed to yt-dlp"""
self._build_obs_basic()
self._build_obs_user()
self._build_obs_postprocessors()
def _build_obs_basic(self):
"""initial obs"""
self.obs = {
"merge_output_format": "mp4",
"outtmpl": (self.CACHE_DIR + "/download/%(id)s.mp4"),
"progress_hooks": [self._progress_hook],
"noprogress": True,
"continuedl": True,
"writethumbnail": False,
"noplaylist": True,
"color": "no_color",
}
def _build_obs_user(self):
"""build user customized options"""
if self.config["downloads"]["format"]:
self.obs["format"] = self.config["downloads"]["format"]
if self.config["downloads"]["format_sort"]:
format_sort = self.config["downloads"]["format_sort"]
format_sort_list = [i.strip() for i in format_sort.split(",")]
self.obs["format_sort"] = format_sort_list
if self.config["downloads"]["limit_speed"]:
self.obs["ratelimit"] = (
self.config["downloads"]["limit_speed"] * 1024
)
throttle = self.config["downloads"]["throttledratelimit"]
if throttle:
self.obs["throttledratelimit"] = throttle * 1024
def _build_obs_postprocessors(self):
"""add postprocessor to obs"""
postprocessors = []
if self.config["downloads"]["add_metadata"]:
postprocessors.append(
{
"key": "FFmpegMetadata",
"add_chapters": True,
"add_metadata": True,
}
)
postprocessors.append(
{
"key": "MetadataFromField",
"formats": [
"%(title)s:%(meta_title)s",
"%(uploader)s:%(meta_artist)s",
":(?P<album>)",
],
"when": "pre_process",
}
)
if self.config["downloads"]["add_thumbnail"]:
postprocessors.append(
{
"key": "EmbedThumbnail",
"already_have_thumbnail": True,
}
)
self.obs["writethumbnail"] = True
self.obs["postprocessors"] = postprocessors
def _set_overwrites(self, obs: dict, channel_id: str) -> None:
"""add overwrites to obs"""
overwrites = self.channel_overwrites.get(channel_id)
if overwrites and overwrites.get("download_format"):
obs["format"] = overwrites.get("download_format")
def _dl_single_vid(self, youtube_id: str, channel_id: str) -> bool:
"""download single video"""
obs = self.obs.copy()
self._set_overwrites(obs, channel_id)
dl_cache = os.path.join(self.CACHE_DIR, "download")
success, message = YtWrap(obs, self.config).download(youtube_id)
if not success:
self._handle_error(youtube_id, message)
if self.obs["writethumbnail"]:
# webp files don't get cleaned up automatically
all_cached = ignore_filelist(os.listdir(dl_cache))
to_clean = [i for i in all_cached if not i.endswith(".mp4")]
for file_name in to_clean:
file_path = os.path.join(dl_cache, file_name)
os.remove(file_path)
return success
@staticmethod
def _handle_error(youtube_id, message):
"""store error message"""
data = {"doc": {"message": message}}
_, _ = ElasticWrap(f"ta_download/_update/{youtube_id}").post(data=data)
def move_to_archive(self, vid_dict):
"""move downloaded video from cache to archive"""
host_uid = EnvironmentSettings.HOST_UID
host_gid = EnvironmentSettings.HOST_GID
# make folder
folder = os.path.join(
self.MEDIA_DIR, vid_dict["channel"]["channel_id"]
)
if not os.path.exists(folder):
os.makedirs(folder)
if host_uid and host_gid:
os.chown(folder, host_uid, host_gid)
# move media file
media_file = vid_dict["youtube_id"] + ".mp4"
old_path = os.path.join(self.CACHE_DIR, "download", media_file)
new_path = os.path.join(self.MEDIA_DIR, vid_dict["media_url"])
# move media file and fix permission
shutil.move(old_path, new_path, copy_function=shutil.copyfile)
if host_uid and host_gid:
os.chown(new_path, host_uid, host_gid)
@staticmethod
def _delete_from_pending(youtube_id):
"""delete downloaded video from pending index if its there"""
path = f"ta_download/_doc/{youtube_id}?refresh=true"
_, _ = ElasticWrap(path).delete()
def _reset_auto(self):
"""reset autostart to defaults after queue stop"""
path = "ta_download/_update_by_query"
data = {
"query": {"term": {"auto_start": {"value": True}}},
"script": {
"source": "ctx._source.auto_start = false",
"lang": "painless",
},
}
response, _ = ElasticWrap(path).post(data=data)
updated = response.get("updated")
if updated:
print(f"[download] reset auto start on {updated} videos.")
class DownloadPostProcess(DownloaderBase):
"""handle task to run after download queue finishes"""
def run(self):
"""run all functions"""
self.auto_delete_all()
self.auto_delete_overwrites()
self.refresh_playlist()
self.match_videos()
self.get_comments()
def auto_delete_all(self):
"""handle auto delete"""
autodelete_days = self.config["downloads"]["autodelete_days"]
if not autodelete_days:
return
print(f"auto delete older than {autodelete_days} days")
now_lte = str(self.now - autodelete_days * 24 * 60 * 60)
channel_overwrite = "channel.channel_overwrites.autodelete_days"
data = {
"query": {
"bool": {
"must": [
{"range": {"player.watched_date": {"lte": now_lte}}},
{"term": {"player.watched": True}},
],
"must_not": [
{"exists": {"field": channel_overwrite}},
],
}
},
"sort": [{"player.watched_date": {"order": "asc"}}],
}
self._auto_delete_watched(data)
def auto_delete_overwrites(self):
"""handle per channel auto delete from overwrites"""
for channel_id, value in self.channel_overwrites.items():
if "autodelete_days" in value:
autodelete_days = value.get("autodelete_days")
print(f"{channel_id}: delete older than {autodelete_days}d")
now_lte = str(self.now - autodelete_days * 24 * 60 * 60)
must_list = [
{"range": {"player.watched_date": {"lte": now_lte}}},
{"term": {"channel.channel_id": {"value": channel_id}}},
{"term": {"player.watched": True}},
]
data = {
"query": {"bool": {"must": must_list}},
"sort": [{"player.watched_date": {"order": "desc"}}],
}
self._auto_delete_watched(data)
@staticmethod
def _auto_delete_watched(data):
"""delete watched videos after x days"""
to_delete = IndexPaginate("ta_video", data).get_results()
if not to_delete:
return
for video in to_delete:
youtube_id = video["youtube_id"]
print(f"{youtube_id}: auto delete video")
YoutubeVideo(youtube_id).delete_media_file()
print("add deleted to ignore list")
vids = [{"type": "video", "url": i["youtube_id"]} for i in to_delete]
pending = PendingList(youtube_ids=vids)
pending.parse_url_list()
_ = pending.add_to_pending(status="ignore")
def refresh_playlist(self) -> None:
"""match videos with playlists"""
self.add_playlists_to_refresh()
queue = RedisQueue(self.PLAYLIST_QUEUE)
while True:
total = queue.max_score()
playlist_id, idx = queue.get_next()
if not playlist_id or not idx or not total:
break
playlist = YoutubePlaylist(playlist_id)
playlist.update_playlist(skip_on_empty=True)
if not self.task:
continue
channel_name = playlist.json_data["playlist_channel"]
playlist_title = playlist.json_data["playlist_name"]
message = [
f"Post Processing Playlists for: {channel_name}",
f"{playlist_title} [{idx}/{total}]",
]
progress = idx / total
self.task.send_progress(message, progress=progress)
rand_sleep(self.config)
def add_playlists_to_refresh(self) -> None:
"""add playlists to refresh"""
if self.task:
message = ["Post Processing Playlists", "Scanning for Playlists"]
self.task.send_progress(message)
self._add_playlist_sub()
self._add_channel_playlists()
self._add_video_playlists()
def _add_playlist_sub(self):
"""add subscribed playlists to refresh"""
subs = PlaylistSubscription().get_playlists()
to_add = [i["playlist_id"] for i in subs]
RedisQueue(self.PLAYLIST_QUEUE).add_list(to_add)
def _add_channel_playlists(self):
"""add playlists from channels to refresh"""
queue = RedisQueue(self.CHANNEL_QUEUE)
while True:
channel_id, _ = queue.get_next()
if not channel_id:
break
channel = YoutubeChannel(channel_id)
channel.get_from_es()
overwrites = channel.get_overwrites()
if "index_playlists" in overwrites:
channel.get_all_playlists()
to_add = [i[0] for i in channel.all_playlists]
RedisQueue(self.PLAYLIST_QUEUE).add_list(to_add)
def _add_video_playlists(self):
"""add other playlists for quick sync"""
all_playlists = RedisQueue(self.PLAYLIST_QUEUE).get_all()
must_not = [{"terms": {"playlist_id": all_playlists}}]
video_ids = RedisQueue(self.VIDEO_QUEUE).get_all()
must = [{"terms": {"playlist_entries.youtube_id": video_ids}}]
data = {
"query": {"bool": {"must_not": must_not, "must": must}},
"_source": ["playlist_id"],
}
playlists = IndexPaginate("ta_playlist", data).get_results()
to_add = [i["playlist_id"] for i in playlists]
RedisQueue(self.PLAYLIST_QUICK).add_list(to_add)
def match_videos(self) -> None:
"""scan rest of indexed playlists to match videos"""
queue = RedisQueue(self.PLAYLIST_QUICK)
while True:
total = queue.max_score()
playlist_id, idx = queue.get_next()
if not playlist_id or not idx or not total:
break
playlist = YoutubePlaylist(playlist_id)
playlist.get_from_es()
playlist.add_vids_to_playlist()
playlist.remove_vids_from_playlist()
if not self.task:
continue
message = [
"Post Processing Playlists.",
f"Validate Playlists: - {idx}/{total}",
]
progress = idx / total
self.task.send_progress(message, progress=progress)
def get_comments(self):
"""get comments from youtube"""
video_queue = RedisQueue(self.VIDEO_QUEUE)
comment_list = CommentList(task=self.task)
comment_list.add(video_ids=video_queue.get_all())
video_queue.clear()
comment_list.index()
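
# A minimal usage sketch of the downloader entry point, assuming the caller
# passes a celery task wrapper (run_queue() polls task.is_stopped() on every
# loop): it drains the pending queue, then DownloadPostProcess handles auto
# delete, playlist refresh, video matching and comments once at the end.
def _demo_run_queue(task):
    """drain the download queue and report the outcome"""
    downloaded, failed = VideoDownloader(task=task).run_queue(auto_only=False)
    print(f"[download] queue done: {downloaded} downloaded, {failed} failed")
    return downloaded, failed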

View File

@ -1,18 +0,0 @@
"""all download API urls"""
from django.urls import path
from download import views
urlpatterns = [
path("", views.DownloadApiListView.as_view(), name="api-download-list"),
path(
"aggs/",
views.DownloadAggsApiView.as_view(),
name="api-download-aggs",
),
path(
"<slug:video_id>/",
views.DownloadApiView.as_view(),
name="api-download",
),
]
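
# A minimal sketch of resolving these routes by name, assuming the include
# adds no URL namespace; the video id is illustrative only.
def _demo_reverse_download_url():
    """build the per-item download endpoint from its route name"""
    from django.urls import reverse

    return reverse("api-download", kwargs={"video_id": "dQw4w9WgXcQ"})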

View File

@ -1,292 +0,0 @@
"""all download API views"""
from common.serializers import (
AsyncTaskResponseSerializer,
ErrorResponseSerializer,
)
from common.views_base import AdminOnly, ApiBaseView
from download.serializers import (
AddToDownloadListSerializer,
AddToDownloadQuerySerializer,
DownloadAggsSerializer,
DownloadItemSerializer,
DownloadListQuerySerializer,
DownloadListQueueDeleteQuerySerializer,
DownloadListSerializer,
DownloadQueueItemUpdateSerializer,
)
from download.src.queue import PendingInteract
from drf_spectacular.utils import OpenApiResponse, extend_schema
from rest_framework.response import Response
from task.tasks import download_pending, extrac_dl
class DownloadApiListView(ApiBaseView):
"""resolves to /api/download/
GET: returns latest videos in the download queue
POST: add a list of videos to download queue
DELETE: remove items based on query filter
"""
search_base = "ta_download/_search/"
valid_filter = ["pending", "ignore"]
permission_classes = [AdminOnly]
@extend_schema(
responses={
200: OpenApiResponse(DownloadListSerializer()),
},
parameters=[DownloadListQuerySerializer()],
)
def get(self, request):
"""get download queue list"""
query_filter = request.GET.get("filter", False)
self.data.update(
{
"sort": [
{"auto_start": {"order": "desc"}},
{"timestamp": {"order": "asc"}},
],
}
)
serializer = DownloadListQuerySerializer(data=request.query_params)
serializer.is_valid(raise_exception=True)
validated_data = serializer.validated_data
must_list = []
query_filter = validated_data.get("filter")
if query_filter:
must_list.append({"term": {"status": {"value": query_filter}}})
filter_channel = validated_data.get("channel")
if filter_channel:
must_list.append(
{"term": {"channel_id": {"value": filter_channel}}}
)
self.data["query"] = {"bool": {"must": must_list}}
self.get_document_list(request)
serializer = DownloadListSerializer(self.response)
return Response(serializer.data)
@staticmethod
@extend_schema(
request=AddToDownloadListSerializer(),
parameters=[AddToDownloadQuerySerializer()],
responses={
200: OpenApiResponse(
AsyncTaskResponseSerializer(),
description="New async task started",
),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
)
def post(request):
"""add list of videos to download queue"""
data_serializer = AddToDownloadListSerializer(data=request.data)
data_serializer.is_valid(raise_exception=True)
validated_data = data_serializer.validated_data
query_serializer = AddToDownloadQuerySerializer(
data=request.query_params
)
query_serializer.is_valid(raise_exception=True)
validated_query = query_serializer.validated_data
auto_start = validated_query.get("autostart")
print(f"auto_start: {auto_start}")
to_add = validated_data["data"]
pending = [i["youtube_id"] for i in to_add if i["status"] == "pending"]
url_str = " ".join(pending)
task = extrac_dl.delay(url_str, auto_start=auto_start)
message = {
"message": "add to queue task started",
"task_id": task.id,
}
response_serializer = AsyncTaskResponseSerializer(message)
return Response(response_serializer.data)
@extend_schema(
parameters=[DownloadListQueueDeleteQuerySerializer()],
responses={
204: OpenApiResponse(description="Download items deleted"),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
)
def delete(self, request):
"""bulk delete download queue items by filter"""
serializer = DownloadListQueueDeleteQuerySerializer(
data=request.query_params
)
serializer.is_valid(raise_exception=True)
validated_query = serializer.validated_data
query_filter = validated_query["filter"]
message = f"delete queue by status: {query_filter}"
print(message)
PendingInteract(status=query_filter).delete_by_status()
return Response(status=204)
class DownloadApiView(ApiBaseView):
"""resolves to /api/download/<video_id>/
GET: returns metadata dict of an item in the download queue
POST: update status of item to pending or ignore
DELETE: forget from download queue
"""
search_base = "ta_download/_doc/"
valid_status = ["pending", "ignore", "ignore-force", "priority"]
permission_classes = [AdminOnly]
@extend_schema(
responses={
200: OpenApiResponse(DownloadItemSerializer()),
404: OpenApiResponse(
ErrorResponseSerializer(),
description="Download item not found",
),
},
)
def get(self, request, video_id):
# pylint: disable=unused-argument
"""get download queue item"""
self.get_document(video_id)
if not self.response:
error = ErrorResponseSerializer(
{"error": "Download item not found"}
)
return Response(error.data, status=404)
response_serializer = DownloadItemSerializer(self.response)
return Response(response_serializer.data, status=self.status_code)
@extend_schema(
request=DownloadQueueItemUpdateSerializer(),
responses={
200: OpenApiResponse(
DownloadQueueItemUpdateSerializer(),
description="Download item update",
),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
404: OpenApiResponse(
ErrorResponseSerializer(),
description="Download item not found",
),
},
)
def post(self, request, video_id):
"""post to video to change status"""
data_serializer = DownloadQueueItemUpdateSerializer(data=request.data)
data_serializer.is_valid(raise_exception=True)
validated_data = data_serializer.validated_data
item_status = validated_data["status"]
if item_status == "ignore-force":
extrac_dl.delay(video_id, status="ignore")
return Response(data_serializer.data)
_, status_code = PendingInteract(video_id).get_item()
if status_code == 404:
error = ErrorResponseSerializer(
{"error": "Download item not found"}
)
return Response(error.data, status=404)
print(f"{video_id}: change status to {item_status}")
PendingInteract(video_id, item_status).update_status()
if item_status == "priority":
download_pending.delay(auto_only=True)
return Response(data_serializer.data)
@staticmethod
@extend_schema(
responses={
204: OpenApiResponse(description="delete download item"),
404: OpenApiResponse(
ErrorResponseSerializer(),
description="Download item not found",
),
},
)
def delete(request, video_id):
# pylint: disable=unused-argument
"""delete single video from queue"""
print(f"{video_id}: delete from queue")
PendingInteract(video_id).delete_item()
return Response(status=204)
class DownloadAggsApiView(ApiBaseView):
"""resolves to /api/download/aggs/
GET: get download aggregations
"""
search_base = "ta_download/_search"
valid_filter_view = ["ignore", "pending"]
@extend_schema(
parameters=[DownloadListQueueDeleteQuerySerializer()],
responses={
200: OpenApiResponse(DownloadAggsSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="bad request"
),
},
)
def get(self, request):
"""get aggs"""
serializer = DownloadListQueueDeleteQuerySerializer(
data=request.query_params
)
serializer.is_valid(raise_exception=True)
validated_query = serializer.validated_data
filter_view = validated_query.get("filter")
if filter_view:
if filter_view not in self.valid_filter_view:
message = f"invalid filter: {filter_view}"
return Response({"message": message}, status=400)
self.data.update(
{
"query": {"term": {"status": {"value": filter_view}}},
}
)
self.data.update(
{
"aggs": {
"channel_downloads": {
"multi_terms": {
"size": 30,
"terms": [
{"field": "channel_name.keyword"},
{"field": "channel_id"},
],
"order": {"_count": "desc"},
}
}
}
}
)
self.get_aggs()
serializer = DownloadAggsSerializer(self.response["channel_downloads"])
return Response(serializer.data)
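
A small client sketch for the download queue endpoints above, not part of the diff: the payload shape and the autostart, filter and channel parameters are taken from the views, while the base URL, the token value and token auth itself are assumptions about a running instance.

import requests

API = "http://localhost:8000/api"  # placeholder base URL
HEADERS = {"Authorization": "Token <api-token>"}  # assumes DRF token auth

# queue one video and let the add-to-queue task start it right away
payload = {"data": [{"youtube_id": "dQw4w9WgXcQ", "status": "pending"}]}
task = requests.post(
    f"{API}/download/",
    json=payload,
    params={"autostart": "true"},
    headers=HEADERS,
).json()
print(task["task_id"])

# list pending items of one channel, then clear everything still pending
pending = requests.get(
    f"{API}/download/",
    params={"filter": "pending", "channel": "UC-placeholder"},
    headers=HEADERS,
).json()
print(pending)
requests.delete(f"{API}/download/", params={"filter": "pending"}, headers=HEADERS)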

View File

@ -1,92 +0,0 @@
"""playlist serializers"""
# pylint: disable=abstract-method
from common.serializers import PaginationSerializer
from rest_framework import serializers
class PlaylistEntrySerializer(serializers.Serializer):
"""serialize single playlist entry"""
youtube_id = serializers.CharField()
title = serializers.CharField()
uploader = serializers.CharField()
idx = serializers.IntegerField()
downloaded = serializers.BooleanField()
class PlaylistSerializer(serializers.Serializer):
"""serialize playlist"""
playlist_active = serializers.BooleanField()
playlist_channel = serializers.CharField()
playlist_channel_id = serializers.CharField()
playlist_description = serializers.CharField()
playlist_entries = PlaylistEntrySerializer(many=True)
playlist_id = serializers.CharField()
playlist_last_refresh = serializers.CharField()
playlist_name = serializers.CharField()
playlist_subscribed = serializers.BooleanField()
playlist_thumbnail = serializers.CharField()
playlist_type = serializers.ChoiceField(choices=["regular", "custom"])
_index = serializers.CharField(required=False)
_score = serializers.IntegerField(required=False)
class PlaylistListSerializer(serializers.Serializer):
"""serialize list of playlists"""
data = PlaylistSerializer(many=True)
paginate = PaginationSerializer()
class PlaylistListQuerySerializer(serializers.Serializer):
"""serialize playlist list query params"""
channel = serializers.CharField(required=False)
subscribed = serializers.BooleanField(required=False)
type = serializers.ChoiceField(
choices=["regular", "custom"], required=False
)
page = serializers.IntegerField(required=False)
class PlaylistSingleAddSerializer(serializers.Serializer):
"""single item to add"""
playlist_id = serializers.CharField()
playlist_subscribed = serializers.ChoiceField(choices=[True])
class PlaylistBulkAddSerializer(serializers.Serializer):
"""bulk add playlists serializers"""
data = PlaylistSingleAddSerializer(many=True)
class PlaylistSingleUpdate(serializers.Serializer):
"""update state of single playlist"""
playlist_subscribed = serializers.BooleanField()
class PlaylistListCustomPostSerializer(serializers.Serializer):
"""serialize list post custom playlist"""
playlist_name = serializers.CharField()
class PlaylistCustomPostSerializer(serializers.Serializer):
"""serialize playlist custom action"""
action = serializers.ChoiceField(
choices=["create", "remove", "up", "down", "top", "bottom"]
)
video_id = serializers.CharField()
class PlaylistDeleteQuerySerializer(serializers.Serializer):
"""serialize playlist delete query params"""
delete_videos = serializers.BooleanField(required=False)
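
Not part of the diff: example request bodies matching the writable serializers above, with all IDs and names as placeholders.

# shapes derived from the field definitions above
bulk_subscribe = {  # PlaylistBulkAddSerializer
    "data": [{"playlist_id": "PL_placeholder", "playlist_subscribed": True}]
}
single_update = {"playlist_subscribed": False}  # PlaylistSingleUpdate
custom_create = {"playlist_name": "Favorites"}  # PlaylistListCustomPostSerializer
custom_action = {  # PlaylistCustomPostSerializer
    "action": "up",
    "video_id": "dQw4w9WgXcQ",
}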

View File

@ -1,10 +0,0 @@
"""playlist constants"""
import enum
class PlaylistTypesEnum(enum.Enum):
"""all playlist_type options"""
REGULAR = "regular"
CUSTOM = "custom"

View File

@ -1,444 +0,0 @@
"""
functionality:
- get metadata from youtube for a playlist
- index and update in es
"""
import json
from datetime import datetime
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import ElasticWrap, IndexPaginate
from common.src.index_generic import YouTubeItem
from download.src.thumbnails import ThumbManager
from video.src import index as ta_video
class YoutubePlaylist(YouTubeItem):
"""represents a single youtube playlist"""
es_path = False
index_name = "ta_playlist"
yt_obs = {
"extract_flat": True,
"allow_playlist_files": True,
}
yt_base = "https://www.youtube.com/playlist?list="
def __init__(self, youtube_id):
super().__init__(youtube_id)
self.all_members = False
self.nav = False
def build_json(self, scrape=False):
"""collection to create json_data"""
self.get_from_es()
if self.json_data:
subscribed = self.json_data.get("playlist_subscribed")
else:
subscribed = False
if scrape or not self.json_data:
self.get_from_youtube()
if not self.youtube_meta:
self.json_data = False
return
self.process_youtube_meta()
self._ensure_channel()
ids_found = self.get_local_vids()
self.get_entries(ids_found)
self.json_data["playlist_entries"] = self.all_members
self.json_data["playlist_subscribed"] = subscribed
def process_youtube_meta(self):
"""extract relevant fields from youtube"""
try:
playlist_thumbnail = self.youtube_meta["thumbnails"][-1]["url"]
except IndexError:
print(f"{self.youtube_id}: thumbnail extraction failed")
playlist_thumbnail = False
self.json_data = {
"playlist_id": self.youtube_id,
"playlist_active": True,
"playlist_name": self.youtube_meta["title"],
"playlist_channel": self.youtube_meta["channel"],
"playlist_channel_id": self.youtube_meta["channel_id"],
"playlist_thumbnail": playlist_thumbnail,
"playlist_description": self.youtube_meta["description"] or False,
"playlist_last_refresh": int(datetime.now().timestamp()),
"playlist_type": "regular",
}
def _ensure_channel(self):
"""make sure channel is indexed"""
from channel.src.index import YoutubeChannel
channel_id = self.json_data["playlist_channel_id"]
channel_handler = YoutubeChannel(channel_id)
channel_handler.build_json(upload=True)
def get_local_vids(self) -> list[str]:
"""get local video ids from youtube entries"""
entries = self.youtube_meta["entries"]
data = {
"query": {"terms": {"youtube_id": [i["id"] for i in entries]}},
"_source": ["youtube_id"],
}
indexed_vids = IndexPaginate("ta_video", data).get_results()
ids_found = [i["youtube_id"] for i in indexed_vids]
return ids_found
def get_entries(self, ids_found) -> None:
"""get all videos in playlist, match downloaded with ids_found"""
all_members = []
for idx, entry in enumerate(self.youtube_meta["entries"]):
to_append = {
"youtube_id": entry["id"],
"title": entry["title"],
"uploader": entry.get("channel"),
"idx": idx,
"downloaded": entry["id"] in ids_found,
}
all_members.append(to_append)
self.all_members = all_members
def get_playlist_art(self):
"""download artwork of playlist"""
url = self.json_data["playlist_thumbnail"]
ThumbManager(self.youtube_id, item_type="playlist").download(url)
def add_vids_to_playlist(self):
"""sync the playlist id to videos"""
script = (
'if (!ctx._source.containsKey("playlist")) '
+ "{ctx._source.playlist = [params.playlist]} "
+ "else if (!ctx._source.playlist.contains(params.playlist)) "
+ "{ctx._source.playlist.add(params.playlist)} "
+ "else {ctx.op = 'none'}"
)
bulk_list = []
for entry in self.json_data["playlist_entries"]:
video_id = entry["youtube_id"]
action = {"update": {"_id": video_id, "_index": "ta_video"}}
source = {
"script": {
"source": script,
"lang": "painless",
"params": {"playlist": self.youtube_id},
}
}
bulk_list.append(json.dumps(action))
bulk_list.append(json.dumps(source))
# add last newline
bulk_list.append("\n")
query_str = "\n".join(bulk_list)
ElasticWrap("_bulk").post(query_str, ndjson=True)
def remove_vids_from_playlist(self):
"""remove playlist ids from videos if needed"""
needed = [i["youtube_id"] for i in self.json_data["playlist_entries"]]
        data = {
            "query": {
                "term": {"playlist.keyword": {"value": self.youtube_id}}
            },
            "_source": ["youtube_id"],
        }
result = IndexPaginate("ta_video", data).get_results()
to_remove = [
i["youtube_id"] for i in result if i["youtube_id"] not in needed
]
s = "ctx._source.playlist.removeAll(Collections.singleton(params.rm))"
for video_id in to_remove:
query = {
"script": {
"source": s,
"lang": "painless",
"params": {"rm": self.youtube_id},
},
"query": {"match": {"youtube_id": video_id}},
}
path = "ta_video/_update_by_query"
_, status_code = ElasticWrap(path).post(query)
if status_code == 200:
print(f"{self.youtube_id}: removed {video_id} from playlist")
def update_playlist(self, skip_on_empty=False):
"""update metadata for playlist with data from YouTube"""
self.build_json(scrape=True)
if not self.json_data:
# return false to deactivate
return False
if skip_on_empty:
has_item_downloaded = any(
i["downloaded"] for i in self.json_data["playlist_entries"]
)
if not has_item_downloaded:
return True
self.upload_to_es()
self.add_vids_to_playlist()
self.remove_vids_from_playlist()
self.get_playlist_art()
return True
def build_nav(self, youtube_id):
"""find next and previous in playlist of a given youtube_id"""
cache_root = EnvironmentSettings().get_cache_root()
all_entries_available = self.json_data["playlist_entries"]
all_entries = [i for i in all_entries_available if i["downloaded"]]
current = [i for i in all_entries if i["youtube_id"] == youtube_id]
# stop if not found or playlist of 1
        if not current or len(all_entries) < 2:
return
current_idx = all_entries.index(current[0])
if current_idx == 0:
previous_item = None
else:
previous_item = all_entries[current_idx - 1]
prev_id = previous_item["youtube_id"]
prev_thumb_path = ThumbManager(prev_id).vid_thumb_path()
previous_item["vid_thumb"] = f"{cache_root}/{prev_thumb_path}"
if current_idx == len(all_entries) - 1:
next_item = None
else:
next_item = all_entries[current_idx + 1]
next_id = next_item["youtube_id"]
next_thumb_path = ThumbManager(next_id).vid_thumb_path()
next_item["vid_thumb"] = f"{cache_root}/{next_thumb_path}"
self.nav = {
"playlist_meta": {
"current_idx": current[0]["idx"],
"playlist_id": self.youtube_id,
"playlist_name": self.json_data["playlist_name"],
"playlist_channel": self.json_data["playlist_channel"],
},
"playlist_previous": previous_item,
"playlist_next": next_item,
}
return
def delete_metadata(self):
"""delete metadata for playlist"""
self.delete_videos_metadata()
script = (
"ctx._source.playlist.removeAll("
+ "Collections.singleton(params.playlist)) "
)
data = {
"query": {
"term": {"playlist.keyword": {"value": self.youtube_id}}
},
"script": {
"source": script,
"lang": "painless",
"params": {"playlist": self.youtube_id},
},
}
_, _ = ElasticWrap("ta_video/_update_by_query").post(data)
self.del_in_es()
    def is_custom_playlist(self):
        """check if this playlist is of type custom"""
self.get_from_es()
return self.json_data["playlist_type"] == "custom"
def delete_videos_metadata(self, channel_id=None):
"""delete video metadata for a specific channel"""
self.get_from_es()
playlist = self.json_data["playlist_entries"]
i = 0
while i < len(playlist):
video_id = playlist[i]["youtube_id"]
video = ta_video.YoutubeVideo(video_id)
video.get_from_es()
if (
channel_id is None
or video.json_data["channel"]["channel_id"] == channel_id
):
playlist.pop(i)
self.remove_playlist_from_video(video_id)
i -= 1
i += 1
self.set_playlist_thumbnail()
self.upload_to_es()
def delete_videos_playlist(self):
"""delete playlist with all videos"""
print(f"{self.youtube_id}: delete playlist")
self.get_from_es()
all_youtube_id = [
i["youtube_id"]
for i in self.json_data["playlist_entries"]
if i["downloaded"]
]
for youtube_id in all_youtube_id:
ta_video.YoutubeVideo(youtube_id).delete_media_file()
self.delete_metadata()
    def create(self, name):
        """create a new empty custom playlist"""
self.json_data = {
"playlist_id": self.youtube_id,
"playlist_active": False,
"playlist_name": name,
"playlist_last_refresh": int(datetime.now().timestamp()),
"playlist_entries": [],
"playlist_type": "custom",
"playlist_channel": None,
"playlist_channel_id": None,
"playlist_description": False,
"playlist_thumbnail": False,
"playlist_subscribed": False,
}
self.upload_to_es()
self.get_playlist_art()
return True
    def add_video_to_playlist(self, video_id):
        """append a single video to the custom playlist"""
self.get_from_es()
video_metadata = self.get_video_metadata(video_id)
video_metadata["idx"] = len(self.json_data["playlist_entries"])
if not self.playlist_entries_contains(video_id):
self.json_data["playlist_entries"].append(video_metadata)
self.json_data["playlist_last_refresh"] = int(
datetime.now().timestamp()
)
self.set_playlist_thumbnail()
self.upload_to_es()
video = ta_video.YoutubeVideo(video_id)
video.get_from_es()
if "playlist" not in video.json_data:
video.json_data["playlist"] = []
video.json_data["playlist"].append(self.youtube_id)
video.upload_to_es()
return True
    def remove_playlist_from_video(self, video_id):
        """remove this playlist from the video's playlist list"""
video = ta_video.YoutubeVideo(video_id)
video.get_from_es()
if video.json_data is not None and "playlist" in video.json_data:
video.json_data["playlist"].remove(self.youtube_id)
video.upload_to_es()
    def move_video(self, video_id, action, hide_watched=False):
        """move or remove a video within the custom playlist"""
self.get_from_es()
video_index = self.get_video_index(video_id)
playlist = self.json_data["playlist_entries"]
item = playlist[video_index]
playlist.pop(video_index)
if action == "remove":
self.remove_playlist_from_video(item["youtube_id"])
else:
if action == "up":
while True:
video_index = max(0, video_index - 1)
if (
not hide_watched
or video_index == 0
or (
not self.get_video_is_watched(
playlist[video_index]["youtube_id"]
)
)
):
break
elif action == "down":
while True:
video_index = min(len(playlist), video_index + 1)
if (
not hide_watched
or video_index == len(playlist)
or (
not self.get_video_is_watched(
playlist[video_index - 1]["youtube_id"]
)
)
):
break
elif action == "top":
video_index = 0
else:
video_index = len(playlist)
playlist.insert(video_index, item)
self.json_data["playlist_last_refresh"] = int(
datetime.now().timestamp()
)
for i, item in enumerate(playlist):
item["idx"] = i
self.set_playlist_thumbnail()
self.upload_to_es()
return True
    def del_video(self, video_id):
        """remove a video from the playlist entries"""
playlist = self.json_data["playlist_entries"]
i = 0
while i < len(playlist):
if video_id == playlist[i]["youtube_id"]:
playlist.pop(i)
self.set_playlist_thumbnail()
i -= 1
i += 1
    def get_video_index(self, video_id):
        """get index of video in playlist entries, -1 if not found"""
for i, child in enumerate(self.json_data["playlist_entries"]):
if child["youtube_id"] == video_id:
return i
return -1
    def playlist_entries_contains(self, video_id):
        """check if video is already in the playlist entries"""
return (
len(
list(
filter(
lambda x: x["youtube_id"] == video_id,
self.json_data["playlist_entries"],
)
)
)
> 0
)
    def get_video_is_watched(self, video_id):
        """check watched state of a video"""
video = ta_video.YoutubeVideo(video_id)
video.get_from_es()
return video.json_data["player"]["watched"]
    def set_playlist_thumbnail(self):
        """set playlist thumbnail from the first video with a thumbnail"""
playlist = self.json_data["playlist_entries"]
self.json_data["playlist_thumbnail"] = False
for video in playlist:
url = ThumbManager(video["youtube_id"]).vid_thumb_path()
if url is not None:
self.json_data["playlist_thumbnail"] = url
break
self.get_playlist_art()
    def get_video_metadata(self, video_id):
        """build playlist entry dict for a video from es"""
video = ta_video.YoutubeVideo(video_id)
video.get_from_es()
video_json_data = {
"youtube_id": video.json_data["youtube_id"],
"title": video.json_data["title"],
"uploader": video.json_data["channel"]["channel_name"],
"idx": 0,
"downloaded": "date_downloaded" in video.json_data
and video.json_data["date_downloaded"] > 0,
}
return video_json_data
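
A usage sketch for YoutubePlaylist, not part of the diff and limited to the methods shown above; the playlist and video IDs are placeholders and a reachable Elasticsearch instance is assumed.

import uuid

from playlist.src.index import YoutubePlaylist

# refresh one indexed playlist and build prev/next navigation for a video
playlist = YoutubePlaylist("PL_placeholder")
if playlist.update_playlist(skip_on_empty=True):
    playlist.build_nav("dQw4w9WgXcQ")
    print(playlist.nav)
else:
    print("playlist unavailable, deactivate")

# custom playlists skip the YouTube scrape entirely
custom = YoutubePlaylist(f"TA_playlist_{uuid.uuid4()}")
custom.create("Favorites")
custom.add_video_to_playlist("dQw4w9WgXcQ")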

View File

@ -1,52 +0,0 @@
"""build query for playlists"""
from playlist.src.constants import PlaylistTypesEnum
class QueryBuilder:
"""contain functionality"""
def __init__(self, **kwargs):
self.request_params = kwargs
def build_data(self) -> dict:
"""build data dict"""
data = {}
data["query"] = self.build_query()
if sort := self.parse_sort():
data.update(sort)
return data
def build_query(self) -> dict:
"""build query key"""
must_list = []
channel = self.request_params.get("channel")
if channel:
must_list.append({"match": {"playlist_channel_id": channel}})
subscribed = self.request_params.get("subscribed")
if subscribed:
must_list.append({"match": {"playlist_subscribed": subscribed}})
playlist_type = self.request_params.get("type")
if playlist_type:
type_list = self.parse_type(playlist_type)
must_list.append(type_list)
query = {"bool": {"must": must_list}}
return query
def parse_type(self, playlist_type: str) -> dict:
"""parse playlist type"""
if not hasattr(PlaylistTypesEnum, playlist_type.upper()):
raise ValueError(f"'{playlist_type}' not in PlaylistTypesEnum")
type_parsed = getattr(PlaylistTypesEnum, playlist_type.upper()).value
return {"match": {"playlist_type.keyword": type_parsed}}
def parse_sort(self) -> dict:
"""return sort"""
return {"sort": [{"playlist_name.keyword": {"order": "asc"}}]}

View File

@ -1,30 +0,0 @@
"""test playlist query building"""
import pytest
from playlist.src.query_building import QueryBuilder
def test_build_data():
"""test for correct key building"""
qb = QueryBuilder(
channel="test_channel",
subscribed=True,
type="regular",
)
result = qb.build_data()
must_list = result["query"]["bool"]["must"]
assert "query" in result
assert "sort" in result
assert result["sort"] == [{"playlist_name.keyword": {"order": "asc"}}]
assert {"match": {"playlist_channel_id": "test_channel"}} in must_list
assert {"match": {"playlist_subscribed": True}} in must_list
def test_parse_type():
"""validate type"""
qb = QueryBuilder(type="regular")
with pytest.raises(ValueError):
qb.parse_type("invalid")
result = qb.parse_type("custom")
assert result == {"match": {"playlist_type.keyword": "custom"}}

View File

@ -1,27 +0,0 @@
"""all playlist API urls"""
from django.urls import path
from playlist import views
urlpatterns = [
path(
"",
views.PlaylistApiListView.as_view(),
name="api-playlist-list",
),
path(
"custom/",
views.PlaylistCustomApiListView.as_view(),
name="api-custom-playlist-list",
),
path(
"custom/<slug:playlist_id>/",
views.PlaylistCustomApiView.as_view(),
name="api-custom-playlist",
),
path(
"<slug:playlist_id>/",
views.PlaylistApiView.as_view(),
name="api-playlist",
),
]
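
Not part of the diff: how these routes resolve by name, assuming the code runs inside the project's Django context (for example a manage.py shell) and that this urlconf is included under /api/playlist/.

from django.urls import reverse

print(reverse("api-playlist", kwargs={"playlist_id": "PL_placeholder"}))
print(reverse("api-custom-playlist", kwargs={"playlist_id": "TA_playlist_placeholder"}))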

View File

@ -1,273 +0,0 @@
"""all playlist API views"""
import uuid
from common.serializers import (
AsyncTaskResponseSerializer,
ErrorResponseSerializer,
)
from common.views_base import AdminWriteOnly, ApiBaseView
from download.src.subscriptions import PlaylistSubscription
from drf_spectacular.utils import OpenApiResponse, extend_schema
from playlist.serializers import (
PlaylistBulkAddSerializer,
PlaylistCustomPostSerializer,
PlaylistDeleteQuerySerializer,
PlaylistListCustomPostSerializer,
PlaylistListQuerySerializer,
PlaylistListSerializer,
PlaylistSerializer,
PlaylistSingleUpdate,
)
from playlist.src.index import YoutubePlaylist
from playlist.src.query_building import QueryBuilder
from rest_framework.response import Response
from task.tasks import subscribe_to
from user.src.user_config import UserConfig
class PlaylistApiListView(ApiBaseView):
"""resolves to /api/playlist/
GET: returns list of indexed playlists
params:
- channel:str=<channel-id>
- subscribed: bool
- type:enum=regular|custom
POST: change subscribe state
"""
search_base = "ta_playlist/_search/"
permission_classes = [AdminWriteOnly]
@extend_schema(
responses={
200: OpenApiResponse(PlaylistListSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
parameters=[PlaylistListQuerySerializer],
)
def get(self, request):
"""get playlist list"""
query_serializer = PlaylistListQuerySerializer(
data=request.query_params
)
query_serializer.is_valid(raise_exception=True)
validated_query = query_serializer.validated_data
try:
data = QueryBuilder(**validated_query).build_data()
except ValueError as err:
error = ErrorResponseSerializer({"error": str(err)})
return Response(error.data, status=400)
self.data = data
self.get_document_list(request)
response_serializer = PlaylistListSerializer(self.response)
return Response(response_serializer.data)
@extend_schema(
request=PlaylistBulkAddSerializer(),
responses={
200: OpenApiResponse(AsyncTaskResponseSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
)
def post(self, request):
"""async subscribe to list of playlists"""
data_serializer = PlaylistBulkAddSerializer(data=request.data)
data_serializer.is_valid(raise_exception=True)
validated_data = data_serializer.validated_data
pending = [i["playlist_id"] for i in validated_data["data"]]
if not pending:
error = ErrorResponseSerializer({"error": "nothing to subscribe"})
return Response(error.data, status=400)
url_str = " ".join(pending)
task = subscribe_to.delay(url_str, expected_type="playlist")
message = {
"message": "playlist subscribe task started",
"task_id": task.id,
}
serializer = AsyncTaskResponseSerializer(message)
return Response(serializer.data)
class PlaylistCustomApiListView(ApiBaseView):
"""resolves to /api/playlist/custom/
POST: Create new custom playlist
"""
search_base = "ta_playlist/_search/"
permission_classes = [AdminWriteOnly]
@extend_schema(
request=PlaylistListCustomPostSerializer(),
responses={
200: OpenApiResponse(PlaylistSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
)
def post(self, request):
"""create new custom playlist"""
serializer = PlaylistListCustomPostSerializer(data=request.data)
serializer.is_valid(raise_exception=True)
validated_data = serializer.validated_data
custom_name = validated_data["playlist_name"]
playlist_id = f"TA_playlist_{uuid.uuid4()}"
custom_playlist = YoutubePlaylist(playlist_id)
custom_playlist.create(custom_name)
response_serializer = PlaylistSerializer(custom_playlist.json_data)
return Response(response_serializer.data)
class PlaylistCustomApiView(ApiBaseView):
"""resolves to /api/playlist/custom/<playlist_id>/
POST: modify custom playlist
"""
search_base = "ta_playlist/_doc/"
permission_classes = [AdminWriteOnly]
@extend_schema(
request=PlaylistCustomPostSerializer(),
responses={
200: OpenApiResponse(PlaylistSerializer()),
400: OpenApiResponse(
ErrorResponseSerializer(), description="bad request"
),
404: OpenApiResponse(
ErrorResponseSerializer(), description="playlist not found"
),
},
)
def post(self, request, playlist_id):
"""modify custom playlist"""
data_serializer = PlaylistCustomPostSerializer(data=request.data)
data_serializer.is_valid(raise_exception=True)
validated_data = data_serializer.validated_data
self.get_document(playlist_id)
if not self.response:
error = ErrorResponseSerializer({"error": "playlist not found"})
return Response(error.data, status=404)
        if self.response["playlist_type"] != "custom":
error = ErrorResponseSerializer(
{"error": f"playlist with ID {playlist_id} is not custom"}
)
return Response(error.data, status=400)
action = validated_data.get("action")
video_id = validated_data.get("video_id")
playlist = YoutubePlaylist(playlist_id)
if action == "create":
try:
playlist.add_video_to_playlist(video_id)
except TypeError:
error = ErrorResponseSerializer(
{"error": f"failed to add video {video_id} to playlist"}
)
return Response(error.data, status=400)
else:
hide = UserConfig(request.user.id).get_value("hide_watched")
playlist.move_video(video_id, action, hide_watched=hide)
response_serializer = PlaylistSerializer(playlist.json_data)
return Response(response_serializer.data)
class PlaylistApiView(ApiBaseView):
"""resolves to /api/playlist/<playlist_id>/
GET: returns metadata dict of playlist
"""
search_base = "ta_playlist/_doc/"
permission_classes = [AdminWriteOnly]
valid_custom_actions = ["create", "remove", "up", "down", "top", "bottom"]
@extend_schema(
responses={
200: OpenApiResponse(PlaylistSerializer()),
404: OpenApiResponse(
ErrorResponseSerializer(), description="playlist not found"
),
},
)
def get(self, request, playlist_id):
# pylint: disable=unused-argument
"""get playlist"""
self.get_document(playlist_id)
if not self.response:
error = ErrorResponseSerializer({"error": "playlist not found"})
return Response(error.data, status=404)
response_serializer = PlaylistSerializer(self.response)
return Response(response_serializer.data)
@extend_schema(
request=PlaylistSingleUpdate(),
responses={
200: OpenApiResponse(PlaylistSerializer()),
404: OpenApiResponse(
ErrorResponseSerializer(), description="playlist not found"
),
},
)
def post(self, request, playlist_id):
"""update subscribed state of playlist"""
data_serializer = PlaylistSingleUpdate(data=request.data)
data_serializer.is_valid(raise_exception=True)
validated_data = data_serializer.validated_data
self.get_document(playlist_id)
if not self.response:
error = ErrorResponseSerializer({"error": "playlist not found"})
return Response(error.data, status=404)
subscribed = validated_data["playlist_subscribed"]
playlist_sub = PlaylistSubscription()
json_data = playlist_sub.change_subscribe(playlist_id, subscribed)
response_serializer = PlaylistSerializer(json_data)
return Response(response_serializer.data)
@extend_schema(
parameters=[PlaylistDeleteQuerySerializer],
responses={
204: OpenApiResponse(description="playlist deleted"),
},
)
def delete(self, request, playlist_id):
"""delete playlist"""
print(f"{playlist_id}: delete playlist")
query_serializer = PlaylistDeleteQuerySerializer(
data=request.query_params
)
query_serializer.is_valid(raise_exception=True)
validated_query = query_serializer.validated_data
delete_videos = validated_query.get("delete_videos", False)
if delete_videos:
YoutubePlaylist(playlist_id).delete_videos_playlist()
else:
YoutubePlaylist(playlist_id).delete_metadata()
return Response(status=204)
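
A client sketch for the playlist endpoints above, not part of the diff; the base URL, the token and the /api/playlist/ prefix are assumptions, the IDs are placeholders.

import requests

API = "http://localhost:8000/api"  # placeholder base URL
HEADERS = {"Authorization": "Token <api-token>"}  # assumes DRF token auth

# create a custom playlist, add a video, then move it to the top
created = requests.post(
    f"{API}/playlist/custom/", json={"playlist_name": "Favorites"}, headers=HEADERS
).json()
playlist_id = created["playlist_id"]
for action in ({"action": "create", "video_id": "dQw4w9WgXcQ"},
               {"action": "top", "video_id": "dQw4w9WgXcQ"}):
    requests.post(
        f"{API}/playlist/custom/{playlist_id}/", json=action, headers=HEADERS
    )

# unsubscribe from a regular playlist, then delete it together with its videos
requests.post(
    f"{API}/playlist/PL_placeholder/",
    json={"playlist_subscribed": False},
    headers=HEADERS,
)
requests.delete(
    f"{API}/playlist/PL_placeholder/",
    params={"delete_videos": "true"},
    headers=HEADERS,
)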

View File

@ -1,10 +0,0 @@
-r requirements.txt
ipython==9.3.0
pre-commit==4.2.0
pylint-django==2.6.1
pylint==3.3.7
pytest-django==4.11.1
pytest==8.4.1
python-dotenv==1.1.1
requirementscheck==0.0.6
types-requests==2.32.4.20250611

View File

@ -1,15 +0,0 @@
apprise==1.9.3
celery==5.5.3
django-auth-ldap==5.2.0
django-celery-beat==2.8.1
django-cors-headers==4.7.0
Django==5.2.3
djangorestframework==3.16.0
drf-spectacular==0.28.0
Pillow==11.2.1
redis==6.2.0
requests==2.32.4
ryd-client==0.0.6
uvicorn==0.35.0
whitenoise==6.9.0
yt-dlp[default]==2025.6.30

View File

@ -1,110 +0,0 @@
"""serializers for stats"""
# pylint: disable=abstract-method
from rest_framework import serializers
class VideoStatsItemSerializer(serializers.Serializer):
"""serialize video stats item"""
doc_count = serializers.IntegerField()
media_size = serializers.IntegerField()
duration = serializers.IntegerField()
duration_str = serializers.CharField()
class VideoStatsSerializer(serializers.Serializer):
"""serialize video stats"""
doc_count = serializers.IntegerField()
media_size = serializers.IntegerField()
duration = serializers.IntegerField()
duration_str = serializers.CharField()
type_videos = VideoStatsItemSerializer(allow_null=True)
type_shorts = VideoStatsItemSerializer(allow_null=True)
type_streams = VideoStatsItemSerializer(allow_null=True)
active_true = VideoStatsItemSerializer(allow_null=True)
active_false = VideoStatsItemSerializer(allow_null=True)
class ChannelStatsSerializer(serializers.Serializer):
"""serialize channel stats"""
doc_count = serializers.IntegerField(allow_null=True)
active_true = serializers.IntegerField(allow_null=True)
active_false = serializers.IntegerField(allow_null=True)
subscribed_true = serializers.IntegerField(allow_null=True)
subscribed_false = serializers.IntegerField(allow_null=True)
class PlaylistStatsSerializer(serializers.Serializer):
"""serialize playlists stats"""
doc_count = serializers.IntegerField(allow_null=True)
active_true = serializers.IntegerField(allow_null=True)
active_false = serializers.IntegerField(allow_null=True)
subscribed_false = serializers.IntegerField(allow_null=True)
subscribed_true = serializers.IntegerField(allow_null=True)
class DownloadStatsSerializer(serializers.Serializer):
"""serialize download stats"""
pending = serializers.IntegerField(allow_null=True)
ignore = serializers.IntegerField(allow_null=True)
pending_videos = serializers.IntegerField(allow_null=True)
pending_shorts = serializers.IntegerField(allow_null=True)
pending_streams = serializers.IntegerField(allow_null=True)
class WatchTotalStatsSerializer(serializers.Serializer):
"""serialize total watch stats"""
duration = serializers.IntegerField()
duration_str = serializers.CharField()
items = serializers.IntegerField()
class WatchItemStatsSerializer(serializers.Serializer):
"""serialize watch item stats"""
duration = serializers.IntegerField()
duration_str = serializers.CharField()
progress = serializers.FloatField()
items = serializers.IntegerField()
class WatchStatsSerializer(serializers.Serializer):
"""serialize watch stats"""
total = WatchTotalStatsSerializer(allow_null=True)
unwatched = WatchItemStatsSerializer(allow_null=True)
watched = WatchItemStatsSerializer(allow_null=True)
class DownloadHistItemSerializer(serializers.Serializer):
"""serialize download hist item"""
date = serializers.CharField()
count = serializers.IntegerField()
media_size = serializers.IntegerField()
class BiggestChannelQuerySerializer(serializers.Serializer):
"""serialize biggest channel query"""
order = serializers.ChoiceField(
choices=["doc_count", "duration", "media_size"], default="doc_count"
)
class BiggestChannelItemSerializer(serializers.Serializer):
"""serialize biggest channel item"""
id = serializers.CharField()
name = serializers.CharField()
doc_count = serializers.IntegerField()
duration = serializers.IntegerField()
duration_str = serializers.CharField()
media_size = serializers.IntegerField()

View File

@ -1,369 +0,0 @@
"""aggregations"""
from common.src.env_settings import EnvironmentSettings
from common.src.es_connect import ElasticWrap
from common.src.helper import get_duration_str
class AggBase:
"""base class for aggregation calls"""
path: str = ""
data: dict = {}
name: str = ""
def get(self):
"""make get call"""
response, _ = ElasticWrap(self.path).get(self.data)
print(f"[agg][{self.name}] took {response.get('took')} ms to process")
return response.get("aggregations")
def process(self):
"""implement in subclassess"""
raise NotImplementedError
class Video(AggBase):
"""get video stats"""
name = "video_stats"
path = "ta_video/_search"
data = {
"size": 0,
"aggs": {
"video_type": {
"terms": {"field": "vid_type"},
"aggs": {
"media_size": {"sum": {"field": "media_size"}},
"duration": {"sum": {"field": "player.duration"}},
},
},
"video_active": {
"terms": {"field": "active"},
"aggs": {
"media_size": {"sum": {"field": "media_size"}},
"duration": {"sum": {"field": "player.duration"}},
},
},
"video_media_size": {"sum": {"field": "media_size"}},
"video_count": {"value_count": {"field": "youtube_id"}},
"duration": {"sum": {"field": "player.duration"}},
},
}
def process(self):
"""process aggregation"""
aggregations = self.get()
if not aggregations:
return None
duration = int(aggregations["duration"]["value"])
response = {
"doc_count": aggregations["video_count"]["value"],
"media_size": int(aggregations["video_media_size"]["value"]),
"duration": duration,
"duration_str": get_duration_str(duration),
}
for bucket in aggregations["video_type"]["buckets"]:
duration = int(bucket["duration"].get("value"))
response.update(
{
f"type_{bucket['key']}": {
"doc_count": bucket.get("doc_count"),
"media_size": int(bucket["media_size"].get("value")),
"duration": duration,
"duration_str": get_duration_str(duration),
}
}
)
for bucket in aggregations["video_active"]["buckets"]:
duration = int(bucket["duration"].get("value"))
response.update(
{
f"active_{bucket['key_as_string']}": {
"doc_count": bucket.get("doc_count"),
"media_size": int(bucket["media_size"].get("value")),
"duration": duration,
"duration_str": get_duration_str(duration),
}
}
)
return response
class Channel(AggBase):
"""get channel stats"""
name = "channel_stats"
path = "ta_channel/_search"
data = {
"size": 0,
"aggs": {
"channel_count": {"value_count": {"field": "channel_id"}},
"channel_active": {"terms": {"field": "channel_active"}},
"channel_subscribed": {"terms": {"field": "channel_subscribed"}},
},
}
def process(self):
"""process aggregation"""
aggregations = self.get()
if not aggregations:
return None
response = {
"doc_count": aggregations["channel_count"].get("value"),
}
for bucket in aggregations["channel_active"]["buckets"]:
key = f"active_{bucket['key_as_string']}"
response.update({key: bucket.get("doc_count")})
for bucket in aggregations["channel_subscribed"]["buckets"]:
key = f"subscribed_{bucket['key_as_string']}"
response.update({key: bucket.get("doc_count")})
return response
class Playlist(AggBase):
"""get playlist stats"""
name = "playlist_stats"
path = "ta_playlist/_search"
data = {
"size": 0,
"aggs": {
"playlist_count": {"value_count": {"field": "playlist_id"}},
"playlist_active": {"terms": {"field": "playlist_active"}},
"playlist_subscribed": {"terms": {"field": "playlist_subscribed"}},
},
}
def process(self):
"""process aggregation"""
aggregations = self.get()
if not aggregations:
return None
response = {"doc_count": aggregations["playlist_count"].get("value")}
for bucket in aggregations["playlist_active"]["buckets"]:
key = f"active_{bucket['key_as_string']}"
response.update({key: bucket.get("doc_count")})
for bucket in aggregations["playlist_subscribed"]["buckets"]:
key = f"subscribed_{bucket['key_as_string']}"
response.update({key: bucket.get("doc_count")})
return response
class Download(AggBase):
"""get downloads queue stats"""
name = "download_queue_stats"
path = "ta_download/_search"
data = {
"size": 0,
"aggs": {
"status": {"terms": {"field": "status"}},
"video_type": {
"filter": {"term": {"status": "pending"}},
"aggs": {"type_pending": {"terms": {"field": "vid_type"}}},
},
},
}
def process(self):
"""process aggregation"""
aggregations = self.get()
response = {}
if not aggregations:
return None
for bucket in aggregations["status"]["buckets"]:
response.update({bucket["key"]: bucket.get("doc_count")})
for bucket in aggregations["video_type"]["type_pending"]["buckets"]:
key = f"pending_{bucket['key']}"
response.update({key: bucket.get("doc_count")})
return response
class WatchProgress(AggBase):
"""get watch progress"""
name = "watch_progress"
path = "ta_video/_search"
data = {
"size": 0,
"aggs": {
name: {
"terms": {"field": "player.watched"},
"aggs": {
"watch_docs": {
"filter": {"terms": {"player.watched": [True, False]}},
"aggs": {
"true_count": {"value_count": {"field": "_index"}},
"duration": {"sum": {"field": "player.duration"}},
},
},
},
},
"total_duration": {"sum": {"field": "player.duration"}},
"total_vids": {"value_count": {"field": "_index"}},
},
}
def process(self):
"""make the call"""
aggregations = self.get()
response = {}
if not aggregations:
return None
buckets = aggregations[self.name]["buckets"]
all_duration = int(aggregations["total_duration"].get("value"))
response.update(
{
"total": {
"duration": all_duration,
"duration_str": get_duration_str(all_duration),
"items": aggregations["total_vids"].get("value"),
}
}
)
for bucket in buckets:
response.update(self._build_bucket(bucket, all_duration))
return response
@staticmethod
def _build_bucket(bucket, all_duration):
"""parse bucket"""
duration = int(bucket["watch_docs"]["duration"]["value"])
duration_str = get_duration_str(duration)
items = bucket["watch_docs"]["true_count"]["value"]
if bucket["key_as_string"] == "false":
key = "unwatched"
else:
key = "watched"
bucket_parsed = {
key: {
"duration": duration,
"duration_str": duration_str,
"progress": duration / all_duration if all_duration else 0,
"items": items,
}
}
return bucket_parsed
class DownloadHist(AggBase):
"""get downloads histogram last week"""
name = "videos_last_week"
path = "ta_video/_search"
data = {
"size": 0,
"aggs": {
name: {
"date_histogram": {
"field": "date_downloaded",
"calendar_interval": "day",
"format": "yyyy-MM-dd",
"order": {"_key": "desc"},
"time_zone": EnvironmentSettings.TZ,
},
"aggs": {
"total_videos": {"value_count": {"field": "youtube_id"}},
"media_size": {"sum": {"field": "media_size"}},
},
}
},
"query": {
"range": {
"date_downloaded": {
"gte": "now-7d/d",
"time_zone": EnvironmentSettings.TZ,
}
}
},
}
def process(self):
"""process query"""
aggregations = self.get()
if not aggregations:
return None
buckets = aggregations[self.name]["buckets"]
response = [
{
"date": i.get("key_as_string"),
"count": i.get("doc_count"),
"media_size": i["media_size"].get("value"),
}
for i in buckets
]
return response
class BiggestChannel(AggBase):
"""get channel aggregations"""
def __init__(self, order):
self.data["aggs"][self.name]["multi_terms"]["order"] = {order: "desc"}
name = "channel_stats"
path = "ta_video/_search"
data = {
"size": 0,
"aggs": {
name: {
"multi_terms": {
"terms": [
{"field": "channel.channel_name.keyword"},
{"field": "channel.channel_id"},
],
"order": {"doc_count": "desc"},
},
"aggs": {
"doc_count": {"value_count": {"field": "_index"}},
"duration": {"sum": {"field": "player.duration"}},
"media_size": {"sum": {"field": "media_size"}},
},
},
},
}
order_choices = ["doc_count", "duration", "media_size"]
def process(self):
"""process aggregation, order_by validated in the view"""
aggregations = self.get()
if not aggregations:
return None
buckets = aggregations[self.name]["buckets"]
response = [
{
"id": i["key"][1],
"name": i["key"][0].title(),
"doc_count": i["doc_count"]["value"],
"duration": i["duration"]["value"],
"duration_str": get_duration_str(int(i["duration"]["value"])),
"media_size": i["media_size"]["value"],
}
for i in buckets
]
return response
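
A driving sketch for the aggregation classes above, not part of the diff; it assumes the Elasticsearch indices behind ElasticWrap are reachable.

from stats.src.aggs import (
    BiggestChannel,
    Channel,
    Download,
    DownloadHist,
    Playlist,
    Video,
    WatchProgress,
)

# each process() call returns None if the response carries no aggregations
overview = {
    "videos": Video().process(),
    "channels": Channel().process(),
    "playlists": Playlist().process(),
    "downloads": Download().process(),
    "watch": WatchProgress().process(),
    "last_week": DownloadHist().process(),
    "top_channels": BiggestChannel("duration").process(),
}
for key, value in overview.items():
    print(key, value)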

View File

@ -1,42 +0,0 @@
"""all stats API urls"""
from django.urls import path
from stats import views
urlpatterns = [
path(
"video/",
views.StatVideoView.as_view(),
name="api-stats-video",
),
path(
"channel/",
views.StatChannelView.as_view(),
name="api-stats-channel",
),
path(
"playlist/",
views.StatPlaylistView.as_view(),
name="api-stats-playlist",
),
path(
"download/",
views.StatDownloadView.as_view(),
name="api-stats-download",
),
path(
"watch/",
views.StatWatchProgress.as_view(),
name="api-stats-watch",
),
path(
"downloadhist/",
views.StatDownloadHist.as_view(),
name="api-stats-downloadhist",
),
path(
"biggestchannels/",
views.StatBiggestChannel.as_view(),
name="api-stats-biggestchannels",
),
]

View File

@ -1,139 +0,0 @@
"""all stats API views"""
from common.serializers import ErrorResponseSerializer
from common.views_base import ApiBaseView
from drf_spectacular.utils import OpenApiResponse, extend_schema
from rest_framework.response import Response
from stats.serializers import (
BiggestChannelItemSerializer,
BiggestChannelQuerySerializer,
ChannelStatsSerializer,
DownloadHistItemSerializer,
DownloadStatsSerializer,
PlaylistStatsSerializer,
VideoStatsSerializer,
WatchStatsSerializer,
)
from stats.src.aggs import (
BiggestChannel,
Channel,
Download,
DownloadHist,
Playlist,
Video,
WatchProgress,
)
class StatVideoView(ApiBaseView):
"""resolves to /api/stats/video/
GET: return video stats
"""
@extend_schema(responses=VideoStatsSerializer())
def get(self, request):
"""get video stats"""
# pylint: disable=unused-argument
serializer = VideoStatsSerializer(Video().process())
return Response(serializer.data)
class StatChannelView(ApiBaseView):
"""resolves to /api/stats/channel/
GET: return channel stats
"""
@extend_schema(responses=ChannelStatsSerializer())
def get(self, request):
"""get channel stats"""
# pylint: disable=unused-argument
serializer = ChannelStatsSerializer(Channel().process())
return Response(serializer.data)
class StatPlaylistView(ApiBaseView):
"""resolves to /api/stats/playlist/
GET: return playlist stats
"""
@extend_schema(responses=PlaylistStatsSerializer())
def get(self, request):
"""get playlist stats"""
# pylint: disable=unused-argument
serializer = PlaylistStatsSerializer(Playlist().process())
return Response(serializer.data)
class StatDownloadView(ApiBaseView):
"""resolves to /api/stats/download/
GET: return download stats
"""
@extend_schema(responses=DownloadStatsSerializer())
def get(self, request):
"""get download stats"""
# pylint: disable=unused-argument
serializer = DownloadStatsSerializer(Download().process())
return Response(serializer.data)
class StatWatchProgress(ApiBaseView):
"""resolves to /api/stats/watch/
GET: return watch/unwatch progress stats
"""
@extend_schema(responses=WatchStatsSerializer())
def get(self, request):
"""get watched stats"""
# pylint: disable=unused-argument
serializer = WatchStatsSerializer(WatchProgress().process())
return Response(serializer.data)
class StatDownloadHist(ApiBaseView):
"""resolves to /api/stats/downloadhist/
GET: return download video count histogram for last days
"""
@extend_schema(responses=DownloadHistItemSerializer(many=True))
def get(self, request):
"""get download hist items"""
# pylint: disable=unused-argument
download_items = DownloadHist().process()
serializer = DownloadHistItemSerializer(download_items, many=True)
return Response(serializer.data)
class StatBiggestChannel(ApiBaseView):
"""resolves to /api/stats/biggestchannels/
GET: return biggest channels
param: order
"""
@extend_schema(
responses={
200: OpenApiResponse(BiggestChannelItemSerializer(many=True)),
400: OpenApiResponse(
ErrorResponseSerializer(), description="Bad request"
),
},
)
def get(self, request):
"""get biggest channels stats"""
query_serializer = BiggestChannelQuerySerializer(
data=request.query_params
)
query_serializer.is_valid(raise_exception=True)
validated_query = query_serializer.validated_data
order = validated_query["order"]
channel_items = BiggestChannel(order).process()
serializer = BiggestChannelItemSerializer(channel_items, many=True)
return Response(serializer.data)
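
A client sketch for the stats endpoints above, not part of the diff; the base URL and token are placeholders and token auth is an assumption.

import requests

API = "http://localhost:8000/api"  # placeholder base URL
HEADERS = {"Authorization": "Token <api-token>"}  # assumes DRF token auth

# channels ranked by total media size; order accepts doc_count, duration, media_size
biggest = requests.get(
    f"{API}/stats/biggestchannels/",
    params={"order": "media_size"},
    headers=HEADERS,
).json()
for channel in biggest:
    print(channel["name"], channel["media_size"])

# downloads per day for the last week
print(requests.get(f"{API}/stats/downloadhist/", headers=HEADERS).json())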

Some files were not shown because too many files have changed in this diff.