babkin.dev

How I Built a 9-App YouTube Toolkit Solo: 25,000 Lines, 37 Languages, 8 AI Models

Mykhailo Babkin

Oxys is a 9-app YouTube creator toolkit built for AIR Media-Tech, a company managing 3,800+ YouTube creators. I built it as the sole developer over the course of a year. It is 25,000+ lines of Python, supports 37 languages, uses 8 AI models, and runs in production on Streamlit Cloud.

This is the story of what I built, why I made the decisions I did, and what I learned.

What Oxys does

AIR manages thousands of YouTube channels across dozens of countries. Their team needed tools for metadata optimization, content moderation, channel auditing, and idea generation. These were manual processes that took hours per channel.

Oxys automates all of it. Nine apps, each solving a different problem:

Metadata Lab generates optimized titles, descriptions, and tags for YouTube videos. It analyzes the video content, researches trending keywords, and outputs metadata in any of 37 languages with proper orthographic rules. Czech diacritics, Spanish inverted punctuation, Arabic right-to-left text. Each handled without a separate translation step.

Idea Generator analyzes a channel's content DNA (what I call the "logic core") from its last 100 videos, then generates new video ideas that match the channel's style. It also compares against competitors to find content gaps.

Live Streaming optimizes live stream metadata using competitor research and trending keyword analysis. It has an auto mode that runs the full 4-step optimization pipeline with one click.

Video Moderation checks videos for policy violations at scale. It can scan an entire channel (5-50 videos), detect severity levels, and run authenticity audits to catch mass-produced or AI-generated content.

AI Audit is a 5-step channel audit wizard. It scrapes a channel, classifies its vertical using AIR's proprietary methodology, discovers competitors through a 3-stage pipeline, and packages everything into a downloadable data package. (This later became its own standalone product, which I wrote about separately.)

Thumbnail Generation creates and edits thumbnails using AI image generation. The editor includes automatic compliance checking against YouTube's 176-line content policy.

Speech Generation converts text to audio using Gemini's TTS model.

Whisper transcribes audio files to text using OpenAI's Whisper model running locally.

Architecture decisions that mattered

Streamlit was the right choice (and also the wrong one)

Streamlit let me ship a functional web app in days instead of weeks. No frontend framework, no API layer, no build step. Write Python, get a UI. For an internal tool serving a team of 20-30 people, that trade-off made sense.

The wrong part came later. At 25,000 lines, Streamlit's session state model becomes painful. Every user interaction re-runs the script from top to bottom. State management requires constant attention. The lesson: store expensive computations in session state, not local variables.
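The pattern is simple but easy to get wrong. Here is a minimal sketch of it; the dict stands in for `st.session_state` so the re-run behavior can be simulated outside Streamlit, and the function names are illustrative, not from the Oxys codebase:

```python
# Streamlit re-runs the whole script on every interaction, so expensive
# results must live in st.session_state, not in local variables.
# This sketch models session_state with a plain dict to show the pattern.
session_state: dict = {}
calls = {"count": 0}

def get_channel_analysis(channel_id: str) -> dict:
    """Stand-in for an expensive AI pipeline call."""
    calls["count"] += 1  # track how many times the expensive call actually runs
    return {"channel_id": channel_id, "vertical": "gaming"}

def render(channel_id: str) -> dict:
    key = f"analysis:{channel_id}"
    if key not in session_state:          # in Streamlit: st.session_state
        session_state[key] = get_channel_analysis(channel_id)
    return session_state[key]

# Simulate three script re-runs: the expensive call fires only once.
for _ in range(3):
    result = render("UC_example")
```

In real Streamlit code the only change is swapping the dict for `st.session_state`; the guard-before-compute shape stays the same.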

If I were starting today, I would use the same architecture for the first 5,000 lines and migrate the most complex apps (Metadata Lab, AI Audit) to a proper framework once they stabilized.

Multi-model AI architecture

Oxys uses 8+ AI models, each chosen for what it does best:

  • OpenAI GPT-4.1 for text generation (metadata, ideas, moderation analysis)
  • Gemini 2.5 Flash for video analysis and vertical detection
  • Gemini 3 Flash Preview for internet search (competitor discovery, trending keywords)
  • Gemini 2.5 Flash Image Preview for thumbnail editing
  • OpenAI gpt-image-1 for thumbnail generation
  • Gemini TTS for speech generation
  • Whisper (local) for transcription

No single model handles everything well. Gemini performs better at internet search. OpenAI produces more reliable structured text output. Routing tasks to the right model improved results over forcing one provider to do everything.

I built a unified wrapper (AIResource.py) that handles both providers with defensive response validation. When Gemini refuses a request (it applies more aggressive safety filters), the system falls back to OpenAI automatically.
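The fallback logic can be sketched like this. The provider calls are stubs (the real wrapper talks to the Gemini and OpenAI SDKs), and the names below are illustrative rather than the actual AIResource.py API:

```python
class SafetyRefusal(Exception):
    """Raised when a provider declines a request (e.g. Gemini safety filters)."""

def call_gemini(prompt: str) -> str:
    # Stand-in for the real Gemini call; refuses prompts its filters flag.
    if "flagged" in prompt:
        raise SafetyRefusal("blocked by safety filter")
    return f"gemini:{prompt}"

def call_openai(prompt: str) -> str:
    # Stand-in for the real OpenAI call.
    return f"openai:{prompt}"

def generate(prompt: str) -> str:
    """Try Gemini first; fall back to OpenAI on refusal or empty output."""
    try:
        text = call_gemini(prompt)
        if not text or not text.strip():   # defensive: None/empty responses
            raise SafetyRefusal("empty response")
        return text
    except SafetyRefusal:
        return call_openai(prompt)
```

The key design choice is treating an empty or missing response the same as an explicit refusal, so callers never have to branch on provider quirks.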

37 languages without a translation layer

Oxys does not translate. It generates natively in the target language.

Every AI prompt includes language-specific orthographic rules injected directly. Czech requires háčky and čárky. Spanish requires inverted punctuation marks at the beginning of questions and exclamations. Arabic text flows right-to-left and has specific numeral handling.

I built a shared language_utils.py module with rules for 37 languages, 23 of which have explicit orthographic instructions. Injecting rules into prompts works better than generating in English and translating afterward, because translation loses SEO keyword targeting and cultural nuance.
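In shape, the injection looks something like the sketch below. The rules table here is a tiny hypothetical slice, not the real `language_utils.py` contents:

```python
# Hypothetical excerpt of a per-language rules table (the real module
# covers 37 languages, 23 with explicit orthographic instructions).
ORTHOGRAPHY_RULES = {
    "cs": "Use Czech diacritics (háčky and čárky) correctly: ě, š, č, ř, á, í.",
    "es": "Open questions and exclamations with inverted marks: ¿...? and ¡...!",
    "ar": "Write right-to-left Arabic script with appropriate numeral handling.",
}

def build_prompt(task: str, lang_code: str) -> str:
    """Inject language-specific orthographic rules directly into the prompt."""
    rules = ORTHOGRAPHY_RULES.get(lang_code, "")
    return (
        f"Generate the output natively in language '{lang_code}'. "
        f"{rules}\n\nTask: {task}"
    )
```

Because the rules ride along inside the generation prompt, the model writes in the target language from the start instead of translating English output after the fact.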

The 3-stage competitor pipeline

Finding competitors for a YouTube channel sounds simple. Ask AI who the competitors are and you get a list. The problem: AI hallucinates YouTube channels that do not exist.

My solution is a 3-stage pipeline:

  1. AI internet search via Gemini finds candidate channels
  2. YouTube API verification checks if those channels actually exist and have real videos
  3. AI verification with real YouTube data confirms the channels are genuine competitors, not just similarly named

Running candidates through the YouTube API before passing them back to the model cut hallucinated channels to zero. The general principle applies broadly: verify AI-generated references against authoritative data sources before using them.

What I learned

Ship early, improve continuously. The first version of Oxys had 3 apps. I shipped it, got feedback from the team, and added apps based on what they actually needed. The Metadata Lab went through 7 quality improvement phases after launch.

Pin everything in Streamlit Cloud. At scale, the re-run model fights you. Pin all package versions (Streamlit Cloud has no dependency lock), keep expensive computations in st.session_state, and accept that some things will look awkward.

Routing between models saves money and improves quality. The cost difference between providers is significant. Using Gemini for search (cheap, good at grounding) and OpenAI for generation (reliable, structured output) brought both costs down and results up.

Validate every AI response defensively. Gemini in particular can return truncated JSON, None values, or safety refusals. Every response gets validated before processing. This pattern has saved me debugging time across multiple projects.

The numbers

Lines of code: 25,000+
Python files: 60+
Apps: 9
AI models used: 8+
Languages supported: 37
Dependencies: 140 packages
Development time: ~12 months
Users: AIR's content team (20-30 people)

Oxys runs in production on Streamlit Cloud, auto-deploying on push to master. It serves AIR's team daily for metadata optimization, content moderation, and channel auditing across their portfolio of 3,800+ creators.

If you run a business and want AI built into your workflows, book a free 45-min walkthrough. I will look at how your team works and map out where AI saves real hours.