Oxys: How I Built a 9-App YouTube Toolkit with 8 AI Models
Oxys is a 9-app YouTube creator toolkit built for AIR Media-Tech, a company managing 3,800+ YouTube creators. I built it as the sole developer over the course of a year. It is 25,000+ lines of Python, supports 37 languages, uses 8 AI models, and runs in production on Streamlit Cloud.
This is the story of what I built, why I made the decisions I did, and what I learned.
What Oxys does
AIR manages thousands of YouTube channels across dozens of countries. Their team needed tools for metadata optimization, content moderation, channel auditing, and idea generation. These were manual processes that took hours per channel.
Oxys automates all of it. Nine apps, each solving a different problem:
Metadata Lab generates optimized titles, descriptions, and tags for YouTube videos. It analyzes the video content, researches trending keywords, and outputs metadata in any of 37 languages with proper orthographic rules. Czech diacritics, Spanish inverted punctuation, Arabic right-to-left text. Each handled without a separate translation step.
Idea Generator analyzes a channel's content DNA (what I call the "logic core") from its last 100 videos, then generates new video ideas that match the channel's style. It also compares against competitors to find content gaps.
Live Streaming optimizes live stream metadata using competitor research and trending keyword analysis. It has an auto mode that runs the full 4-step optimization pipeline with one click.
Video Moderation checks videos for policy violations at scale. It can scan an entire channel (5-50 videos), detect severity levels, and run authenticity audits to catch mass-produced or AI-generated content.
AI Audit is a 5-step channel audit wizard. It scrapes a channel, classifies its vertical using AIR's proprietary methodology, discovers competitors through a 3-stage pipeline, and packages everything into a downloadable data package. (This later became its own standalone product, which I wrote about separately.)
Thumbnail Generation creates and edits thumbnails using AI image generation. The editor includes automatic compliance checking against YouTube's 176-line content policy.
Speech Generation converts text to audio using Gemini's TTS model.
Whisper transcribes audio files to text using OpenAI's Whisper model running locally.
Architecture decisions that mattered
Streamlit was the right choice (and also the wrong one)
Streamlit let me ship a functional web app in days instead of weeks. No frontend framework, no API layer, no build step. Write Python, get a UI. For an internal tool serving a team of 20-30 people, that trade-off made sense.
The wrong part came later. At 25,000 lines, Streamlit's session state model becomes painful. Every user interaction re-runs the script from top to bottom, so state management requires constant attention. The lesson: store the results of expensive computations in session state, not in local variables.
If I were starting today, I would use the same architecture for the first 5,000 lines and migrate the most complex apps (Metadata Lab, AI Audit) to a proper framework once they stabilized.
Multi-model AI architecture
Oxys uses 8+ AI models, each chosen for what it does best. The full list, with the task each one handles:
- OpenAI GPT-4.1 for text generation (metadata, ideas, moderation analysis)
- Gemini 2.5 Flash for video analysis and vertical detection
- Gemini 3 Flash Preview for internet search (competitor discovery, trending keywords)
- Gemini 2.5 Flash Image Preview for thumbnail editing
- OpenAI gpt-image-1 for thumbnail generation
- Gemini TTS for speech generation
- Whisper (local) for transcription
No single model handles everything well. Gemini performs better at internet search. OpenAI produces more reliable structured text output. Routing tasks to the right model improved results over forcing one provider to do everything.
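Conceptually, the routing is just a task-to-model table with a dispatch function. A sketch (the model ID strings approximate the list above; the `route` function is hypothetical, not the Oxys API):

```python
# Task-to-(provider, model) routing table, mirroring the list above.
# Model ID strings are approximations for illustration.
MODEL_ROUTES = {
    "text_generation":      ("openai", "gpt-4.1"),
    "video_analysis":       ("gemini", "gemini-2.5-flash"),
    "internet_search":      ("gemini", "gemini-3-flash-preview"),
    "thumbnail_editing":    ("gemini", "gemini-2.5-flash-image-preview"),
    "thumbnail_generation": ("openai", "gpt-image-1"),
    "speech":               ("gemini", "gemini-tts"),
    "transcription":        ("local",  "whisper"),
}

def route(task: str) -> tuple[str, str]:
    """Return (provider, model) for a task; unknown tasks fail loudly."""
    try:
        return MODEL_ROUTES[task]
    except KeyError:
        raise ValueError(f"No model routed for task: {task}")
```

Keeping the table in one place means swapping a model for one task is a one-line change.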
I built a unified wrapper (AIResource.py) that handles both providers with defensive response validation. When Gemini refuses a request (it applies more aggressive safety filters), the system falls back to OpenAI automatically.
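The fallback idea reduces to a try/except around the primary provider. A minimal sketch, with the provider calls injected as plain functions so it stays self-contained (the names are hypothetical; the real AIResource.py also validates response structure):

```python
class SafetyRefusal(Exception):
    """Raised when a provider declines a request on safety grounds."""

def generate(prompt: str, call_gemini, call_openai) -> str:
    """Try Gemini first, fall back to OpenAI on a safety refusal.

    call_gemini / call_openai are injected callables; in Oxys they
    would wrap the real SDK clients.
    """
    try:
        return call_gemini(prompt)
    except SafetyRefusal:
        # Gemini applies more aggressive safety filters; OpenAI often
        # accepts the same request, so retry there automatically.
        return call_openai(prompt)
```

The caller never sees the refusal; it just gets a result from whichever provider accepted the request.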
37 languages without a translation layer
Oxys does not translate. It generates natively in the target language.
Every AI prompt includes language-specific orthographic rules injected directly. Czech requires háčky and čárky. Spanish requires inverted punctuation marks at the beginning of questions and exclamations. Arabic text flows right-to-left and has specific numeral handling.
I built a shared language_utils.py module with rules for 37 languages, 23 of which have explicit orthographic instructions. Injecting rules into prompts works better than generating in English and translating afterward, because translation loses SEO keyword targeting and cultural nuance.
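The injection itself is plain string composition. A sketch with three illustrative rule entries (the rule text and structure are simplified stand-ins for what language_utils.py contains):

```python
# Simplified: maps language code -> orthographic rules injected into prompts.
# Rule wording here is illustrative, not copied from language_utils.py.
LANGUAGE_RULES = {
    "cs": "Use Czech diacritics correctly (háčky and čárky: ě, š, č, ř, ž, á, í).",
    "es": "Open questions with ¿ and exclamations with ¡; close with ? and !.",
    "ar": "Write right-to-left Arabic script; handle numerals per local convention.",
}

def build_prompt(task: str, language: str) -> str:
    """Compose a task prompt with the target language's orthographic rules.

    Generating natively with rules beats generate-then-translate: translation
    loses SEO keyword targeting and cultural nuance.
    """
    prompt = f"{task}\n\nGenerate natively in language code '{language}'."
    rules = LANGUAGE_RULES.get(language, "")
    if rules:
        prompt += f"\nOrthographic rules: {rules}"
    return prompt
```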
The 3-stage competitor pipeline
Finding competitors for a YouTube channel sounds simple. Ask AI who the competitors are and you get a list. The problem: AI hallucinates YouTube channels that do not exist.
My solution is a 3-stage pipeline:
- AI internet search via Gemini finds candidate channels
- YouTube API verification checks if those channels actually exist and have real videos
- AI verification with real YouTube data confirms the channels are genuine competitors, not just similarly named
Running candidates through the YouTube API before passing them back to the model cut hallucinated channels to zero. The general principle applies broadly: verify AI-generated references against authoritative data sources before using them.
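The three stages can be sketched as a single pipeline function. All three hooks are hypothetical stand-ins for the Gemini search, the YouTube Data API lookup, and the final AI check:

```python
def find_competitors(channel: str, ai_search, yt_lookup, ai_verify) -> list:
    """3-stage competitor discovery: propose, verify existence, verify relevance.

    ai_search(channel)         -> list of candidate channel names (may hallucinate)
    yt_lookup(name)            -> channel record dict, or None if it doesn't exist
    ai_verify(channel, record) -> True if the record is a genuine competitor
    """
    # Stage 1: AI internet search proposes candidates.
    candidates = ai_search(channel)
    # Stage 2: YouTube API verification drops hallucinated channels.
    real = [rec for rec in (yt_lookup(name) for name in candidates) if rec]
    # Stage 3: AI verification on real data filters out similarly named channels.
    return [rec for rec in real if ai_verify(channel, rec)]
```

The key property: nothing reaches stage 3 unless the API confirmed it exists, so the final model reasons over real data only.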
What I learned
Ship early, improve continuously. The first version of Oxys had 3 apps. I shipped it, got feedback from the team, and added apps based on what they actually needed. The Metadata Lab went through 7 quality improvement phases after launch.
Pin everything in Streamlit Cloud. At scale, the re-run model fights you. Pin all package versions (Streamlit Cloud has no dependency lock), keep expensive computations in st.session_state, and accept that some things will look awkward.
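Pinning means exact versions in requirements.txt, not ranges. A short illustration (the version numbers below are made up for the example, not the actual Oxys lockfile):

```text
# Pin exact versions: Streamlit Cloud resolves dependencies fresh on each
# deploy, so >= ranges can silently pull breaking upgrades.
streamlit==1.37.0
openai==1.40.0
google-generativeai==0.7.2
```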
Routing between models saves money and improves quality. The cost difference between providers is significant. Using Gemini for search (cheap, good at grounding) and OpenAI for generation (reliable, structured output) brought both costs down and results up.
Validate every AI response defensively. Gemini in particular can return truncated JSON, None values, or safety refusals. Every response gets validated before processing. This pattern has saved me debugging time across multiple projects.
The development system that made shipping this fast is the same one I wrote about in my advanced Claude Code setup. If you want someone to set up an AI workflow like this for your business, that is the service I offer.
The numbers
| Metric | Value |
|---|---|
| Lines of code | 25,000+ |
| Python files | 60+ |
| Apps | 9 |
| AI models used | 8+ |
| Languages supported | 37 |
| Dependencies | 140 packages |
| Development time | ~12 months |
| Users | AIR's content team (20-30 people) |
Oxys runs in production on Streamlit Cloud, auto-deploying on push to master. It serves AIR's team daily for metadata optimization, content moderation, and channel auditing across their portfolio of 3,800+ creators.