How AI Taught RunPee to Speak 41 Languages

RunPee speaks 41 languages now. Not just "Submit" and "Cancel" translated by a robot, but the whole thing. The Peetimes (the little personality-soaked cues that tell you exactly when to run to the restroom without missing anything good), the movie synopses, the jokes, the actor names spelled correctly in scripts I can't even read. All of it.

A few years ago that sentence would have been a fantasy. Localizing an app into 40+ languages was the kind of thing you needed a department for. A localization vendor, a budget with commas in it, and months of back-and-forth. For a small solo-founder app like RunPee, it was simply off the table. You picked English, maybe Spanish if you were ambitious, and you called it a day.

That math has changed completely. And I want to walk you through how, because the interesting part isn't that AI can translate — everybody knows that now. The interesting part is everything that had to go right around the translation to make it actually work.

What actually needs translating

First, you have to understand that RunPee has two completely different translation problems, and pretending they're the same is the first mistake everyone makes.

Problem one: the interface. Buttons, labels, error messages, settings screens. "Loading…" and "Are you sure you want to delete this?" This stuff is annoying but mechanical. There's a finite number of strings, they don't change much, and a button that says "Save" means the same thing whether you're in Tokyo or Toronto. We call this system Logos — every piece of English text in the app gets a key, the key gets translated once per language, and the app looks it up at runtime.
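A rough sketch of the key-lookup idea, in Python purely for illustration. The key names, the table layout, and the `lookup` function are assumptions, not the actual Logos code.

```python
# Hypothetical string tables: one dict per locale, keyed by a stable string key.
STRINGS = {
    "en-US": {"button.save": "Save", "status.loading": "Loading…"},
    "es-MX": {"button.save": "Guardar"},
}

DEFAULT_LOCALE = "en-US"


def lookup(key: str, locale: str) -> str:
    """Return the string for `key` in `locale`, falling back to English."""
    table = STRINGS.get(locale, {})
    if key in table:
        return table[key]
    # Fall back to the English source text. An unknown key raises KeyError
    # here, so a typo in a key is loud instead of silent.
    return STRINGS[DEFAULT_LOCALE][key]
```

The English fallback means a half-translated locale still shows something readable, while a key that exists nowhere fails immediately.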

More than ninety percent of the words are these standard, easy-to-translate terms. But what about "About these Peetimes..."? How does an AI translate that? More importantly, how does it translate that consistently? This is where the translation lexicon comes in. Claude Code read all of the text in the app and flagged the words and terms that are RunPee-specific, starting with the name RunPee itself. Then we had to decide how to translate each of those terms, and the answer can differ from language to language. Initially I thought one lexicon file would be enough, but doing it correctly turned out to require a separate lexicon file for each language.
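One way to picture a per-language lexicon is as a small table of required renderings plus a check that a translation actually used them. This is a sketch under assumptions: the `LEXICONS` layout and the `missing_terms` helper are hypothetical, and keeping "RunPee" untranslated is a guess. The Chinese rendering of Peetime is the one this article arrives at in Obstacle 5.

```python
# Hypothetical per-language lexicons: RunPee-specific term -> required rendering.
LEXICONS = {
    "zh-CN": {"Peetime": "尿点", "RunPee": "RunPee"},
}


def missing_terms(source: str, translation: str, locale: str) -> list[str]:
    """List lexicon terms that appear in the English source but whose
    required rendering is absent from the translation."""
    lexicon = LEXICONS[locale]
    return [
        term
        for term, rendering in lexicon.items()
        if term in source and rendering not in translation
    ]
```

A check like this turns "translate it consistently" from a hope into something a script can verify after every run.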

Problem two: the content. This is the actual soul of RunPee — the Peetimes. A Peetime isn't a string. It's a tiny piece of writing with a voice: "A good Peetime full of exposition. It’s a nice stopgap if you can’t make it to the next Peetime, which is recommended." It references a specific scene in a specific movie. It assumes the reader knows what's coming and what they can afford to skip.

Translating problem one is data entry. Translating problem two is writing. And writing in 41 languages, with the right tone, while preserving movie-specific context, is the thing that used to be difficult for human translators and impossible for an AI.

Why this couldn't have worked before

Here's the part worth pausing on. We've had machine translation for two decades. Google Translate has been free and pretty good since roughly forever. So why wasn't RunPee already in 41 languages back in 2018?

Because old machine translation translated words, not intent. Feed it a Peetime joke and you'd get something grammatically valid and tonally dead — or worse, accidentally wrong, because the machine had no idea that "the bald guy" referred to a specific character or that a particular line was sarcasm. It would translate the literal surface and quietly destroy everything underneath. For UI strings, fine. For content with a voice, useless.

The unlock with modern large language models is that they understand context. You can hand one a Peetime, plus the synopsis, plus a note that says "this scene is played for laughs, keep it light," plus a glossary of how this movie's character names should be rendered, and it will produce a translation that a native speaker reads and goes "yeah, that's natural." That's not a small improvement over Google Translate. It's a different category of thing.
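To make "hand it the context" concrete, here is a minimal sketch of assembling such a prompt. The function name, the field order, and the wording of the instructions are all assumptions for illustration, not the prompt RunPee actually uses.

```python
def build_prompt(
    peetime: str,
    synopsis: str,
    tone_note: str,
    glossary: dict[str, str],
    target_language: str,
) -> str:
    """Bundle a Peetime with the context a translator needs to keep its voice."""
    glossary_lines = "\n".join(f"- {src} => {dst}" for src, dst in glossary.items())
    return (
        f"Translate the Peetime below into {target_language}.\n"
        f"Tone note: {tone_note}\n"
        f"Movie synopsis (context only, do not translate):\n{synopsis}\n"
        f"Glossary (use these renderings exactly):\n{glossary_lines}\n"
        f"Peetime:\n{peetime}\n"
    )
```

The point of the structure is that the model never sees the Peetime alone; the synopsis, tone note, and name glossary travel with it every time.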

So the capability arrived. But capability isn't a product. Here's what it took to turn it into one.

The trick: the AI agents are the workforce

The first decision was a money decision, and it's a good one to understand because it's a little counterintuitive.

The obvious way to translate at scale is to call a translation API — send text, get text back, pay per word. At 41 languages times thousands of Peetimes, that bill adds up fast, and it's a recurring, per-use cost that scales with exactly the thing you're trying to grow.

Instead, RunPee's translations are produced by Claude agents, the same AI I'm using to build the app, spawned as a temporary workforce. When a translation run kicks off, nothing is metering API calls one word at a time. I'm dispatching a small team of AI workers, each one taking a chunk of languages and doing the translating itself. The agent's output is the translation. Same model, same quality, flat cost instead of a meter ticking.

The mental shift there is real. I stopped thinking of translation as a service I buy and started thinking of it as labor I direct. That reframing is a big part of why a one-person operation can do this at all.

Obstacle 1: How many languages can AI translate into?

Originally I thought, "Translating into one other language is structurally no different from translating into 100, so let's go big." I settled on 64 languages, some of them ridiculously niche. But niche languages are poorly represented in the training data, so Claude Code isn't great at translating into them. In the end, after researching the major movie theater markets and consulting with Claude Code about which languages it thought it could handle, we trimmed the list down to 41 plus US English.

I say US English because we decided that it would be best to support regional differences in languages. For instance, Spanish in Spain is different than Spanish in Mexico. Here are the language families we translate into.

Languages with regional variants

Language    | Variants we translate into           | Count
English     | US, GB, CA, AU                       | 4
Spanish     | MX, Spain, LatAm                     | 3
German      | DE, AT (Austria), CH (Switzerland)   | 3
Chinese     | CN, TW (Taiwan), HK (Hong Kong)      | 3
French      | FR, CA (Canada)                      | 2
Portuguese  | Brazil, Portugal                     | 2

That's 6 language families covering 17 locale variants.

The other 25 — single-locale languages

The remaining 25 ship as one locale each:

Arabic · Bengali · Bulgarian · Czech · Dutch · Farsi (Persian) · Finnish · Greek · Hebrew · Hindi · Hungarian · Indonesian · Italian · Japanese · Korean · Polish · Romanian · Russian · Swedish · Tamil · Telugu · Thai · Turkish · Ukrainian · Vietnamese

17 variants + 25 single-locale = 42 active locales.
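The tally is easy to double-check mechanically. The snippet below just restates the two lists from this section as data and counts them; the variable names are arbitrary.

```python
# Languages with regional variants, as listed in the table above.
VARIANTS = {
    "English": ["US", "GB", "CA", "AU"],
    "Spanish": ["MX", "Spain", "LatAm"],
    "German": ["DE", "AT", "CH"],
    "Chinese": ["CN", "TW", "HK"],
    "French": ["FR", "CA"],
    "Portuguese": ["Brazil", "Portugal"],
}

# Languages that ship as a single locale.
SINGLE_LOCALE = [
    "Arabic", "Bengali", "Bulgarian", "Czech", "Dutch", "Farsi (Persian)",
    "Finnish", "Greek", "Hebrew", "Hindi", "Hungarian", "Indonesian",
    "Italian", "Japanese", "Korean", "Polish", "Romanian", "Russian",
    "Swedish", "Tamil", "Telugu", "Thai", "Turkish", "Ukrainian", "Vietnamese",
]

variant_count = sum(len(codes) for codes in VARIANTS.values())
total_locales = variant_count + len(SINGLE_LOCALE)
print(len(VARIANTS), variant_count, len(SINGLE_LOCALE), total_locales)
```

Taking US English out as the source language leaves the 41 translated locales the rest of this article refers to.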

Obstacle 2: Getting the tone right (the Prompt Lab)

A translation can be 100% accurate and still wrong. If a Peetime synopsis is supposed to be breezy and the translation comes out stiff and formal, it's a bad synopsis even though every word is technically correct.

You don't fix that by hoping. You fix it by treating the prompt — the instructions you give the AI — as a thing to be engineered and measured, not guessed at.

So I built what amounted to a translation laboratory. I froze a set of real Peetimes as a test corpus, ran them through five representative languages, and then iterated on the prompt itself across a dozen versions. Each version got scored — not by me eyeballing it, but by a second AI model acting as an independent judge against a rubric: accuracy, tone, naturalness, conciseness, and a couple of others. One model translates, a different model grades. (Using two different models matters — you don't want the same AI grading its own homework.)
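The scoring side of a lab like this can be very small. The sketch below assumes the judge model has already returned one score per rubric criterion for each translated sample; the data layout, the function names, and every number in the example are made up for illustration, and only the four criteria named above are included.

```python
from statistics import mean

# The four rubric criteria named in the text (the real rubric has a few more).
RUBRIC = ("accuracy", "tone", "naturalness", "conciseness")


def version_score(judgements: list[dict[str, float]]) -> float:
    """Average every rubric score across all judged samples for one prompt version."""
    return mean(sample[criterion] for sample in judgements for criterion in RUBRIC)


def best_version(results: dict[str, list[dict[str, float]]]) -> str:
    """Return the prompt version with the highest average judge score."""
    return max(results, key=lambda version: version_score(results[version]))


# Placeholder scores, not real lab data.
example_results = {
    "v1": [{"accuracy": 8, "tone": 6, "naturalness": 6, "conciseness": 8}],
    "v2": [{"accuracy": 8, "tone": 8, "naturalness": 8, "conciseness": 8}],
}
```

With the corpus frozen, the only thing that changes between runs is the prompt, so a difference in `version_score` can be attributed to the prompt rewrite.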

The rewrites paid off in measurable jumps. One revision alone moved the quality score up six points. And the lab surfaced a genuinely surprising finding that I never would have guessed from the armchair:

Languages expand and contract differently, and you have to account for it. Spanish, French, and German tend to use more words than English to say the same thing — give those a little extra room (we landed around 115% of the English length) and quality goes up. But Japanese and Chinese are the opposite — they're denser, and forcing them to fill an English-sized space actually made them worse, padding them with filler to hit a length they never needed. The right answer wasn't one rule. It was per-language rules, learned from data.
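Per-language length rules can be expressed as a small function that produces the length instruction for the prompt. This is a sketch under assumptions: measuring length in characters, the language codes, the default rule for every other language, and the instruction wording are all hypothetical. Only the 115% figure and the "do not pad" idea come from the lab results described above.

```python
EXPANDING = {"es", "fr", "de"}  # tend to need more words than English
DENSE = {"ja", "zh"}            # tend to say the same thing in less space


def length_rule(language: str, english_length: int) -> str:
    """Return the length instruction to include in the prompt for one language."""
    if language in EXPANDING:
        # Integer math avoids floating-point rounding surprises.
        budget = english_length * 115 // 100
        return f"Stay under {budget} characters."
    if language in DENSE:
        return "Be as short as reads naturally. Do not pad to match the English length."
    # Assumed default for languages the lab gave no special rule for.
    return f"Stay close to {english_length} characters."
```

The dense-language branch deliberately sets no minimum, since forcing a length floor was what introduced the filler.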

That's the kind of thing the lab existed to catch. You can't intuit it. You have to measure it.

Obstacle 3: Names you can't just translate

Movies are full of proper nouns. Character names, actor names, place names, the occasional "FBI." And proper nouns are a trap.

You don't translate a name — you transliterate it, which means re-spelling it so it sounds right in the target script. "Becket" doesn't become a different word in Japanese; it becomes ベケット — the same sound, written in characters a Japanese reader can pronounce. In Russian it's Бекет. In Arabic, بيكيت.

Early on I had a rule that said "keep English names in English." Sensible-sounding. Completely wrong for non-Latin scripts. Because if you're reading the app in Japanese and a Peetime suddenly drops the Latin letters "Becket" into the middle of a Japanese sentence, you might not even know how to say it. The whole point of localizing is that the reader shouldn't hit a wall.

So the "keep English" rule had to get smarter: keep the original spelling for languages that use the Latin alphabet, but for everything else, transliterate — re-spell the name so it actually reads. I also had to teach the system about acronyms (an "FBI" might need a little parenthetical gloss the first time) and build a per-movie glossary so that every Peetime for a given film spells its characters consistently. Nothing breaks the illusion faster than a character whose name is spelled three different ways across three Peetimes.
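The smarter rule can be sketched as a lookup with a hard failure when a transliteration is missing. The language codes, the partial `LATIN_SCRIPT` set, and the `render_name` helper are assumptions for illustration; the three renderings of "Becket" are the ones given above.

```python
# Illustrative, incomplete set of target languages written in the Latin alphabet.
LATIN_SCRIPT = {"es", "fr", "de", "it", "pl"}

# Hypothetical per-movie glossary: language -> original name -> rendering.
MOVIE_GLOSSARY = {
    "ja": {"Becket": "ベケット"},
    "ru": {"Becket": "Бекет"},
    "ar": {"Becket": "بيكيت"},
}


def render_name(name: str, language: str, glossary: dict[str, dict[str, str]]) -> str:
    """Keep the original spelling for Latin-script languages; otherwise use
    the movie's glossary so every Peetime spells the name the same way."""
    if language in LATIN_SCRIPT:
        return name
    try:
        return glossary[language][name]
    except KeyError:
        raise KeyError(
            f"no transliteration of {name!r} for {language!r}; add it to the movie glossary"
        ) from None
```

Failing on a missing entry, rather than improvising a spelling, is what keeps a character from ending up with three different names across three Peetimes.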

Obstacle 4: The gremlins that corrupt everything silently

This is the section every developer reading this will nod along to, because these are the bugs that don't announce themselves. They just quietly ruin your data and let you find out later.

The UTF-16 file corruption. RunPee ships each language's translations as a small database file that gets copied onto your phone. At one point the script that moved those files onto Android devices was mangling them — re-encoding binary database files as text, which doubled their size and stuffed invisible junk at the front. The files looked fine. They were quietly broken. The fix was making sure binary files got moved as binary, untouched. But the lesson is the one every data pipeline teaches eventually: the scariest corruption is the kind that doesn't throw an error.
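A defensive version of "move binary files as binary" might look like the sketch below. It is not the actual deployment script: the function names are hypothetical, and the check relies only on the fact that text re-encoded as UTF-16 typically starts with a two-byte byte order mark.

```python
import hashlib
import shutil
from pathlib import Path

# Byte order marks that UTF-16 text files typically start with.
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def sha256(path: str) -> str:
    """Hash a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def copy_binary(src: str, dst: str) -> None:
    """Copy byte-for-byte (no text decoding) and verify the result."""
    shutil.copyfile(src, dst)
    if sha256(src) != sha256(dst):
        raise OSError(f"{dst} does not match {src} after copy")


def looks_reencoded(path: str) -> bool:
    """True if the file starts with a UTF-16 byte order mark, a sign that a
    binary file was passed through a text pipeline."""
    with open(path, "rb") as handle:
        return handle.read(2) in UTF16_BOMS
```

The hash comparison is the part that turns silent corruption into an error message: a doubled file size or junk at the front changes the digest.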

If you're not a developer, you may be surprised to learn that developers love error messages. An error message points at a problem we know about and can fix. It's the bugs that don't produce error messages that we hate.

The everything-is-Portuguese bug. For a while, a mismatch between how a value was named in one place (camelCase) versus another (snake_case) meant a key piece of information was silently falling through. The symptom? Every translation, regardless of target language, was quietly defaulting to Portuguese. Not crashing. Not erroring. Just... Portuguese. Everywhere. Because the code asked for a value by a name that didn't exist, shrugged, and used the fallback. A capital letter in the wrong place cost real hours.
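The bug pattern is easy to reproduce in a few lines. In this sketch the field names and the `"pt"` fallback code are assumptions (the article does not say which side used camelCase); what matters is the contrast between a lenient lookup and a strict one.

```python
def target_language_lenient(job: dict) -> str:
    # Bug pattern: the caller sent "targetLanguage", this reads
    # "target_language", finds nothing, and quietly uses the fallback.
    return job.get("target_language", "pt")


def target_language_strict(job: dict) -> str:
    # Safer pattern: a missing key is an error, not a default.
    try:
        return job["target_language"]
    except KeyError:
        raise KeyError(
            f"job is missing 'target_language'; got keys {sorted(job)}"
        ) from None


job = {"targetLanguage": "ja"}  # camelCase from the sender
```

With the lenient version, `target_language_lenient(job)` returns the fallback for every job; with the strict version, the first run fails with a message that names the mismatched key.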

Obstacle 5: The last mile is human

Here's my favorite story from the whole project, because it's humbling.

The AI translations were good. Genuinely good. I was proud of them. And then my wife, who is a native Chinese speaker, looked at the Chinese version and immediately flagged "Peetime." The AI had translated it as 尿点时刻 — which is technically fine, literally "pee-point moment." But a native speaker doesn't say that. The natural term is just 尿点. The extra characters were the linguistic equivalent of saying "ATM machine" — redundant in a way only a native ear catches.

The AI got it 95% of the way there. It took someone who actually lives in the language to close the last 5%. And that 5% is the difference between an app that's been translated and an app that feels like it was written for you.

I think that's the honest shape of where AI is right now, and it's worth saying plainly: AI doesn't replace the native speaker. It changes what you need them for. Before, you'd need that person to translate ten thousand strings from scratch — unaffordable, slow, never going to happen. Now you need them to review and catch the handful of things that are technically correct but subtly off. That's a job that takes an afternoon instead of a career. AI didn't remove the human. It moved the human to the exact spot where humans are still irreplaceable.

Obstacle 6: Doing it reliably, thousands of times

A translation run isn't one translation. It's thousands — every Peetime, every synopsis, across 41 languages. And it has to run reliably enough that I can kick it off and trust the output without re-reading all of it.

That took its own engineering. The work gets split across a small team of AI workers, each handling a batch of languages, with the trickiest ones (Chinese, Japanese, Korean, Hindi) routed to the strongest model. The split sounds arbitrary but it's load-bearing; the exact division of labor was tuned after watching runs drift when a single worker was asked to hold too much in its head at once. The payoff is runs that complete with thousands of translations inserted and zero errors, over and over. Reliability at scale isn't glamorous, but it's the difference between a demo and a product.
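The division of labor can be sketched as a simple planning function. The language codes, the batch size, the model labels, and the job format are all placeholders; only the idea of batching languages and routing the four hardest ones to the strongest model comes from the description above.

```python
# The four languages the text says go to the strongest model.
HARD_LANGUAGES = {"zh", "ja", "ko", "hi"}


def plan_jobs(
    languages: list[str],
    batch_size: int,
    strong_model: str = "strong-model",    # placeholder label
    default_model: str = "default-model",  # placeholder label
) -> list[dict]:
    """Split languages into worker jobs, keeping each worker's load small."""
    hard = [lang for lang in languages if lang in HARD_LANGUAGES]
    rest = [lang for lang in languages if lang not in HARD_LANGUAGES]
    jobs = []
    if hard:
        jobs.append({"model": strong_model, "languages": hard})
    # Chunk the remaining languages so no worker holds too many at once.
    for start in range(0, len(rest), batch_size):
        jobs.append({"model": default_model, "languages": rest[start:start + batch_size]})
    return jobs
```

For example, `plan_jobs(["es", "zh", "fr", "ja", "de"], batch_size=2)` yields one strong-model job for Chinese and Japanese and two default-model jobs for the rest. Tuning `batch_size` is the knob that corresponds to how much a single worker is asked to hold at once.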

What this actually means

Step back from the bugs and the prompt versions for a second, because there's a bigger thing here.

A few years ago, "ship a movie app, in 41 languages, with content that has personality and gets the cultural details right" was a sentence that described a funded company with a localization team. Today it describes a guy and a fleet of AI agents he points at the problem overnight.

That's the part I keep turning over. The technology didn't just make translation cheaper. It collapsed an entire category of work from "needs an organization" down to "needs a clear head and repeatable instructions." The leverage a single person has now is genuinely new. I'm not translating RunPee. I'm directing the translation of RunPee, and the difference between those two verbs is the whole story of the last few years.

But — and the Chinese-speaker story is why I keep the "but" — leverage isn't magic. The AI made the hard part easy and did nothing for the tedious part. It got me to 95% and stopped. The remaining work was real: pick a source of truth, measure the tone, transliterate the names, fix the encodings, find the native speaker. None of that was the AI's job. All of it was mine.

So: does RunPee speak your language? If yours is one of the 41, very likely yes, and it speaks it like someone who actually wanted you to understand, not like a machine that ran your words through a dictionary. That distinction took a lot of unglamorous work to earn.

And the next time someone tells you AI is going to do everything for you, remember: the AI will get you most of the way; closing the gap is still on you.