Translating and dubbing video with a neural network

Translating video is a chain of tasks that neural networks now handle almost automatically: recognize the speech, translate it, re-voice it in another language, and match the lip movements. Not long ago this was an expensive studio process; today a clip in one language can be made to "speak" another in minutes.

What a dub is made of

A dub combines several technologies covered in neighbouring parts of this series:

Speech recognition: transcribe the original track into text.
Translation: translate that text, preserving its meaning and tone.
Speech synthesis: voice the translation, ideally in the original speaker's voice so they stay recognizable.
Lip-sync: adjust the speaker's lip movements to match the new track (the same technique used for talking avatars).
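The four links above can be chained into a pipeline. The following is a minimal Python sketch for illustration, not a description of how Twelver works internally: `Segment`, `dub` and the stage callables are hypothetical names, the stub functions stand in for real recognition, translation and synthesis models, and lip-sync is left out because it would run afterwards on the video frames.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One timed phrase from the original track (times in seconds)."""
    start: float
    end: float
    text: str


# Hypothetical stage interfaces; a real system would wrap ASR, MT and TTS models here.
Recognize = Callable[[str], List[Segment]]
Translate = Callable[[str, str], str]
Synthesize = Callable[[str, float], bytes]


def dub(audio_path: str, target_lang: str, recognize: Recognize,
        translate: Translate, synthesize: Synthesize) -> List[Tuple[float, str, bytes]]:
    """Recognize -> translate -> synthesize, keeping each phrase's start time."""
    track = []
    for seg in recognize(audio_path):
        translated = translate(seg.text, target_lang)
        # Pass the original duration so the synthesizer can aim for the same timing.
        audio = synthesize(translated, seg.end - seg.start)
        track.append((seg.start, translated, audio))
    return track


# Stub stages, for illustration only.
def fake_recognize(path: str) -> List[Segment]:
    return [Segment(0.0, 1.5, "hello"), Segment(2.0, 3.0, "goodbye")]


def fake_translate(text: str, lang: str) -> str:
    return {"hello": "hola", "goodbye": "adios"}[text]


def fake_synthesize(text: str, duration: float) -> bytes:
    return text.encode("utf-8")  # placeholder for real audio bytes


result = dub("clip.wav", "es", fake_recognize, fake_translate, fake_synthesize)
print([(start, text) for start, text, _ in result])  # [(0.0, 'hola'), (2.0, 'adios')]
```

Keeping each phrase's timestamps through every stage is what later lets the new voice line up with the original pauses and, eventually, with the lips.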

The more of these links work together, the more seamless the result: the best dub is one where the viewer never suspects the video was originally in another language.

Upload a clip and choose a language to get a version with translation and voicing. Video and voicing cost more than text: the first dub becomes available after signing up and onboarding, which grant starter tokens.


How deep a dub goes

You don't always need a full dub; choose based on the task and the budget:

Subtitles only: cheap and fast, and the viewer hears the original audio. Covered in a separate chapter.
Voice-over translation: a synthesized voice over the original track, which is turned down but still audible. Cheap and clear, but without lip-sync.
Full dub: a new track replaces the original, ideally in the speaker's voice and with lip-sync. More expensive, but the result looks like a native video.
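The practical difference between a voice-over and a full dub lies in what happens to the original track. A minimal sketch, assuming both tracks are already lists of float samples at the same sample rate: in voice-over mode the original is lowered (here by a hypothetical default of 12 dB) and the synthesized voice is added on top, while in full-dub mode the new track simply replaces it. `mix` and `db_to_gain` are illustrative names, and background music and effects are ignored.

```python
from typing import List


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def mix(original: List[float], voice: List[float], mode: str,
        duck_db: float = -12.0) -> List[float]:
    """Combine the original track with a synthesized voice (samples in [-1.0, 1.0])."""
    n = max(len(original), len(voice))
    # Pad the shorter track with silence so both have the same length.
    orig = original + [0.0] * (n - len(original))
    vo = voice + [0.0] * (n - len(voice))
    if mode == "voice_over":
        gain = db_to_gain(duck_db)  # original stays audible, only quieter
        out = [gain * o + v for o, v in zip(orig, vo)]
    elif mode == "full_dub":
        out = vo  # the new track replaces the original entirely
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # The sum of two tracks can leave the valid range, so clip it back.
    return [max(-1.0, min(1.0, s)) for s in out]


print(mix([0.5, 0.5], [0.2, 0.0], "voice_over", duck_db=-20.0))
print(mix([0.5, 0.5], [0.2, 0.0], "full_dub"))
```

A real mixer would also fade the ducking in and out around each phrase rather than lowering the whole track at once; the sketch keeps only the core idea.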

Making the translation sound natural

Clean original sound. Noise and music interfere with recognition: the cleaner the track, the more accurate the transcript and therefore the translation.
Proofread the translation. Machine translation is worth adjusting to a natural, conversational tone, because the synthesizer voices exactly what is written.
Watch phrase length. The same line can be longer or shorter in another language, so sometimes the translation has to be shortened to fit the original timing.
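One rough way to spot lines that won't fit is to estimate how long the translation takes to say and compare it with the original phrase's time slot. A sketch under stated assumptions: the speaking rate of 15 characters per second and the 1.2x speed-up limit are illustrative values chosen for this example, not measured ones, and `check_fit` is a hypothetical helper.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    estimated_seconds: float  # rough spoken duration of the translated line
    speedup: float            # how much faster than normal the voice must speak
    needs_trim: bool          # True if speeding up alone would exceed the limit


def check_fit(translation: str, slot_seconds: float,
              chars_per_second: float = 15.0,
              max_speedup: float = 1.2) -> Fit:
    """Estimate whether a translated line fits the original phrase's time slot.

    chars_per_second and max_speedup are illustrative assumptions.
    """
    if slot_seconds <= 0:
        raise ValueError("slot_seconds must be positive")
    estimated = len(translation) / chars_per_second
    # Lines shorter than the slot need no speed-up (factor stays at 1.0).
    speedup = max(1.0, estimated / slot_seconds)
    return Fit(estimated, speedup, speedup > max_speedup)


line = "This sentence came out noticeably longer after translation"
print(check_fit(line, slot_seconds=2.5))
```

Lines flagged with `needs_trim` are the candidates for shortening by hand; smaller overruns fall within the assumed speed-up limit.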

Guide: “Video for a foreign market”

A localization checklist: what to translate, what to leave as is, how to keep the intonation, and where a dub saves money and where it spoils the result.


What's next

We've covered translation and voicing. One more common task for a finished video remains: removing unwanted objects from the frame.

In the Twelver chat, translating and voicing a video happen in a single conversation. Starter tokens are granted after signing up and onboarding.

Try it yourself

Everything in this guide runs inside Twelver

One chat for text, images, video, music and voice — no separate services or subscriptions.
