Screenshots: fleet of ships on TV, with subtitles in English and Spanish; review page with images, audio and bilingual subtitles; flash card with image and audio on front, bilingual subtitles on back.

Can you learn a language by watching TV? Yes. But at least in my experience, it only works if you have a certain minimum level of comprehension to begin with. For example, when I started watching Buffy in French, I could follow maybe 40% of the dialog. By the end of the first season, I could understand about 70%. After 3 seasons, my comprehension was comfortably over 90%. I repeated this process with several other series, and by the time I was done, I could understand most easy French TV.

What's going on here? Well, the linguist Stephen Krashen once claimed that, "We acquire language when we understand messages." According to this theory, if we understand something in another language, our brain adds the patterns to an internal database, from which it can generalize. As with Google's statistical machine translation, our brains need raw data to build models. Even in adults, there's a huge unconscious component to this process. But sadly, we can't get this data by watching TV we don't understand at all. Victor Hart has been conducting a fascinating experiment with Mandarin TV, and results have been slow.

But what if we exploited every trick of technology and experimental psychology to artificially boost our understanding? Could we learn a language by watching TV from day 1?

Well, the remarkable polyglot Judith Meyer did manage to go from zero Japanese to understanding TV dialog like “英語なんかできなくだって いいんだよ 碁を打つだけだから” in 30 hours. I didn't make it quite that far with Spanish and Avatar, but my comprehension still improved at a startling rate.

Earworms, comprehensible input, the testing effect, and the spacing effect

substudy is an experimental tool which tries to help you exploit the following psychological phenomena:

  1. Earworms. Depending on your age and musical tastes, you've probably heard at least one of 99 Luftballons, Du Hast or Gangnam Style enough times that you can nearly sing along, even if you don't speak the language. There's a good chance that you can spontaneously imitate the singers' accents, too. This works for spoken words, as well: Have you ever caught yourself speaking along with the opening credits of a TV show? And of course, once this stuff gets stuck in your head, you can't get it out.
  2. The input hypothesis. As mentioned above, Krashen claimed language learning was due almost entirely to understanding input. I'm not sure if I'd go that far, but I've seen a lot of people who struggled with French or Spanish for years, and then one day picked up a book and started watching TV, and saw uncannily rapid improvement.
  3. The testing effect. When you work to retrieve a memory, you strengthen it, and make further retrievals easier.
  4. The spacing effect. If you retrieve a memory shortly before you would have otherwise forgotten it, you're likely to remember it nearly twice as long the next time.
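
To make the spacing effect concrete, here is a minimal sketch of a review schedule in which the gap between reviews doubles after every successful retrieval. The function name `review_schedule`, the one-day first gap and the growth factor of 2.0 are all assumptions chosen for illustration, loosely following the "nearly twice as long" rule of thumb above. This is not Anki's actual scheduling algorithm.

```python
from datetime import date, timedelta


def review_schedule(start: date, reviews: int, first_gap_days: int = 1,
                    growth: float = 2.0) -> list[date]:
    """Return review dates whose gaps grow by `growth` after each review.

    Assumes every review succeeds; a real scheduler shortens the gap
    again when a card is forgotten.
    """
    dates = []
    gap = float(first_gap_days)
    current = start
    for _ in range(reviews):
        current = current + timedelta(days=round(gap))
        dates.append(current)
        gap *= growth  # hypothetical growth factor, not Anki's real setting
    return dates


if __name__ == "__main__":
    # Print six review dates for a card first seen on 1 January 2024.
    for review_date in review_schedule(date(2024, 1, 1), reviews=6):
        print(review_date.isoformat())
```

With these assumed defaults, the gaps are 1, 2, 4, 8, 16 and 32 days, so six reviews carry a card across roughly two months.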

So how can we take advantage of these effects to speed up our learning? We'll need some tools.

Anki: Open source flashcards with automatic spacing

Flash card with image, audio and text "Tierra" and "Earth"

Anki is an open source flashcard system designed to exploit the spacing effect. It's a favorite tool of many serious language learners, and I've personally done over 35,000 flash card reviews over 3 languages. I created many of my flashcards using the brilliant Windows application subs2srs. But we can do it from the command line, too, using substudy.

Assuming we have a video file named episode_01_01.mkv, a Spanish subtitle file named episode_01_01.es.srt, and an English subtitle file named episode_01_01.en.srt, we can run:

substudy export csv episode_01_01.mkv \
    episode_01_01.es.srt episode_01_01.en.srt

This will create a directory episode_01_01_csv containing a cards.csv file and a bunch of media files which we can import into Anki following these instructions for subs2srs. When reviewing these cards:

  1. Delete cards aggressively. Too hard? Incomprehensible? Annoying? It should be gone. You only want the low-hanging fruit. Aggressive deletion keeps your deck healthy, and you can always get more cards.
  2. If you can mostly understand the foreign language audio, click "Good" or "Easy." If you struggle, click "Hard" or "Again."
  3. Don't learn more than 10 or 20 new cards a day. It's really tempting, but you'll eventually end up reviewing 5× or 10× that number. It's best to avoid increasing the number of cards until you've done a month or so of reviews.
  4. Expect to see your first results after three or four days, as your brain learns the audio. More substantial results will take about 500–750 cards and 20–30 days. Judith Meyer studied 1,500 cards over 30 days, at which point she could more-or-less watch new episodes without subtitles.

Anki is amazingly good at creating earworms, and many initially difficult cards will suddenly become easy after 20 or 30 days. And Anki is also how we can build the link between those earworms and what they actually mean.

Just be sure to keep Anki fun and stress-free. If it's not, you probably need to make easier cards, or to delete more.
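
The warning about review load in the tips above can be illustrated with a toy simulation. It is a sketch under loud assumptions: every review succeeds, the gap between reviews starts at one day and doubles each time, and the function name `daily_reviews` is made up for this example. Anki's real scheduler is more complicated, and forgotten cards add reviews that this model ignores.

```python
from collections import Counter


def daily_reviews(new_per_day: int, days: int, growth: float = 2.0) -> list[int]:
    """Count reviews due on each day (new cards themselves are not counted).

    Toy model: a batch of `new_per_day` cards is added every day, and each
    batch is reviewed after gaps of 1, 2, 4, ... days (assumed, not Anki's).
    """
    due = Counter()
    for start in range(days):
        day, gap = start, 1.0
        while True:
            day += round(gap)
            if day >= days:
                break
            due[day] += new_per_day
            gap *= growth
    return [due[d] for d in range(days)]


if __name__ == "__main__":
    load = daily_reviews(new_per_day=20, days=30)
    for day_number, count in enumerate(load, start=1):
        print(f"day {day_number:2d}: {count} reviews")
```

Even in this optimistic model, 20 new cards a day leads to 80 reviews on the thirtieth day, and the number keeps creeping upward. Real decks include forgotten cards and extra learning steps, so the actual load will be higher.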

Bilingual subtitles

Fleet of ships on TV, with subtitles in English and Spanish

We can also generate bilingual subtitles by running:

substudy combine \
    episode_01_01.es.srt \
    episode_01_01.en.srt \
    > episode_01_01.bilingual.srt

These should work well with the VLC player, or with the Chrome extension Videostream for Google Chromecast. The two languages are displayed together, on the same edge of the screen. This covers up more of the video, but it makes it easier for our eyes to jump rapidly between the two languages.

Personally, I find that on days when I watch entire episodes, my Anki reviews are much easier and faster. And after learning about 750 cards, I tried watching a half-dozen new episodes without subtitles, and I was able to follow the plot of two-thirds of them.

Reviewing an episode as a web page

If we want to go through an entire episode and puzzle out the hard bits, we can also generate a web page with an audio clip and subtitles for each line of dialog. For example, here's a snippet from Pan's Labyrinth:

Hola. Soy la princesa Moanna…

Hello. I'm Princess Moanna…

y no te tengo miedo.

and I'm not afraid of you.

Phrases like tengo miedo may not make much sense at first, but usually they'll make more sense after you run into them a few times. And you can always look them up on WordReference.com.

We can create a page with an entire episode's worth of these clips by running:

substudy export review episode_01_01.mkv \
    episode_01_01.es.srt episode_01_01.en.srt

We could use this page together with the readlang extension to easily look up unfamiliar words and translate confusing phrases.
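
Since the export commands on this page all follow the same file-naming pattern, a small helper can build them for a whole season. This is a sketch, not part of substudy: the helper names `subtitle_paths` and `export_command` are hypothetical, and it assumes your subtitle files are named like the examples above (the video's base name plus `.es.srt` and `.en.srt`). It only knows the two export kinds shown on this page, `csv` and `review`.

```python
import shlex
from pathlib import Path


def subtitle_paths(video: str, foreign: str = "es",
                   native: str = "en") -> tuple[str, str]:
    """Guess subtitle file names from the video name (assumed convention)."""
    stem = Path(video).with_suffix("")
    return f"{stem}.{foreign}.srt", f"{stem}.{native}.srt"


def export_command(kind: str, video: str, foreign: str = "es",
                   native: str = "en") -> list[str]:
    """Build `substudy export <kind> VIDEO FOREIGN.srt NATIVE.srt` as a list."""
    if kind not in ("csv", "review"):
        raise ValueError(f"unsupported export kind in this sketch: {kind!r}")
    foreign_srt, native_srt = subtitle_paths(video, foreign, native)
    return ["substudy", "export", kind, video, foreign_srt, native_srt]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for kind in ("csv", "review"):
        print(shlex.join(export_command(kind, "episode_01_01.mkv")))
    # To actually run one, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a string avoids quoting problems when file names contain spaces.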

Finding subtitles

You'll have much better results if you choose an easy show with clear dialog. For example, I found Avatar to be much more effective than the film Y Tu Mamá También. And a typical series will provide several dozen hours of video, all with the same basic vocabulary and voices, which offers a very helpful boost in the beginning.

Unfortunately, finding accurate subtitles is a nuisance. You'll have the best luck using:

  1. OpenSubtitles has many subtitles of varying quality. They'll probably require cleanup.
  2. HandBrake can extract subtitle tracks from many DVDs, but they'll be in image-based formats. In general, the *.mkv video container format seems to be a good choice for subtitle processing, because it can hold multiple languages, and because some subtitle editors can open it directly.
  3. Subtitle Edit or another high-quality subtitle editor. These can adjust subtitle timings, perform OCR on subtitles, split long subtitles in half, and do many other essential things. The substudy tool does not intend to replace these tools.

You want subtitles in the SRT format, which is a simple, text-based format:

10
00:00:31,836 --> 00:00:34,520
¡Solo el Avatar es capaz de dominar
los cuatro elementos!
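
To show how simple the format is, here is a minimal sketch of an SRT reader: each cue is an index line, a timing line, and one or more lines of text, with blank lines between cues. The names `Cue` and `parse_srt` are made up for this example, and this is not how substudy itself parses subtitles. It silently skips malformed blocks, which is one reason real files often need a proper subtitle editor.

```python
import re
from dataclasses import dataclass

# Matches lines like "00:00:31,836 --> 00:00:34,520".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the video
    end: float
    text: str


def _seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues, skipping blocks that don't look valid."""
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", content):  # cues are separated by blank lines
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdecimal():
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        parts = match.groups()
        cues.append(Cue(int(lines[0].strip()), _seconds(*parts[:4]),
                        _seconds(*parts[4:]), "\n".join(lines[2:])))
    return cues


if __name__ == "__main__":
    sample = (
        "10\n"
        "00:00:31,836 --> 00:00:34,520\n"
        "¡Solo el Avatar es capaz de dominar\n"
        "los cuatro elementos!\n"
    )
    for cue in parse_srt(sample):
        print(cue.index, cue.start, cue.end, cue.text.replace("\n", " / "))
```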

How can I try this?

Unfortunately, this will require either Mac OS X or Linux, and some prior knowledge of the terminal. You can find installation instructions on GitHub. But please feel free to experiment, to file bugs, and to modify it to try out other ideas. If you discover a new way to use subtitles to help learn a language, I'd love to hear about it. Please feel free to contact me with any questions or ideas!