Can you learn a language by watching TV? Yes. But at least in my experience, it only works if you have a certain minimum level of comprehension to begin with. For example, when I started watching Buffy in French, I could follow maybe 40% of the dialog. By the end of the first season, I could understand about 70%. After 3 seasons, my comprehension was comfortably over 90%. I repeated this process with several other series, and by the time I was done, I could understand most easy French TV.
What's going on here? Well, the linguist Stephen Krashen once claimed that, "We acquire language when we understand messages." According to this theory, if we understand something in another language, our brain adds the patterns to an internal database, from which it can generalize. As with Google's statistical machine translation, our brains need raw data to build models. Even in adults, there's a huge unconscious component to this process. But sadly, we can't get this data by watching TV we don't understand at all. Victor Hart has been conducting a fascinating experiment with Mandarin TV, and results have been slow.
But what if we exploited every trick of technology and experimental psychology to artificially boost our understanding? Could we learn a language by watching TV from day 1?
Well, the remarkable polyglot Judith Meyer did manage to go from zero Japanese to understanding TV dialog like “英語なんかできなくだって いいんだよ 碁を打つだけだから” in 30 hours. I didn't make it quite that far with Spanish and Avatar, but my comprehension still improved at a startling rate.
Earworms, comprehensible input, the testing effect, and the spacing effect
substudy is an experimental tool which tries to help you exploit the following psychological phenomena:
- Earworms. Depending on your age and musical tastes, you've probably heard at least one of 99 Luftballons, Du Hast or Gangnam Style enough times that you can nearly sing along, even if you don't speak the language. There's a good chance that you can spontaneously imitate the singers' accents, too. This works for spoken words, as well: Have you ever caught yourself speaking along with the opening credits of a TV show? And of course, once this stuff gets stuck in your head, you can't get it out.
- The input hypothesis. As mentioned above, Krashen claimed language learning was due almost entirely to understanding input. I'm not sure if I'd go that far, but I've seen a lot of people who struggled with French or Spanish for years, and then one day picked up a book and started watching TV, and saw uncannily rapid improvement.
- The testing effect. When you work to retrieve a memory, you strengthen it, and make further retrievals easier.
- The spacing effect. If you retrieve a memory shortly before you would have otherwise forgotten it, you're likely to remember it nearly twice as long the next time.
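To make the spacing effect concrete, here is a toy model in Python. It assumes that each successful review exactly doubles the interval before the next one, which is only a rough reading of "nearly twice as long". The function name and its parameters are made up for this illustration, and this is not how Anki actually schedules cards.

```python
def review_schedule(first_interval_days=1.0, growth=2.0, reviews=6):
    """Return the gaps (in days) between successive reviews of one card.

    Assumption: every review succeeds and multiplies the next gap by
    `growth`. Real schedulers adjust the factor per card and per answer.
    """
    intervals = []
    interval = first_interval_days
    for _ in range(reviews):
        intervals.append(interval)
        interval *= growth
    return intervals


schedule = review_schedule()
print(schedule)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(sum(schedule))  # 63.0 days covered by only six reviews
```

Under these assumptions, six well-timed reviews keep a card alive for about two months, which is why a spaced flashcard system can handle so many cards.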
So how can we take advantage of these effects to speed up our learning? We'll need some tools.
Anki is an open source flashcard system designed to exploit the spacing effect. It's a favorite tool of many serious language learners, and I've personally done over 35,000 flash card reviews across 3 languages. I created many of my flashcards using the brilliant Windows application subs2srs. But we can do it from the command line, too. Assuming we have a video file named episode_01_01.mkv, a Spanish subtitle file named episode_01_01.es.srt, and an English subtitle file named episode_01_01.en.srt, we can run:
substudy export csv episode_01_01.mkv episode_01_01.es.srt episode_01_01.en.srt
This will create a directory episode_01_01_csv containing a CSV file and a bunch of media files, which we can import into Anki following these instructions for subs2srs. When reviewing:
- Delete cards aggressively. Too hard? Incomprehensible? Annoying? It should be gone. You only want the low-hanging fruit. Aggressive deletion keeps your deck healthy, and you can always get more cards.
- If you can mostly understand the foreign language audio, click "Good" or "Easy." If you struggle, click "Hard" or "Again."
- Don't learn more than 10 or 20 new cards a day. It's really tempting, but you'll eventually end up reviewing 5× or 10× that number. It's best to avoid increasing the number of cards until you've done a month or so of reviews.
- Expect to see your first results after three or four days, as your brain learns the audio. More substantial results will take about 500–750 cards and 20–30 days. Judith Meyer studied 1,500 cards over 30 days, at which point she could more-or-less watch new episodes without subtitles.
Anki is amazingly good at creating earworms, and many initially difficult cards will suddenly become easy after 20 or 30 days. And Anki is also how we can build the link between those earworms and what they actually mean.
Just be sure to keep Anki fun and stress-free. If it's not, you probably need to make easier cards, or to delete more.
We can also generate bilingual subtitles by running:
substudy combine episode_01_01.es.srt episode_01_01.en.srt > episode_01_01.bilingual.srt
These should work well with the VLC player, or with the Chrome extension Videostream for Google Chromecast. The two languages are displayed together, on the same edge of the screen. This covers up more of the video, but it makes it easier for our eyes to jump rapidly between the two languages.
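For the curious, here is a minimal Python sketch of one way two subtitle tracks could be merged: pair each foreign-language line with the native-language line that overlaps it most in time. This is an illustration only, not substudy's actual alignment method. The `overlap` and `combine` helpers are hypothetical, and the timings are invented (the text comes from the Pan's Labyrinth example in this post).

```python
def overlap(a, b):
    """Seconds shared by two cues, each given as (start, end, text)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def combine(foreign, native):
    """Attach to each foreign cue the native cue that overlaps it most.

    Assumption: both tracks are roughly in sync. A foreign cue with no
    overlapping native cue is kept on its own.
    """
    combined = []
    for cue in foreign:
        best = max(native, key=lambda other: overlap(cue, other), default=None)
        if best is not None and overlap(cue, best) > 0:
            text = cue[2] + "\n" + best[2]
        else:
            text = cue[2]
        combined.append((cue[0], cue[1], text))
    return combined


# Hypothetical timings, in seconds.
spanish = [
    (10.0, 12.5, "Hola. Soy la princesa Moanna…"),
    (12.6, 14.0, "y no te tengo miedo."),
]
english = [
    (10.1, 12.4, "Hello. I'm Princess Moanna…"),
    (12.7, 14.2, "and I'm not afraid of you."),
]

for start, end, text in combine(spanish, english):
    print(start, end)
    print(text)
```

Real subtitle files are rarely this tidy: lines are often split differently in each language, which is one reason timing cleanup in a subtitle editor matters.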
Personally, I find that on days where I watch entire episodes, my Anki reviews are much easier and faster. And after learning about 750 cards, I tried watching a half-dozen new episodes without subtitles, and I was able to follow the plot of 2/3rds of them.
If we want to go through an entire episode and puzzle out the hard bits, we can also generate a web page with an audio clip and subtitles for each line of dialog. For example, here's a snippet from Pan's Labyrinth:
Hola. Soy la princesa Moanna…
Hello. I'm Princess Moanna…
y no te tengo miedo.
and I'm not afraid of you.
Phrases like tengo miedo may not make much sense at first, but usually they'll make more sense after you run into them a few times. And you can always look them up on WordReference.com.
We can create a page with an entire episode's worth of these clips by running:
substudy export review episode_01_01.mkv episode_01_01.es.srt episode_01_01.en.srt
We could use this page together with the readlang extension to easily look up unfamiliar words and translate confusing phrases.
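With a whole season on disk, typing that command for every episode gets tedious. Here is a small Python sketch that builds the same command for each video in a directory. It assumes every video follows the naming convention used above (episode_01_01.mkv next to episode_01_01.es.srt and episode_01_01.en.srt) and that substudy is on your PATH. The helper names `review_command` and `export_all` are invented for this example.

```python
import subprocess
from pathlib import Path


def review_command(video, foreign="es", native="en"):
    """Build the `substudy export review` command line for one video.

    Assumption: the subtitle files sit next to the video and are named
    <stem>.<language>.srt, as in the examples in this post.
    """
    video = Path(video)
    return [
        "substudy", "export", "review",
        str(video),
        str(video.with_name(f"{video.stem}.{foreign}.srt")),
        str(video.with_name(f"{video.stem}.{native}.srt")),
    ]


def export_all(directory="."):
    """Run the export for every .mkv file in `directory`, in name order."""
    for video in sorted(Path(directory).glob("*.mkv")):
        # check=True stops at the first episode that fails to export.
        subprocess.run(review_command(video), check=True)


print(" ".join(review_command("episode_01_01.mkv")))
```

Calling `export_all()` from the directory that holds the episodes would then process them one by one.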
You'll have much better results if you choose an easy show with clear dialog. For example, I found Avatar to be much more effective than the film Y Tu Mamá También. And a typical series will provide several dozens of hours of video, all with the same basic vocabulary and voices, which offers a very helpful boost in the beginning.
Unfortunately, finding accurate subtitles is a nuisance. You'll have the best luck using:
- OpenSubtitles has many subtitles of varying quality. They'll probably require cleanup.
- HandBrake can extract subtitle tracks from many DVDs, but they'll be in image-based formats. In general, the *.mkv video container format seems to be a good choice for subtitle processing, because it can hold multiple languages, and because some subtitle editors can open it directly.
- Subtitle Edit or another high-quality subtitle editor. These can adjust subtitle timings, perform OCR on subtitles, split long subtitles in half, and do many other essential things. The substudy tool does not intend to replace these tools.
You want subtitles in the SRT format, which is a simple, text-based format:
10
00:00:31,836 --> 00:00:34,520
¡Solo el Avatar es capaz de dominar los cuatro elementos!
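Because the format is this simple, it is easy to read with a few lines of code. The following Python sketch assumes well-formed input: blocks separated by blank lines, each with a numeric index, a timing line, and one or more lines of text. The `Cue` class and `parse_srt` function are names made up for this example and are not part of substudy.

```python
import re
from dataclasses import dataclass

# Matches a timing line such as "00:00:31,836 --> 00:00:34,520".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the video
    end: float
    text: str


def _seconds(hours, minutes, seconds, millis):
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(data):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    data = data.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", data):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue
        parts = match.groups()
        cues.append(
            Cue(
                index=int(lines[0].strip()),
                start=_seconds(*parts[:4]),
                end=_seconds(*parts[4:]),
                text="\n".join(lines[2:]),
            )
        )
    return cues


sample = (
    "10\n"
    "00:00:31,836 --> 00:00:34,520\n"
    "¡Solo el Avatar es capaz de dominar los cuatro elementos!\n"
)
for cue in parse_srt(sample):
    print(cue.index, cue.text)
```

Subtitles found in the wild often break these assumptions (odd encodings, missing indexes, stray markup), so a dedicated subtitle editor is still the better tool for cleanup.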
How can I try this?
Unfortunately, this will require either Mac OS X or Linux, and some prior knowledge of the terminal. You can find installation instructions on GitHub. But please feel free to experiment, to file bugs, and to modify it to try out other ideas. If you discover a new way to use subtitles to help learn a language, I'd love to hear about it. Please feel free to contact me with any questions or ideas!