I’m a casual fan of science fiction, and by that I mean I love Star Wars and Star Trek and any video game involving interplanetary war, but not to the point that I have tattoos of the Rebel Alliance or can speak Klingon or anything like that (though if anyone has a suit of Stormtrooper armor that they’d like to send my way, my wife says it’s OK!).
In most of the sci-fi content that I’ve experienced, the communication difficulties caused by different civilizations speaking dramatically different languages have largely been solved, often through the use of translator androids or implantable devices. But could such a tool exist, now or in the future?
As it turns out, maybe! Researchers at Google recently devised a method for efficient cross-linguistic translation using the statistical properties of language. To start, the researchers assumed that all the world’s languages describe a large but finite set of concepts. Linguists may argue with that assumption, but I’m not going to moderate a disagreement between a linguist and a computer programmer, so we’ll move on.
Next, they fed large sets of linguistic data into a computer program. The program looked at how often certain words and phrases appeared in the context of other words and phrases. So for example, it might be interested in knowing that the blank space in the sentence “I am going to the ___ after work” is filled with “store” 10% of the time, “bank” 5% of the time, “club” 0.3% of the time, etc.
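For the curious, here is a minimal sketch of that counting step in Python. The tiny corpus and the helper name `fill_counts` are made up for illustration; this is not the researchers' program, which worked on far larger data sets.

```python
from collections import Counter

# A made-up, five-sentence "corpus" standing in for large sets of linguistic data.
corpus = [
    "i am going to the store after work",
    "i am going to the bank after work",
    "we are going to the store after work",
    "she is going to the club after work",
    "i am going to the store after lunch",
]

def fill_counts(sentences, left, right):
    """Count the words that appear between the phrases `left` and `right`."""
    left_words, right_words = left.split(), right.split()
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        # Try every position that leaves room for both surrounding phrases.
        for i in range(len(left_words), len(words) - len(right_words)):
            before = words[i - len(left_words):i]
            after = words[i + 1:i + 1 + len(right_words)]
            if before == left_words and after == right_words:
                counts[words[i]] += 1
    return counts

counts = fill_counts(corpus, "going to the", "after work")
total = sum(counts.values())
# Turn raw counts into the share of the time each word fills the blank.
shares = {word: count / total for word, count in counts.items()}
print(shares)
```

In this toy corpus, “store” fills the blank in half of the matching sentences, and “bank” and “club” in a quarter each. The last sentence ends in “after lunch”, so it is not counted.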
After feeding enough linguistic data into the computer program, the researchers could construct a vector (an ordered list of numbers) for each of the words and phrases in a language. From there, translation from one language to another simply becomes a matter of finding the equations to transform one set of vectors into another. I don’t know how to do that kind of math, but apparently it is pretty straightforward for people with doctorates in mathematics.
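To make the "transform one set of vectors into another" idea a little more concrete, here is a rough sketch using NumPy. It assumes the transformation is a simple linear map (a matrix) fitted by least squares on a small seed dictionary of known word pairs. That is my assumption for the sake of the example, and the two-dimensional vectors, the word pairs, and the names `hidden_map` and `translate` are all invented.

```python
import numpy as np

# Invented 2-D vectors for four Spanish words (real word vectors are much longer).
spanish = {
    "uno": np.array([1.0, 0.0]),
    "dos": np.array([0.0, 1.0]),
    "tres": np.array([1.0, 1.0]),
    "cuatro": np.array([2.0, 1.0]),
}

# Fabricate the English vectors with a hidden linear map, so a linear fit can work.
hidden_map = np.array([[0.0, 2.0], [-2.0, 0.0]])
pairs = [("uno", "one"), ("dos", "two"), ("tres", "three"), ("cuatro", "four")]
english = {en: spanish[es] @ hidden_map for es, en in pairs}

# Seed dictionary: learn the map from the first three pairs and hold out "cuatro".
seed = pairs[:3]
X = np.stack([spanish[es] for es, _ in seed])
Z = np.stack([english[en] for _, en in seed])
# Least-squares solution W of X @ W ≈ Z.
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def translate(word):
    """Project a Spanish vector into English space and return the nearest English word."""
    projected = spanish[word] @ W
    return max(english, key=lambda en: cosine(projected, english[en]))

print(translate("cuatro"))
```

Because the toy English vectors were built from a linear map in the first place, the fit recovers it and the held-out word "cuatro" lands closest to "four". Real language data would be far messier than this.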
The researchers claim that their translation program approached 90% precision@5, which I believe means that when the program gave its top five guesses for the translation of a particular word, the correct translation appeared in that list 90% of the time (if I’m grossly misinterpreting the concept of precision@n, please hit me up on Twitter @jimkloet or leave a comment!). On top of that, the translation program was able to identify and fill in gaps in some pre-existing translation dictionaries.
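Taking my reading of precision@5 at face value, the score could be computed something like this. The function name `precision_at_k` and the guess lists are hypothetical, made up just to show the arithmetic.

```python
def precision_at_k(guesses, answers, k=5):
    """Share of words whose correct translation appears in the top k guesses."""
    hits = sum(1 for word, ranked in guesses.items() if answers[word] in ranked[:k])
    return hits / len(guesses)

# Made-up ranked guesses for four Spanish words, best guess first.
guesses = {
    "gato": ["cat", "dog", "kitten", "pet", "mouse"],
    "perro": ["puppy", "dog", "wolf", "cat", "pet"],
    "casa": ["home", "building", "roof", "door", "street", "house"],  # right answer is 6th
    "libro": ["book", "novel", "page", "read", "paper"],
}
answers = {"gato": "cat", "perro": "dog", "casa": "house", "libro": "book"}

score = precision_at_k(guesses, answers)
print(score)
```

Three of the four correct translations show up in the top five here, so the toy score is 0.75. Shrinking `k` to 1 counts only the first guess, which drops it to 0.5.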
This translation program sounds quite promising, but I don’t think we can expect a real C-3PO anytime soon. For starters, the Google team’s work only applies to words and phrases, not to sentences or whole conversations. It also doesn’t account for any of the subtleties of speech and language, like changes in the prosody of voice, sarcasm, or humor.
That being said, it’s exciting to know that people are actually working on this sort of thing. An automatic language translator would bring the world together much more efficiently than asking everyone to learn the roughly 7,100 different languages spoken today. And it would probably come in handy if we were to ever find intelligent life outside of our planet, too.