Are Speech-to-Speech Translation Apps in Our Future?
September 6, 2011
It’s easy to instantaneously translate text online. Just go to Google Translate and enter any sentence. Translating speech into other languages is now just as easy on an iPhone or Android phone.
But what about instantaneously translating speech and then sending it over the phone? Current speech translation apps take your spoken words and translate them into text. Some can even speak back the translations. But you still need to physically show other people the screen of your phone or have them within earshot.
Instantaneous and conversational spoken translations have been a linguistic challenge for years, but now Google and NTT DoCoMo (the main Japanese mobile phone operator) are working on developing software for a phone that can do just that. You say something in one language into your phone, and people on the other end can hear the translated version on their phones.
Almost like magic.
Google’s initiative is built around its already popular Google Translate app, which now features a new setting called “Conversation Mode.” The setting is currently available only for English and Spanish: one person speaks an English sentence into the phone, and it is automatically translated into spoken Spanish.
This setting provides two spoken languages but no real telecommunication capabilities. You still need to be close enough to someone to have them see your phone or hear it speak. Soon, however, that may not be the case.
Franz Och, the head of Google’s translation services, is leading the project to develop new smartphone software that will capture speech, translate it, send it from one phone to another, and then speak it aloud. Och’s target is to have a product that works reasonably well within a few years, a formidable challenge since it will require much better translation and voice-recognition technologies.
News of Och’s project at Google was first reported in early 2010, but the Japanese mobile provider NTT DoCoMo might have beaten him to the punch.
The Japanese app isn’t an original innovation, since DoCoMo used the best from already existing technologies. It is unique, however, in that the entire program is based in the cloud.
At a trade show demonstration, a DoCoMo staff member read a Japanese newspaper aloud while on the phone with a colleague, who heard an English translation. While not perfect, the end result was a clearly audible and rather coherent version of the original article.
Even though it’s certainly impressive, DoCoMo (or Google) won’t be revolutionizing Sony board meetings just yet. Both companies’ apps are based on current machine-translation and voice-recognition technologies, which are great for casual conversations but definitely have their limits.
Although it’s just the beginning, this kind of technology holds huge potential for international business and politics. One of the biggest barriers to trade in the history of civilization, ancient or modern, would be gone, or at least somewhat flattened.
Haggling for a pair of shoes is about the most complex conversation that automated speech-to-speech translators can handle as of now. Business and political leaders will have to wait a few more years.