Researchers at Microsoft are working on a universal translator that takes the user's voice and translates their words into another language without altering the voice itself. Applications for this software tool should be obvious: making languages easier to learn and more personal, helping visitors relate to locals in other countries without flipping through books and stumbling over pronunciations, and more.
Technology Review reports that research scientist Frank Soong demonstrated the software at Microsoft's Redmond, Washington campus on Tuesday. He created the translation system along with colleagues at Microsoft Research Asia, the company's second-largest research lab, in Beijing, China. Currently the system needs about an hour of training to build a speech model that matches the user's own voice.
For the demonstration, his boss Rick Rashid, who leads Microsoft's research department, read out text that was immediately translated into Spanish. Craig Mundie, Microsoft's chief research and strategy officer, did the same, only his translation was in Mandarin. A sample audio clip can be heard here in four versions: the original English and separate Spanish, Italian and Mandarin versions.
"We will be able to do quite a few scenario applications," Soong promises. "For a monolingual speaker traveling in a foreign country, we'll do speech recognition followed by translation, followed by the final text to speech output [in] a different language, but still in his own voice."
After creating a "model" based on the user's method of speech, the system converts the model into one that's able to read out text in another language. This is done by comparing the user's voice with a stock text-to-speech model for the target language. The system then tweaks the target-language model to match the individual's articulation as closely as possible. This method reportedly can convert between any pair of 26 languages, including Mandarin Chinese, Spanish and Italian.
"The word is just one part of what a person is saying," says Shrikanth Narayanan, a professor at the University of Southern California in Los Angeles. To truly convey all the information in a person's speech, he adds, translation systems will need to preserve voices and much more. "Preserving voice, preserving intonation, those things matter, and this project clearly knows that," Narayanan says. "Our systems need to capture the expression a person is trying to convey, who they are, and how they're saying it."
So far there's no indication of when the system will be ready on a consumer level, but how much do you want to bet it will become a Windows Phone-only feature?