Deep learning, a technique derived from neural networks (an early AI technology), is poised to revolutionize machine translation. Google recently open-sourced its word2vec package, which analyzes English-language texts to discover the meanings of words and the relationships between them. The results are impressive and point toward a significant advance in machine translation technology in the next few years.

Current machine translation systems rely on statistical techniques. A statistical machine translation system is trained on a large corpus of aligned texts, one in each language. The system does not understand either language in any meaningful sense; it simply learns that "Hello" is highly correlated with "Hola" in Spanish. Given enough text, it can produce output of decent quality, suitable for comprehension but not for publication without post-editing.

Deep learning has the potential to change all of this by enabling systems that autonomously learn how words within each language relate to each other, along with their synonyms and other linguistic structures. This will enable machine translation systems to understand the material they are translating. For example, the word2vec package lets you find the "distance" from an input phrase to the closest words or phrases. Type in "France", and it will display:
    spain 0.678515

Machine translation vendors would be wise to take a close look at word2vec and related projects, as they have the potential to render current techniques obsolete. (It's a good bet that Google's translation team is already working on this problem.) Otherwise, they stand a good chance of being left in the dust by new systems and companies.

Reposted from: http:///2013/08/16/deep-learning-will-revolutionize-machine-translation/
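The "distance" score word2vec reports is cosine similarity between learned word vectors: words that appear in similar contexts end up with vectors pointing in similar directions. Here is a minimal sketch of that ranking step in pure Python; the toy 3-dimensional vectors below are made-up values for illustration (real word2vec embeddings have hundreds of dimensions and are learned from a corpus), so the words and numbers are assumptions, not actual word2vec output.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest_words(query, vectors, top_n=5):
    """Rank every other word by cosine similarity to the query word."""
    q = vectors[query]
    scores = [(word, cosine_similarity(q, vec))
              for word, vec in vectors.items() if word != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical toy vectors, chosen so that country names cluster together.
vectors = {
    "france":  [0.90, 0.10, 0.30],
    "spain":   [0.80, 0.20, 0.35],
    "belgium": [0.85, 0.15, 0.25],
    "banana":  [0.10, 0.90, 0.70],
}

for word, score in closest_words("france", vectors, top_n=2):
    print(f"{word} {score:.6f}")
```

With real embeddings the same ranking loop is what produces output like the `spain 0.678515` line above: the neighbors of "France" are other countries, not arbitrary words.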