Journal of Information Science and Engineering, Vol. 30 No. 4, pp. 1407-1424 (September 2014)


Woodpecker: An Automatic Methodology for Machine Translation Diagnosis with Rich Linguistic Knowledge*


BO WANG1, MING ZHOU2, SHUJIE LIU2, MU LI2 AND DONGDONG ZHANG2
1School of Computer Science and Technology
Tianjin University
Tianjin, 300000 P.R. China
2Microsoft Research Asia
Beijing, 100000 P.R. China
E-mail: bo.wang.1979@gmail.com; {mingzhou; shujieli; muli; dozhang}@microsoft.com

Unlike black-box evaluation, diagnostic evaluation aims to explain the performance of artificial intelligence systems across its various aspects. For machine translation (MT) systems, however, such diagnostic evaluation often demands a large amount of manual work because of the systems' complexity and knowledge dependency. To tackle this problem, we propose an automatic diagnostic evaluation methodology, called Woodpecker, which enables multi-factored evaluation of MT systems based on linguistic categories and automatically constructed linguistic checkpoints. The taxonomy of the categories is defined with rich linguistic knowledge and covers phenomena at different linguistic levels. Instances of the categories are composed into test cases called linguistic checkpoints. We present a method that automatically extracts checkpoints from parallel sentences, so that Woodpecker can automatically monitor how an MT system translates various linguistic phenomena, thereby facilitating diagnostic evaluation. The effectiveness of Woodpecker is verified through in-house experiments and open MT evaluation tracks on various types of MT systems.

Keywords: evaluation, diagnosis, machine translation, linguistic knowledge, checkpoint
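
To make the notion of a linguistic checkpoint more concrete, the following is a minimal sketch of how checkpoints might be extracted from one parallel sentence and scored per category. It is not the authors' implementation. The abstract does not specify the extraction or scoring procedure, so everything here is an assumption made for illustration: the `Checkpoint` class, the functions `extract_checkpoints` and `score_by_category`, the use of a precomputed word alignment and per-token category tags, and the simple word-overlap score are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    category: str   # hypothetical category label, e.g. "noun"
    source: str     # source-side instance of the linguistic phenomenon
    reference: str  # reference translation of that instance


def extract_checkpoints(src_tokens, tgt_tokens, alignment, tags):
    """Build one checkpoint per tagged source token that has aligned target words.

    alignment: set of (source_index, target_index) pairs, assumed to be given.
    tags: one category label per source token, or None to skip that token.
    """
    checkpoints = []
    for i, (token, tag) in enumerate(zip(src_tokens, tags)):
        if tag is None:
            continue
        tgt_idx = sorted(j for (s, j) in alignment if s == i)
        if not tgt_idx:
            continue  # no reference translation available for this token
        reference = " ".join(tgt_tokens[j] for j in tgt_idx)
        checkpoints.append(Checkpoint(tag, token, reference))
    return checkpoints


def score_by_category(checkpoints, hypothesis_tokens):
    """Return, per category, the fraction of reference words found in the hypothesis."""
    hyp = set(hypothesis_tokens)
    matched = defaultdict(int)
    total = defaultdict(int)
    for cp in checkpoints:
        ref_words = cp.reference.split()
        total[cp.category] += len(ref_words)
        matched[cp.category] += sum(1 for w in ref_words if w in hyp)
    return {cat: matched[cat] / total[cat] for cat in total}


# Toy parallel sentence (romanized source tokens, invented for illustration).
src = ["wo", "xihuan", "hong", "pingguo"]
tgt = ["I", "like", "red", "apples"]
alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
tags = ["pronoun", "verb", "adjective", "noun"]

checkpoints = extract_checkpoints(src, tgt, alignment, tags)
scores = score_by_category(checkpoints, ["I", "like", "apples"])
```

In this toy case the hypothesis drops "red", so the adjective category scores 0.0 while the other three categories score 1.0. A per-category breakdown of this kind is what distinguishes a diagnostic evaluation from a single black-box score.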


Received June 11, 2013; revised July 31, 2013; accepted August 4, 2013.
Communicated by Hsin-Hsi Chen.
* This work is supported by the Natural Science Foundation of China (61105072), the Chinese National Program on Key Basic Research Project (2013CB329304), the Tianjin Younger Natural Science Foundation (14JCQNJC00400) and the Tianjin Key Laboratory of Cognitive Computing and Application. The primary work was supported by the NLC group of Microsoft Research Asia and was completed at the Open University, UK.