Soft syntactic constraints for Arabic-English hierarchical phrase-based translation

Yuval Marton; David Chiang; Philip Resnik

doi:10.1007/s10590-011-9111-z

Machine Translation

Paper

26 Oct 2011



Abstract

In adding syntax to statistical machine translation, there is a tradeoff between taking advantage of linguistic analysis and allowing the model to exploit parallel training data with no linguistic analysis: translation quality versus coverage. A number of previous efforts have tackled this tradeoff by starting with a commitment to linguistically motivated analyses and then finding appropriate ways to soften that commitment. We present an approach that explores the tradeoff from the other direction, starting with a translation model learned directly from aligned parallel text, and then adding soft constituent-level constraints based on parses of the source language. We argue that in order for these constraints to improve translation, they must be fine-grained: the constraints should vary by constituent type, and by the type of match or mismatch with the parse. We also use a different feature weight optimization technique, one capable of handling a large number of features, which eliminates the bottleneck of feature selection. We obtain substantial improvements in performance for translation from Arabic to English. © 2011 Springer Science+Business Media B.V.
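To make "fine-grained" constraints concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the source parse is given as labeled half-open token spans. It also assumes two feature types per constituent label: `LABEL=` fires when a translation rule's source span exactly matches a constituent with that label, and `LABEL+` fires when the span crosses that constituent's boundary. This naming and the helper names `crosses` and `soft_constraint_features` are illustrative assumptions. Because the features are soft, they never block a rule. Each count would become one feature with its own tuned weight in a log-linear model. The number of features therefore grows with the label set, which is why an optimizer that scales to many features matters.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation of one source-parse constituent:
# (label, start, end) over token positions, end exclusive.
Constituent = Tuple[str, int, int]


def crosses(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the spans overlap but neither contains the other (a bracketing conflict)."""
    overlap = a_start < b_end and b_start < a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return overlap and not a_in_b and not b_in_a


def soft_constraint_features(span_start: int, span_end: int,
                             constituents: List[Constituent]) -> Counter:
    """Count label-specific match/mismatch features for one rule's source span.

    Assumed convention: LABEL= for an exact match, LABEL+ for a crossing.
    Spans nested inside or containing a constituent fire nothing here.
    """
    feats: Counter = Counter()
    for label, c_start, c_end in constituents:
        if (c_start, c_end) == (span_start, span_end):
            feats[label + "="] += 1
        elif crosses(span_start, span_end, c_start, c_end):
            feats[label + "+"] += 1
    return feats


if __name__ == "__main__":
    # Toy parse of a 5-token sentence: [S [NP w0 w1] [VP w2 [NP w3 w4]]]
    parse = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
    print(soft_constraint_features(0, 2, parse))  # exact NP match
    print(soft_constraint_features(1, 3, parse))  # crosses both NP(0,2) and VP(2,5)
```

In this toy example, a rule covering tokens 0 to 2 is rewarded or penalized through the weight on `NP=`. A rule covering tokens 1 to 3 instead triggers `NP+` and `VP+`. Separate weights let the tuner decide, per label, whether crossing a boundary actually hurts.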
