A unigram orientation model for statistical machine translation
Christoph Tillmann
NAACL-HLT 2004
The paper presents a novel sentence-pair extraction algorithm for comparable data, in which a large set of candidate sentence pairs is scored directly at the sentence level. The extraction relies on a very efficient implementation of a simple symmetric scoring function, and a computation speed-up by a factor of 30 is reported. On Spanish-English data, the algorithm finds the highest-scoring sentence pairs among close to 1 trillion candidate pairs without search errors. Significant BLEU improvements are reported when the extracted sentence pairs are added to the training data of a phrase-based SMT (statistical machine translation) system.
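The abstract does not define the scoring function or the speed-up technique. The sketch below is a minimal illustration under stated assumptions. It uses a Model-1-style lexical score, computed in each translation direction and then averaged, so that the score does not depend on which side is called "source". It also scores every candidate pair exhaustively, which is why a top-k selection over all pairs involves no search errors. The lexicon format and the names `directional_score`, `symmetric_score`, `best_pairs`, and `FLOOR` are hypothetical. The code is not the paper's implementation and does not reproduce its speed-up.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical lexicon: (conditioning_word, predicted_word) -> probability.
Lexicon = Dict[Tuple[str, str], float]

FLOOR = 1e-7  # assumed probability floor for unseen word pairs


def directional_score(given: List[str], predicted: List[str], lex: Lexicon) -> float:
    """Average per-word log-probability of `predicted` given `given`.

    Model-1 style (an assumption): each predicted word's probability is the
    mean of lex[(g, w)] over all words g in `given`. Sentences must be non-empty.
    """
    total = 0.0
    for w in predicted:
        p = sum(lex.get((g, w), 0.0) for g in given) / len(given)
        total += math.log(max(p, FLOOR))
    return total / len(predicted)


def symmetric_score(src: List[str], tgt: List[str],
                    lex_s2t: Lexicon, lex_t2s: Lexicon) -> float:
    """Average of both directions; swapping sides and lexicons gives the same score."""
    forward = directional_score(src, tgt, lex_s2t)   # keys: (src_word, tgt_word)
    backward = directional_score(tgt, src, lex_t2s)  # keys: (tgt_word, src_word)
    return 0.5 * (forward + backward)


def best_pairs(src_sents: List[List[str]], tgt_sents: List[List[str]],
               lex_s2t: Lexicon, lex_t2s: Lexicon, k: int):
    """Score all |S| x |T| candidates and return the k best as (score, i, j)."""
    heap: List[Tuple[float, int, int]] = []  # min-heap holding the current top k
    for i, s in enumerate(src_sents):
        for j, t in enumerate(tgt_sents):
            item = (symmetric_score(s, t, lex_s2t, lex_t2s), i, j)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)  # drop the current worst
    return sorted(heap, reverse=True)


# Toy usage with made-up probabilities.
lex_s2t = {("la", "the"): 0.7, ("casa", "house"): 0.8}
lex_t2s = {("the", "la"): 0.6, ("house", "casa"): 0.9}
src_sents = [["la", "casa"], ["el", "perro"]]
tgt_sents = [["the", "house"], ["a", "dog"]]
top = best_pairs(src_sents, tgt_sents, lex_s2t, lex_t2s, k=1)
```

In this toy input, only the pair (0, 0) has lexicon support. All other pairs fall back to `FLOOR`, so the top-1 result is that pair. A real system would need pruning or a faster implementation to approach the trillion-candidate scale described above.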
Xiaoqiang Luo, Radu Florian, et al.
NAACL-HLT 2009
Fei Huang, Jian-Ming Xu, et al.
ACL 2014
Christoph Tillmann, Hermann Ney
Computational Linguistics