Saurabh Paul, Christos Boutsidis, et al.
JMLR
This paper extends previous work on extracting parallel sentence pairs from comparable data (Munteanu and Marcu, 2005). For a given source sentence S, a maximum entropy (ME) classifier is applied to a large set of candidate target translations. A beam-search algorithm discards target sentences as non-parallel early in classification if they fall outside the beam. As a result, our novel algorithm needs no document-level pre-filtering step. The algorithm significantly increases the number of extracted parallel sentence pairs, which leads to a BLEU improvement of about 1% on our Spanish-English data. © 2009 ACL and AFNLP.
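The early-abandoning idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the feature functions, the toy lexicon `LEXICON`, the weights, and the pruning rule are all hypothetical. The sketch assumes every feature contribution is a non-positive penalty, so a candidate's partial score is an upper bound on its final score, and a candidate can be dropped as soon as that partial score falls more than `beam` below the best complete score seen so far.

```python
import math

# Hypothetical toy lexicon standing in for a real translation model.
LEXICON = {"la": "the", "casa": "house", "roja": "red"}

def length_penalty(src, tgt):
    # Non-positive penalty: 0 when both sentences have the same word count.
    return -abs(len(src.split()) - len(tgt.split()))

def coverage_penalty(src, tgt):
    # Non-positive penalty: -1 per source word whose lexicon translation
    # does not appear in the target sentence.
    tgt_words = set(tgt.split())
    return -sum(1 for w in src.split() if LEXICON.get(w) not in tgt_words)

def best_candidate(source, candidates, feature_fns, weights, beam):
    """Score candidates feature by feature, abandoning a candidate once its
    partial score drops more than `beam` below the best complete score."""
    best, best_score = None, -math.inf
    evaluations = 0  # number of feature evaluations actually performed
    for cand in candidates:
        score, pruned = 0.0, False
        for fn, w in zip(feature_fns, weights):
            score += w * fn(source, cand)
            evaluations += 1
            if score < best_score - beam:
                pruned = True  # outside the beam: stop scoring this candidate
                break
        if not pruned and score > best_score:
            best, best_score = cand, score
    return best, best_score, evaluations

source = "la casa roja"
candidates = [
    "the red house",
    "a very long unrelated sentence about nothing",
    "the house",
]
best, score, evaluations = best_candidate(
    source, candidates, [length_penalty, coverage_penalty], [1.0, 1.0], beam=1.0
)
```

In this toy run the second candidate is abandoned after the length feature alone, so only 5 of the 6 possible feature evaluations are performed. Ordering cheap features first is a design choice of the sketch, since it lets most non-parallel candidates be rejected before any expensive feature is computed.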
Joxan Jaffar
Journal of the ACM
Rakesh Mohan, Ramakant Nevatia
IEEE Transactions on Pattern Analysis and Machine Intelligence
Cristina Cornelio, Judy Goldsmith, et al.
JAIR