Sentence alignment using local and global information
Abstract

We propose an integer linear programming algorithm to extract parallel sentences.

We build an English–Persian parallel corpus from Wikipedia articles.

Intrinsic evaluation using gold data shows the effectiveness of the ILP method.

Extrinsic evaluation via SMT and CLIR confirms the high quality of the created corpus.

The extracted parallel corpus is freely available for research purposes.
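The highlights do not spell out the ILP formulation used in this work. The sketch below assumes a generic one-to-one formulation, which may differ from the authors' model. A "local" similarity score s_ij is given for every candidate pair (source sentence i, target sentence j). "Global" constraints then require that each sentence is aligned at most once. Binary variables x_ij select pairs so as to maximize sum_ij (s_ij - tau) * x_ij, where the threshold tau keeps weak pairs out of the corpus. The function name `ilp_align_bruteforce`, the threshold `tau`, and the toy scores are hypothetical. To stay dependency-free, the sketch solves the ILP exactly by enumerating every feasible 0/1 assignment, which is only practical for a handful of sentences.

```python
from itertools import product
from typing import List, Optional, Tuple


def ilp_align_bruteforce(scores: List[List[float]], tau: float) -> List[Tuple[int, int]]:
    """Exactly solve a toy one-to-one sentence-alignment ILP by enumeration.

    Hypothetical formulation (not necessarily the paper's):
        maximize  sum_ij (scores[i][j] - tau) * x_ij
        s.t.      sum_j x_ij <= 1   for every source sentence i
                  sum_i x_ij <= 1   for every target sentence j
                  x_ij in {0, 1}
    """
    n = len(scores)
    m = len(scores[0]) if n else 0
    # The empty alignment (all x_ij = 0) is always feasible with value 0.
    best_value = 0.0
    best_pairs: List[Tuple[int, int]] = []
    # Each source sentence picks one target index or None (unaligned),
    # which enforces the row constraint sum_j x_ij <= 1 by construction.
    options: List[Optional[int]] = [None, *range(m)]
    for choice in product(options, repeat=n):
        used = [j for j in choice if j is not None]
        if len(used) != len(set(used)):
            continue  # violates the column constraint: a target sentence reused
        value = sum(scores[i][j] - tau for i, j in enumerate(choice) if j is not None)
        if value > best_value:
            best_value = value
            best_pairs = [(i, j) for i, j in enumerate(choice) if j is not None]
    return best_pairs


if __name__ == "__main__":
    # Toy local scores: source sentences 1 and 2 both prefer target 1.
    toy = [[0.9, 0.2, 0.1],
           [0.3, 0.8, 0.4],
           [0.1, 0.7, 0.6]]
    print(ilp_align_bruteforce(toy, tau=0.5))
```

In the toy example, choosing each source sentence's best target independently would give target 1 to both source sentences 1 and 2. The joint optimum instead keeps (1, 1) and pairs sentence 2 with target 2, returning `[(0, 0), (1, 1), (2, 2)]`. On real articles, the same objective and constraints would be handed to an off-the-shelf ILP solver instead of being enumerated.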
