Design of an Online Construction System for Chinese-English News Comparable Corpora
  • English title: Design and Implementation of Online Building System for Chinese-English News Comparable Corpora
  • Authors: ZHAO Yongbiao (赵永标); ZHANG Qilin (张其林); GU Qiong (谷琼)
  • Author affiliation (English): School of Computer Engineering, Hubei University of Science and Arts
  • Keywords: bilingual corpora; comparable corpora; comparability; news
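  • Journal code: ASSZ
  • Journal title (English): Journal of Anshun University
  • Institution: School of Computer Engineering, Hubei University of Science and Arts (湖北文理学院计算机工程学院)
  • Publication date: 2019-06-15
  • Publisher: Journal of Anshun University (安顺学院学报)
  • Year: 2019
  • Issue: v.21; No.105
  • Funding: State Language Commission 13th Five-Year Plan research project "Research on Online Mining of Web Comparable Corpora Based on Topic Models" (project no. YB135-22)
  • Language: Chinese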
  • Pages: ASSZ201903027
  • Page count: 4
  • CN: 03
  • ISSN: 52-1145/G4
  • Classification number: 127-130
Abstract
Comparable corpora are an important foundational language resource, and mining comparable texts online from the web is an effective way to build them at large scale. Suitable source websites and an effective comparability measure simplify the online mining process. An online construction system for a Chinese-English news comparable corpus is designed, with the English edition of the Global Times (globaltimes.cn) and ifeng.com as the English and Chinese news sources respectively. Test results indicate that the system generates comparable news pairs continuously and stably.
References
[1] LIU Lufang, LI Bo, CHEN Peng, et al. Bilingual dictionary extraction based on word vectors and comparable corpora [J]. Computer Engineering & Science (计算机工程与科学), 2018(2): 368-373.
[2] PANG Wei. A survey of research on bilingual corpus construction [J]. Information Technology and Informatization (信息技术与信息化), 2015(3): 105-108.
[3] Talvensaari T, Laurikkala J, Jarvelin K, et al. Creating and exploiting a comparable corpus in cross-language information retrieval [J]. ACM Transactions on Information Systems, 2007(1): 4-es.
[4] FANG Lu, GE Yundong, HONG Yu, et al. Construction of comparable corpora and their application in cross-language information retrieval [J]. Journal of Guangxi Normal University (Natural Science Edition) (广西师范大学学报(自然科学版)), 2010(3): 126-130.
[5] Saad M, Langlois D, Smaili K. Extracting Comparable Articles from Wikipedia and Measuring their Comparabilities [J]. Procedia - Social and Behavioral Sciences, 2013, 95: 40-47.
[6] Hajjem M, Trabelsi M, Latiri C. Building comparable corpora from social networks [C]. Workshop on Building & Using Comparable Corpora, International Conference on Language Resources and Evaluation, 2014.
[7] Li B, Gaussier E. Improving Corpus Comparability for Bilingual Lexicon Extraction from Comparable Corpora [C]. 23rd International Conference on Computational Linguistics, Proceedings of the Conference, 2010.
