Research on the Reliability of Japanese POS Tagger Based on NLP Technology (基于NLP技术的日语词性赋码器信度研究)
  • English title: Research on the Reliability of Japanese POS Tagger Based on NLP Technology
  • Author: ZHAO Qian (赵骞)
  • Affiliation: School of Foreign Languages, Henan Normal University (河南师范大学外国语学院)
  • Keywords: natural language processing; corpus; POS tagging; dependency parsing
  • Chinese journal title: LSSZ
  • English journal title: Journal of Leshan Normal University
  • Publication date: 2019-04-15
  • Publisher: Journal of Leshan Normal University (乐山师范学院学报)
  • Year: 2019
  • Issue: v.34; No.271
  • Language: Chinese
  • Pages: LSSZ201904009
  • Page count: 6
  • CN: 04
  • ISSN: 51-1610/G4
  • Classification number: 49-53+74
Abstract
POS tagging plays a very important role in corpus construction, and the continued development of natural language processing has made it possible to apply NLP techniques to Japanese POS tagging. This study uses three NLP-based Japanese POS taggers (PyKNP, CaboCha and Yahoo! Dependency Parser) to tag a sample corpus and produce dependency-tree analyses, and it interprets the factors that affect their tagging accuracy. The results show that the NLP-based taggers achieve some improvement in tagging accuracy and extend the dimensions along which the data can be analyzed. The study also serves in part as a reference for applying NLP-based POS taggers and for combining corpus construction with artificial intelligence.
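The abstract does not state how tagging accuracy was measured. As a minimal sketch, assuming a token-level comparison against hand-checked tags, accuracy could be computed as below. The function name `tagging_accuracy`, the sample sentence, and all tags are hypothetical illustrations, not data or output from the study or from any of the three taggers. The sketch also assumes that the tagger and the reference use identical word segmentation, which often does not hold for Japanese.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Compare two lists of (token, tag) pairs.

    Returns (accuracy, errors), where errors counts each
    (gold_tag, predicted_tag) mismatch.
    Assumption: both lists share the same segmentation.
    """
    if not gold:
        raise ValueError("gold must contain at least one token")
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted differ in token count")

    correct = 0
    errors = Counter()
    for (g_tok, g_tag), (p_tok, p_tag) in zip(gold, predicted):
        if g_tok != p_tok:
            raise ValueError(f"segmentation mismatch: {g_tok!r} vs {p_tok!r}")
        if g_tag == p_tag:
            correct += 1
        else:
            errors[(g_tag, p_tag)] += 1
    return correct / len(gold), errors


# Hypothetical hand-checked tags for one sentence (illustration only).
gold = [("彼", "名詞"), ("は", "助詞"), ("静かに", "形容詞"),
        ("本", "名詞"), ("を", "助詞"), ("読む", "動詞")]
# Hypothetical tagger output with one disagreement.
predicted = [("彼", "名詞"), ("は", "助詞"), ("静かに", "副詞"),
             ("本", "名詞"), ("を", "助詞"), ("読む", "動詞")]

accuracy, errors = tagging_accuracy(gold, predicted)
print(f"accuracy = {accuracy:.3f}")
for (gold_tag, predicted_tag), count in errors.items():
    print(f"{gold_tag} tagged as {predicted_tag}: {count}")
```

Counting mismatches by tag pair, rather than reporting one overall figure, makes it easier to see which word classes a tagger confuses. That is the kind of breakdown an analysis of factors affecting accuracy would need.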
