A Method for Extracting Chinese-Thai News Event Elements Based on Dependency Trees Combined with Rules
  • English title: A News Event Extraction Method in Chinese and Thai Languages Based on Dependency Tree Elements Combined with Rules
  • Authors: 程良; 郜洪奎; 王红斌
  • Authors (English): CHENG Liang; GAO Hong-kui; WANG Hong-bin; City College, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology
  • Keywords: dependency tree; rules; Thai; element extraction; natural language processing
  • Keywords (English): dependency tree; rule; Thai language; factor extraction; natural language processing
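  • Journal (Chinese): RJDK
  • Journal (English): Software Guide
  • Institutions: City College, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology
  • Publication date: 2018-07-15
  • Publisher: 软件导刊 (Software Guide)
  • Year: 2018
  • Issue: v.17; No.189
  • Funding: National Natural Science Foundation of China (61462054); General Program of the Yunnan Provincial Department of Science and Technology (2015FB135); Scientific Research Fund of the Yunnan Provincial Department of Education (2018JS035)
  • Language: Chinese
  • Page: RJDK201807012
  • Page count: 9
  • CN: 07
  • ISSN: 42-1671/TP
  • Classification number: 53-60+67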
Abstract
This paper studies the extraction of event elements from Chinese and Thai news. It first analyzes the characteristics of the two languages and finds that Thai, with its postposed attributives, adverbials and complements, has a grammatical structure similar to that of Chinese; further analysis shows that Chinese and Thai share the same dependency structure. Chinese-Thai dependency trees are therefore built from parallel sentence pairs, and several rules are defined according to the characteristics of Thai. The dependency trees are then combined with these rules to extract the subject, object and adverbial of Thai sentences. In the experiments, the extraction accuracy for Thai subject noun phrases, object noun phrases and adverbial noun phrases was 62.13%, 64.18% and 70.21% respectively, which indicates that combining dependency trees with rules is a feasible way to extract event elements from Thai news.
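The extraction step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not list the actual rules, so the single rule used here (take the subtree of each direct dependent of the root verb whose relation maps to a role) is an assumption. The relation labels `SBV`, `VOB`, `ADV`, `ATT` and `HED`, the `Token` class, the function names, and the toy sentence (English glosses in Thai-like head-first order) are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int   # 1-based position in the sentence
    word: str  # surface form (English glosses in this toy example)
    head: int  # index of the head token, 0 for the root
    rel: str   # dependency relation to the head (labels are assumed)


# Assumed mapping from dependency relations to event-element roles.
ROLE_OF_REL = {"SBV": "subject", "VOB": "object", "ADV": "adverbial"}


def subtree(tokens, root_idx):
    """Return the tokens of the subtree rooted at root_idx, in sentence order."""
    children = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t.idx)
    collected, stack = [], [root_idx]
    while stack:
        i = stack.pop()
        collected.append(i)
        stack.extend(children.get(i, []))
    by_idx = {t.idx: t for t in tokens}
    return [by_idx[i] for i in sorted(collected)]


def extract_elements(tokens):
    """Illustrative rule: each direct dependent of the root whose relation
    maps to a role contributes its whole subtree as one phrase."""
    root = next(t for t in tokens if t.head == 0)
    elements = {}
    for t in tokens:
        role = ROLE_OF_REL.get(t.rel)
        if t.head == root.idx and role is not None:
            phrase = " ".join(x.word for x in subtree(tokens, t.idx))
            elements.setdefault(role, []).append(phrase)
    return elements


# Toy dependency tree; modifiers follow their heads, as in Thai.
sentence = [
    Token(1, "minister", 3, "SBV"),
    Token(2, "foreign", 1, "ATT"),
    Token(3, "visited", 0, "HED"),
    Token(4, "school", 3, "VOB"),
    Token(5, "new", 4, "ATT"),
    Token(6, "yesterday", 3, "ADV"),
]

print(extract_elements(sentence))
```

Collecting the whole subtree rather than the head word alone keeps postposed modifiers attached to the noun they modify, which is one plausible reason a tree-based approach suits the noun-phrase targets named in the abstract.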
