Research on Entity Recognition of Thai Person Names, Place Names, and Organization Names
  • English title: Thai Language Names, Place Names and Organization Names Entity Recognition
  • Authors: Wang Hongbin (王红斌); Gao Hongkui (郜洪奎); Shen Qiang (沈强); Xian Yantuan (线岩团)
  • Affiliations: College of Information Engineering and Automation, Kunming University of Science and Technology; Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology
  • Keywords: named entity recognition; hidden Markov statistical model; conditional random field statistical model; sequence labeling
  • Chinese journal title (code): XTFZ
  • English journal title: Journal of System Simulation
  • Publication date: 2019-05-08
  • Publisher: Journal of System Simulation (系统仿真学报)
  • Year: 2019
  • Issue: v.31
  • Funding: National Natural Science Foundation of China (61462054, 61363044); General Program of the Yunnan Provincial Department of Science and Technology (2015FB135); Kunming University of Science and Technology provincial talent-training project (KKSY201403028)
  • Language: Chinese
  • Page: XTFZ201905024
  • Page count: 9
  • CN: 05
  • ISSN: 11-3092/V
  • Classification number: 196-204
Abstract
Thai named entity recognition identifies person names, place names, organization names, and similar entities in Thai text. Because Thai word formation and grammar rules are complex, the task is recast as labeling the sequence of words in a Thai sentence. Taking the characteristics of the Thai language into account, suitable Thai context features are selected, a hidden Markov model and a conditional random field model are each built on a Thai entity recognition training corpus, and the resulting sequence labeling models are validated experimentally on a test corpus. The results show that using hidden Markov and conditional random field models to recognize Thai person, place, and organization names is feasible and achieves good results.
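To make the sequence labeling formulation concrete, the following is a minimal sketch of a hidden Markov model tagger with Viterbi decoding. It is not the authors' implementation: the BIO tag scheme, the add-one smoothing, the function names, and the tiny romanized training set are all assumptions made for illustration, and the paper's Thai context features and the conditional random field model are not reproduced here.

```python
import math
from collections import defaultdict

# Invented toy data (NOT from the paper's corpus): sentences of (word, tag) pairs.
# The BIO tags (B-PER, B-LOC, B-ORG, I-ORG, O) are an assumed labeling scheme.
TRAIN = [
    [("Somchai", "B-PER"), ("visited", "O"), ("Bangkok", "B-LOC")],
    [("Malee", "B-PER"), ("works", "O"), ("at", "O"),
     ("Kasetsart", "B-ORG"), ("University", "I-ORG")],
    [("Somchai", "B-PER"), ("works", "O"), ("in", "O"), ("Bangkok", "B-LOC")],
]


def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][t] = (log score of best path ending in tag t at position i, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]
```

With this toy data, `viterbi(["Somchai", "works", "in", "Bangkok"], train_hmm(TRAIN))` tags the sentence as B-PER, O, O, B-LOC. A conditional random field would replace the two count tables with weighted features over the word context, which is where language-specific Thai features would enter.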
