Fine-Grained Named Entity Recognition for Multi-Scenario Text (多场景文本的细粒度命名实体识别)
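  • English title: Fine-grained Named Entity Recognition for Multi-scenario
  • Authors: Sheng Jian (盛剑); Xiang Zhengpeng (向政鹏); Qin Bing (秦兵); Liu Ming (刘铭); Wang Lifeng (王莉峰)
  • Authors (English): SHENG Jian; XIANG Zhengpeng; QIN Bing; LIU Ming; WANG Lifeng; Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology; Tencent Technology (Shenzhen) Co., Ltd.
  • Keywords: named entity recognition; fine-grained category division; corpus back-labeling
  • Keywords (English): named entity recognition; fine-grained category annotation; corpus annotation
  • Journal title (Chinese): MESS
  • Journal title (English): Journal of Chinese Information Processing
  • Institutions: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology; Tencent Technology (Shenzhen) Co., Ltd.
  • Publication date: 2019-06-15
  • Publisher: Journal of Chinese Information Processing (中文信息学报)
  • Year: 2019
  • Issue: v.33
  • Funding: National Natural Science Foundation of China (61632011, 61772156, 61472107)
  • Language: Chinese
  • Page: MESS201906012
  • Page count: 8
  • CN: 06
  • ISSN: 11-2325/N
  • Classification number: 85-92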
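Abstract
Named entity recognition has long been one of the classic problems in data mining. With the rapid growth of web data, performing multi-domain, fine-grained named entity recognition on text from multiple sources would clearly support many data mining applications. This paper proposes a multi-domain, fine-grained named entity recognition method that uses web dictionaries to back-label text data, yielding a large amount of coarse training text. To keep noise in the training text from interfering with the recognition results, the algorithm divides named entity recognition into two stages: the first stage obtains the domain label of each named entity, and the second uses the entity's context to determine its fine-grained label. Experimental results show that the proposed method achieves an average F_1 of about 80% across all domains.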
        Named entity recognition is a classical research issue in the data mining community. To recognize entities across multiple domains with fine-grained labels, we propose a method that uses a web thesaurus to annotate web data automatically and thereby acquire a large-scale training corpus. To minimize the influence of noise in the training corpus, we design a two-phase entity recognition method. First, the entity's domain label is obtained. Then, the context of each recognized entity is used to determine its fine-grained label. Experimental results demonstrate that the proposed method achieves high accuracy on entity recognition in multiple domains.
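The dictionary back-labeling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a greedy longest-match lookup and character-level BIO tags, which the record does not confirm as the authors' procedure; the function name `back_label`, the toy lexicon, and its labels are hypothetical.

```python
from typing import Dict, List, Tuple


def back_label(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each character with a BIO label via greedy longest-match lookup.

    `lexicon` maps an entity surface form to a label (hypothetical format).
    """
    max_len = max((len(entry) for entry in lexicon), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate span first so that nested shorter
        # entries do not shadow a longer dictionary entry.
        for n in range(min(max_len, len(text) - i), 0, -1):
            span = text[i:i + n]
            if span in lexicon:
                label = lexicon[span]
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(text, tags))


# Toy lexicon (illustrative only, not from the paper).
lexicon = {"哈尔滨": "location", "哈尔滨工业大学": "organization"}
labeled = back_label("他在哈尔滨工业大学读书", lexicon)
```

Labels produced this way are noisy, since a dictionary hit ignores context; that noise is the motivation the abstract gives for splitting recognition into a domain-label stage and a context-based fine-grained stage.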
References
[1]Fine S,Singer Y,Tishby N.The hierarchical hidden Markov model:Analysis and applications[J].Machine learning,1998,32(1):41-62.
    [2]Borthwick A,Grishman R.A maximum entropy approach to named entity recognition[D].New York University,Graduate School of Arts and Science,1999.
    [3]McCallum A,Freitag D,Pereira F C N.Maximum entropy Markov models for information extraction and segmentation[C]//Proceedings of the 17th ICML,2000:591-598.
    [4]Lafferty J,McCallum A,Pereira F C N.Conditional random fields:Probabilistic models for segmenting and labeling sequence data[C]//Proceedings of the 18th ICML,2001:282-289.
    [5]Ratnaparkhi A.A maximum entropy model for part-of-speech tagging[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing,1996:133-142.
    [6]Collobert R,Weston J,Bottou L.Natural language processing(almost)from scratch[J].arXiv preprint arXiv:1103.0398,2011.
    [7]Mikolov T,Sutskever I,Chen K,et al.Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems,2013:3111-3119.
    [8]Gers F A,Schraudolph N N,Schmidhuber J.Learning precise timing with LSTM recurrent networks[J].Journal of Machine Learning Research,2002,3(Aug):115-143.
    [9]Huang Z,Xu W,Yu K.Bidirectional LSTM-CRF models for sequence tagging[J].arXiv preprint arXiv:1508.01991,2015:1-10.
    [10]Ando R K,Zhang T.A framework for learning predictive structures from multiple tasks and unlabeled data[J].The Journal of Machine Learning Research,2005(6):1817-1853.
    [11]Jing H Y,Zhang T.Named entity recognition through classifier combination[C]//Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL-2003,2003:168-171.
    [12]Kim Yoon.Convolutional neural networks for sentence classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP),2014:1746-1751.
    [13]Zhang Y,Wallace B.A sensitivity analysis of(and practitioners' guide to)convolutional neural networks for sentence classification[J].arXiv preprint arXiv:1510.03820,2015.
    [14]Boden M.A guide to recurrent neural networks and back-propagation[R].SICS Technical Report J 2002:03,SICS.
    [15]Hammerton J.Named entity recognition with long short-term memory[C]//Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL 2003-Volume 4.Association for Computational Linguistics,2003:172-175.
    [16]Tang D,Qin B,Feng X.Effective LSTMs for target-dependent sentiment classification[C]//Proceedings of the COLING 2016,2016:3298-3307.
    [17]Yin Q,Zhang Y,Zhang W.Chinese zero pronoun resolution with deep memory network[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing,2017:1309-1318.
