Abstract
Named entity recognition has long been a classical problem in data mining. With the rapid growth of web data, the ability to perform multi-domain, fine-grained named entity recognition over text from multiple sources would clearly benefit many data mining applications. This paper proposes a multi-domain, fine-grained named entity recognition method that uses web dictionaries to automatically annotate text data, yielding a large volume of coarse training text. To keep noise in this training text from degrading recognition results, the algorithm divides recognition into two phases: the first phase obtains each named entity's domain label, after which the entity's context is used to determine its fine-grained label. Experimental results show that the proposed method achieves an average F_1 of about 80% across all domains.
Named entity recognition is a classical research problem in the data mining community. To recognize entities in multiple domains with fine-grained labels, we propose a method that uses a web thesaurus to automatically annotate web data, acquiring a large-scale training corpus. To minimize the influence of noise in the training corpus, we design a two-phase entity recognition method: first, an entity's domain label is obtained; then, the context of each recognized entity is used to determine its fine-grained label. Experimental results demonstrate that the proposed method achieves high accuracy on entity recognition in multiple domains.
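The dictionary-based annotation step described above can be sketched as a greedy longest-match projection of thesaurus entries onto raw text, producing coarse BIO training labels. This is a minimal illustrative sketch, not the authors' implementation; the `annotate` function, the toy thesaurus, and the domain labels are all hypothetical.

```python
def annotate(tokens, thesaurus):
    """Greedy longest-match of thesaurus entries over a token list.

    tokens    -- list of tokens (characters, for Chinese text)
    thesaurus -- dict mapping entity string -> domain label
    Returns one BIO tag per token.
    """
    max_len = max((len(k) for k in thesaurus), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first, so "北京大学" beats its prefix "北京".
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in thesaurus:
                label = thesaurus[candidate]
                tags[i] = "B-" + label
                for j in range(i + 1, i + span):
                    tags[j] = "I-" + label
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return tags

# Toy example: re-annotating one sentence with a two-entry thesaurus.
thesaurus = {"北京大学": "ORG", "北京": "LOC"}
print(annotate(list("我在北京大学读书"), thesaurus))
# → ['O', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O']
```

Such automatically projected labels are inevitably noisy (e.g. ambiguous or overlapping dictionary entries), which is precisely what motivates the two-phase design: a coarse domain label first, then context-based refinement to a fine-grained label.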