Named Entity Recognition Based on Bi-LSTM and Self-Attention Mechanism
  • Chinese title: 基于Bi-LSTM和注意力机制的命名实体识别
  • Authors: LIU Xiaojun; GU Lichuan; SHI Xianzhang
  • Affiliation: School of Information and Computer Science, Anhui Agricultural University
  • Keywords: named entity recognition (NER); DC-BiLSTM; attention mechanism; conditional random field (CRF)
  • Journal: Journal of Luoyang Institute of Science and Technology (Natural Science Edition) (洛阳理工学院学报(自然科学版); journal code LYGY)
  • Publication date: 2019-03-25
  • Year / Volume / Issue: 2019, v.29, No. 01
  • Pages: 68-73+80 (7 pages)
  • Article ID: LYGY201901014
  • CN: 41-1403/N
  • Funding: National Natural Science Foundation of China, project 31771679
  • Language: Chinese
Abstract
Named entity recognition (NER) is an important basic task in natural language processing (NLP). This paper proposes a simple and novel NER method based on a deep recurrent neural network. A dense-connection (DC) scheme passes information between the layers of a stacked bi-directional long short-term memory network (Bi-LSTM); the resulting architecture is called DC-BiLSTM. DC-BiLSTM is used to automatically learn sentence features, a self-attention mechanism captures the dependency between any two tokens, and a conditional random field (CRF) decodes the whole sentence for the final prediction. Experimental results show that, without any hand-crafted features, the method achieves an average F1 score of 91.81% and a best F1 score of 92.05% on the MSRA corpus.
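The two feature-extraction ideas in the abstract — dense connections between stacked layers, followed by token-to-token self-attention — can be illustrated in a toy form. This is a minimal sketch, not the paper's implementation: simple tanh projections stand in for the Bi-LSTM layers, the weights are random, and all names (`dense_stack`, `self_attention`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_stack(x, num_layers=3, hidden=8):
    """Dense-connected stack: layer l receives the concatenation of the
    original input and every previous layer's output (the DC wiring of
    DC-BiLSTM; a real model would use Bi-LSTM layers here)."""
    outputs = [x]
    for _ in range(num_layers):
        inp = np.concatenate(outputs, axis=-1)          # (T, d_in grows per layer)
        W = rng.standard_normal((inp.shape[-1], hidden)) * 0.1
        outputs.append(np.tanh(inp @ W))                # (T, hidden)
    return np.concatenate(outputs[1:], axis=-1)         # all layer outputs

def self_attention(h):
    """Scaled dot-product self-attention over one sentence: every token
    attends to every other token, capturing pairwise dependencies."""
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)                       # (T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # rows sum to 1
    return weights @ h                                  # (T, d)

T, emb = 5, 4                        # toy sentence: 5 tokens, 4-dim embeddings
x = rng.standard_normal((T, emb))
h = dense_stack(x)                   # (5, 3 * 8): concatenated layer features
a = self_attention(h)                # (5, 24): attention-weighted features
print(h.shape, a.shape)
```

The point of the dense connections is visible in the shapes: each layer's input dimension grows because it sees all earlier outputs, which shortens gradient paths in the same spirit as DenseNet (reference [9] of the paper).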
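The final CRF step decodes the whole sentence jointly rather than tagging tokens independently; the standard decoding algorithm is Viterbi. A minimal sketch, assuming a hypothetical 3-tag BIO scheme and hand-picked toy scores rather than the paper's trained model:

```python
import numpy as np

def viterbi(emissions, transitions):
    """CRF-style Viterbi decoding: return the highest-scoring tag sequence
    given per-token emission scores (T x K) and tag-to-tag transition
    scores (K x K)."""
    T, K = emissions.shape
    score = emissions[0].copy()          # best score ending in each tag
    back = np.zeros((T, K), dtype=int)   # backpointers for path recovery
    for t in range(1, T):
        # cand[i, j]: best path with tag i at step t-1 and tag j at step t
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    tags = [int(score.argmax())]
    for t in range(T - 1, 0, -1):        # follow backpointers to the start
        tags.append(int(back[t][tags[-1]]))
    return tags[::-1]

# Hypothetical toy scores: tags O=0, B=1, I=2 for a 3-token sentence.
em = np.array([[2., 1., 0.],
               [0., 0., 2.],
               [0., 2., 1.]])
trans = np.zeros((3, 3))
trans[0, 2] = -5.0   # penalize the illegal O -> I transition
trans[1, 2] = 1.0    # encourage B -> I
print(viterbi(em, trans))   # -> [1, 2, 1], i.e. B I B
```

Note how the transition scores do the sentence-level work: the second token's emission alone prefers tag I, and the CRF accepts it only because a legal B precedes it.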
