Negation Focus Identification Based on a Combined Bidirectional LSTM and CRF Model

English title: Negation Focus Identification via Bi-directional LSTM-CRF Model
Authors: 沈龙骧 (SHEN Longxiang); 邹博伟 (ZOU Bowei); 叶静 (YE Jing); 周国栋 (ZHOU Guodong); 朱巧明 (ZHU Qiaoming)
Affiliation: School of Computer Science and Technology, Soochow University
Keywords: negation focus; BiLSTM-CRF model; sequence labeling
Chinese journal name: MESS
English journal name: Journal of Chinese Information Processing
Publication date: 2019-01-15
Publisher: 中文信息学报 (Journal of Chinese Information Processing)
Year: 2019
Issue: v.33
Funding: National Natural Science Foundation of China (61703293, 61672367); Science and Technology Program of Jiangsu Province (BK20151222)
Language: Chinese
Pages: MESS201901005
Page count: 10
CN: 01
ISSN: 11-2325/N
Classification code: 30-39

Abstract

Negation is a common phenomenon in natural language text and matters greatly for downstream natural language processing applications such as sentiment analysis and information extraction. Negation focus identification is a finer-grained form of negation semantic analysis that aims to identify the text fragment in a sentence that is modified and emphasized by a negation word. Treating the task as a sequence labeling problem, this paper proposes a negation focus identification model that combines a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF): the BiLSTM network makes full use of context in both directions and captures global features, while the CRF layer learns the dependencies between adjacent output tags. Experiments on the *SEM 2012 shared task dataset show that the BiLSTM-CRF approach reaches an accuracy of 69.58%, an improvement of 2.44% over the best previously reported system.
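To make the role of the CRF layer concrete, the following is a minimal sketch of Viterbi decoding over per-token tag scores, written in plain Python. It is not the authors' implementation: the function name `viterbi_decode`, the BIO-style tag set, and all scores are hypothetical values chosen for illustration. In a BiLSTM-CRF model the emission scores would come from the BiLSTM and the transition scores would be learned parameters. The toy example shows how a transition penalty can override a token-by-token choice that would start a focus span with an "inside" tag.

```python
# Illustrative Viterbi decoder for a linear-chain CRF layer.
# Tag set, scores, and names are hypothetical, not taken from the paper.

def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for one sentence.

    emissions[t][j]   : score of tag j at token t (e.g. output of a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    scores = list(emissions[0])   # best path score ending in each tag at token 0
    backpointers = []

    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Previous tag that maximises the score of a path entering tag j.
            best_prev = max(range(n_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j]
                              + emissions[t][j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path back from the best final tag.
    last = max(range(n_tags), key=lambda j: scores[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path


TAGS = ["O", "B-FOCUS", "I-FOCUS"]   # hypothetical BIO-style tag set

# Toy transition scores: moving from O directly to I-FOCUS is penalised.
TRANSITIONS = [
    [0.0, 0.0, -10.0],   # from O
    [0.0, 0.0, 0.0],     # from B-FOCUS
    [0.0, 0.0, 0.0],     # from I-FOCUS
]

# Toy emission scores for a four-token sentence. Taken token by token,
# the highest-scoring tags would be O, I-FOCUS, I-FOCUS, O.
EMISSIONS = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 1.5],
    [0.0, 0.0, 2.0],
    [2.0, 0.0, 0.0],
]

score, path = viterbi_decode(EMISSIONS, TRANSITIONS)
print([TAGS[i] for i in path])   # prints ['O', 'B-FOCUS', 'I-FOCUS', 'O']
```

Because the decoder scores whole tag sequences rather than individual tokens, the second token is labeled B-FOCUS even though its emission score for I-FOCUS is higher.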

References

[1] Blanco E, Moldovan D. Semantic representation of negation using focus detection[C]//Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), 2011: 581-589.
[2] Rosenberg S, Bergler S. UConcordia: CLaC negation focus detection at *Sem 2012[C]//Proceedings of the Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics, 2012: 294-300.
[3] Cho K, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv:1406.1078, 2014.
[4] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[J]. arXiv:1409.0473, 2014.
[5] Santos C, Gatti M. Deep convolutional neural networks for sentiment analysis of short texts[C]//Proceedings of the International Conference on Computational Linguistics, 2014.
[6] Wang J, et al. Dimensional sentiment analysis using a regional CNN-LSTM model[C]//Proceedings of the Meeting of the Association for Computational Linguistics, 2016: 225-230.
[7] Zeng D, et al. Distant supervision for relation extraction via piecewise convolutional neural networks[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015: 1753-1762.
[8] Lin Y, et al. Neural relation extraction with selective attention over instances[C]//Proceedings of the Meeting of the Association for Computational Linguistics, 2016: 2124-2133.
[9] Goller C, Kuchler A. Learning task-dependent distributed representations by backpropagation through structure[C]//Proceedings of the IEEE International Conference on Neural Networks, 1996: 347-352.
[10] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[11] Gers F, Schmidhuber J, Cummins F. Learning to forget: Continual prediction with LSTM[J]. Neural Computation, 2000, 12(10): 2451-2471.
[12] Cho K, et al. On the properties of neural machine translation: Encoder-decoder approaches[C]//Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 2014: 103-111.
[13] Palmer M, Gildea D, Kingsbury P. The Proposition Bank: An annotated corpus of semantic roles[J]. Computational Linguistics, 2005, 31(1): 71-106.
[14] Morante R, Blanco E. *SEM 2012 shared task: Resolving the scope and focus of negation[C]//Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), 2012: 265-274.
[15] Zou B, Zhu Q, Zhou G. Negation focus identification with contextual discourse information[C]//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), 2014: 522-530.
[16] Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
[17] Pascanu R, Mikolov T, Bengio Y. On the difficulty of training recurrent neural networks[C]//Proceedings of the International Conference on Machine Learning, 2013: 1310-1318.
[18] Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures[J]. Neural Networks, 2005, 18(5): 602-610.
[19] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 6645-6649.
[20] Lafferty J, McCallum A, Pereira F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data[C]//Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc, 2001: 282-289.
[21] Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging[J]. arXiv:1508.01991, 2015.
[22] Ma X, Hovy E. End-to-end sequence labeling via bidirectional LSTM-CNNs-CRF[C]//Proceedings of the Meeting of the Association for Computational Linguistics, 2016: 1064-1074.
[23] Lample G, et al. Neural architectures for named entity recognition[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2016: 260-270.
[24] Poon H, Domingos P. Unsupervised semantic parsing[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2009: 1-10.
[25] Pradhan S, et al. Shallow semantic parsing using support vector machines[C]//Proceedings of the North American Chapter of the Association for Computational Linguistics, 2003: 233-240.
[26] Soricut R, Marcu D. Sentence level discourse parsing using syntactic and lexical information[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, 2003: 149-156.
[27] Collobert R, et al. Natural language processing (almost) from scratch[J]. The Journal of Machine Learning Research, 2011, 12: 2493-2537.
[28] Pennington J, Socher R, Manning C. GloVe: Global vectors for word representation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014: 1532-1543.
[29] Mikolov T, et al. Distributed representations of words and phrases and their compositionality[J]. Advances in Neural Information Processing Systems, 2013, 26: 3111-3119.
[30] Zeiler M. ADADELTA: An adaptive learning rate method[J]. arXiv:1212.5701, 2012.
[31] Kingma D, Ba J. Adam: A method for stochastic optimization[J]. arXiv:1412.6980, 2014.
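(1) In this paper, negation operators are shown in bold and negation foci are underlined.
(1) The PropBank corpus annotates predicate verbs and more than 20 types of semantic roles.
(2) Trained on Wikipedia and the Reuters RCV-1 corpus, http://ronan.collobert.com/senna/
(3) Trained on 6 billion words of Wikipedia and web text, http://nlp.stanford.edu/projects/glove/
(4) Trained on 100 billion words of Google News text, https://code.google.com/archive/p/word2vec/
(1) We only target verbal negations, and the focus is always the full text of a semantic role.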
