Abstract
Negation is a common phenomenon in natural language text and plays a critical role in downstream natural language processing applications such as sentiment analysis and information extraction. Negation focus identification is a finer-grained form of negation semantic analysis that aims to identify the text fragment modified and emphasized by a negation keyword. Treating the task as a sequence labeling problem, we propose a negation focus identification model based on a bidirectional Long Short-Term Memory network with a Conditional Random Field layer (BiLSTM-CRF). The BiLSTM network exploits contextual information from both directions and captures global features, while the CRF layer learns the dependencies between adjacent output tags. Experimental results on the *SEM 2012 shared task dataset show that our approach achieves an accuracy of 69.58%, an improvement of 2.44% over the best previously reported system.
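To illustrate the sequence labeling view described in the abstract, the following is a minimal sketch of the decoding step of a CRF layer (Viterbi search) applied to per-token scores such as a BiLSTM would produce. It is not the authors' implementation. The two-tag scheme (O = outside the focus, F = inside the focus), the example sentence, and all emission and transition scores are hypothetical values chosen by hand for illustration.

```python
from typing import List, Sequence

TAGS = ["O", "F"]  # hypothetical tag set: O = outside focus, F = inside focus


def greedy_decode(emissions: Sequence[Sequence[float]]) -> List[int]:
    """Pick the best tag for each token independently (no CRF layer)."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the tag path with the highest total score.

    emissions[t][j]   : score of tag j at token t (from the BiLSTM in a BiLSTM-CRF)
    transitions[i][j] : score of moving from tag i to tag j (learned by the CRF layer)
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptr = []                # backptr[t-1][j] = best previous tag for tag j at token t
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptr):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical sentence and hand-picked scores (not taken from the paper).
TOKENS = ["She", "did", "n't", "leave", "until", "the", "evening"]
EMISSIONS = [
    [2.0, 0.0], [2.0, 0.0], [2.0, 0.0], [1.5, 0.3],
    [0.2, 1.8], [1.0, 0.8], [0.1, 2.0],
]
# Staying in the same tag is rewarded; switching tags is penalized.
TRANSITIONS = [[0.5, -0.5], [-0.5, 0.5]]

greedy_tags = [TAGS[i] for i in greedy_decode(EMISSIONS)]
crf_tags = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(list(zip(TOKENS, greedy_tags)))
print(list(zip(TOKENS, crf_tags)))
```

With these toy scores, tagging each token independently labels "the" as O and splits the focus span in two, whereas decoding with transition scores yields the contiguous span "until the evening". This is the kind of dependency between output tags that the abstract attributes to the CRF layer.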
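(1) In this paper, negation operators are shown in bold and negation foci are underlined.
(1) The PropBank corpus annotates predicate verbs and more than 20 types of semantic roles.
(2) Trained on Wikipedia and the Reuters RCV-1 corpus, http://ronan.collobert.com/senna/
(3) Trained on 6 billion words of Wikipedia and web text, http://nlp.stanford.edu/projects/glove/
(4) Trained on 100 billion words of the Google News corpus, https://code.google.com/archive/p/word2vec/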
(1)We only target verbal negations and focus is always the full text of a semantic role.