Text Structure Oriented Hybrid Hierarchical Attention Networks for Topic Classification
  • English title: Text Structure Oriented Hybrid Hierarchical Attention Networks for Topic Classification
  • Authors: 车蕾; 杨小平; 王良; 梁天新; 韩镇远
  • English authors: CHE Lei; YANG Xiaoping; WANG Liang; LIANG Tianxin; HAN Zhenyuan
  • Keywords: deep learning; attention mechanism; hybrid hierarchical attention networks; topic classification
  • Journal code: MESS
  • English journal title: Journal of Chinese Information Processing
  • Affiliations: School of Information, Renmin University of China; School of Information Management, Beijing Information Science & Technology University
  • Publication date: 2019-05-15
  • Published in: 中文信息学报 (Journal of Chinese Information Processing)
  • Year: 2019
  • Volume: v.33
  • Issue: 05
  • Pages: 98-107+117
  • Page count: 11
  • Record ID: MESS201905011
  • CN: 11-2325/N
  • Language: Chinese
  • Funding: Beijing Municipal Education Commission Social Science Program (SM201911232003); National Natural Science Foundation of China (61572079); Beijing Municipal Education Commission Science and Technology Program (KM201711417004)
Abstract
To address the underuse of text logical-structure and organizational-structure features in current topic classification models, this paper proposes a text structure oriented hybrid hierarchical attention network (TSOHHAN) for topic classification. Text structure comprises logical structure and organizational structure: the logical structure includes information such as the title and body, while the organizational structure covers the character-word-sentence hierarchy. TSOHHAN fuses the title and body through a competition mechanism to strengthen the role of logical-structure features in topic classification, and applies a character-word-sentence hierarchical attention mechanism to strengthen the role of organizational-structure features. Experimental results on four standard datasets show that TSOHHAN improves the accuracy of topic classification.
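The hierarchical attention pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, with randomly initialized context vectors, not the authors' TSOHHAN implementation; the "competitive" title-body fusion shown at the end is one plausible reading of the abstract's competition mechanism, labeled as an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, w):
    # H: (n, d) encoder hidden states; w: (d,) learned context vector.
    # Attention weights sum to 1 over the n positions; output is a (d,) summary.
    scores = softmax(H @ w)
    return scores @ H

rng = np.random.default_rng(0)
d = 8
# Toy body: 3 sentences x 5 words, each word represented by a hidden state.
body = rng.normal(size=(3, 5, d))
w_word, w_sent = rng.normal(size=d), rng.normal(size=d)

# Word-level attention pools words into sentence vectors;
# sentence-level attention pools sentences into a body vector.
sent_vecs = np.stack([attention_pool(s, w_word) for s in body])  # (3, d)
body_vec = attention_pool(sent_vecs, w_sent)                     # (d,)

# Hypothetical competitive fusion of title and body (the paper's exact
# mechanism is not specified in the abstract): softmax over branch scores
# lets the more relevant branch dominate the fused representation.
title_vec = attention_pool(rng.normal(size=(4, d)), w_word)      # (d,)
gates = softmax(np.array([title_vec @ w_sent, body_vec @ w_sent]))
fused = gates[0] * title_vec + gates[1] * body_vec               # (d,)
print(fused.shape)  # (8,)
```

A full model would learn `w_word` and `w_sent` jointly with the encoders and feed `fused` into a classifier; a parallel character-level branch, as the abstract describes, would follow the same pooling pattern.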
