Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events

Original title: 面向社会事件的半监督自训练多方立场分析
Authors: LIN Junjie (林俊杰); WANG Lei (王磊); MAO Wenji (毛文吉)
Keywords: Multiple Standpoint Analysis; Semi-supervised; Self-training; User-Level Standpoint Consistency; Topic Information
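Journal: Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
CNKI journal code: MSSB
Institutions: State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Publication date: 2018-12-15
Year: 2018
Volume/issue: Vol. 31, No. 12 (cumulative No. 186)
Funding: Supported by the National Natural Science Foundation of China (No. 71702181, 11832001)
Language: Chinese
Pages: 16-26 (11 pages)
CN: 34-1089/TP
CNKI article ID: MSSB201812002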

Abstract

Existing standpoint analysis methods mainly train standpoint classification models in either a supervised or an unsupervised manner. Supervised training usually requires a large amount of labeled data, while unsupervised models perform considerably worse than supervised ones. To reduce the demand for labeled training data while maintaining model performance, this paper proposes a semi-supervised self-training method for multiple standpoint analysis of social media texts related to social events. In self-training, selecting high-quality samples to add to the training set during the iterative training process is key to improving model performance. The proposed method therefore first measures the classification confidence of each text based on user-level standpoint consistency, and then uses topic information to further select high-quality texts for expanding the training set, so that model performance keeps improving across iterations. Experimental results show that the proposed method achieves better standpoint classification performance than representative methods from related work and than other semi-supervised training schemes, and that both user-level standpoint consistency and topic information contribute to improving standpoint classification.
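The abstract describes the sample-selection step only at a high level. The Python sketch that follows is one hypothetical way to realize it and is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the confidence score (the classifier's probability for the predicted standpoint multiplied by the share of the same user's texts assigned that standpoint), the per-topic top-k filter with a confidence threshold, the default parameter values, the premise that each text already carries a topic ID from some earlier topic model, and all function names (`user_consistency_confidence`, `select_by_topic`, `self_train`, and the toy classifier `fit_word_votes`).

```python
# Hypothetical sketch of self-training with user- and topic-based selection;
# not the paper's implementation.
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[str], List[float]]  # text -> probability per standpoint


def user_consistency_confidence(
    probs: Sequence[Sequence[float]], users: Sequence[str]
) -> List[Tuple[int, float]]:
    """Return (pseudo-label, confidence) for each text.

    Assumed score: probability of the predicted standpoint times the share of
    the same user's pool texts that received that standpoint.
    """
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for user, label in zip(users, preds):
        per_user[user][label] += 1
    scored = []
    for p, user, label in zip(probs, users, preds):
        share = per_user[user][label] / sum(per_user[user].values())
        scored.append((label, p[label] * share))
    return scored


def select_by_topic(
    scored: Sequence[Tuple[int, float]], topics: Sequence[int],
    per_topic: int, threshold: float,
) -> List[int]:
    """Assumed filter: at most `per_topic` indices per topic with confidence >= threshold."""
    buckets: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for i, ((_, conf), topic) in enumerate(zip(scored, topics)):
        if conf >= threshold:
            buckets[topic].append((conf, i))
    chosen: List[int] = []
    for candidates in buckets.values():
        chosen.extend(i for _, i in sorted(candidates, reverse=True)[:per_topic])
    return sorted(chosen)


def self_train(
    fit: Callable[[List[str], List[int]], Predictor],
    labeled: List[Tuple[str, int]],
    unlabeled: List[Tuple[str, str, int]],  # (text, user, topic)
    rounds: int = 5, per_topic: int = 2, threshold: float = 0.5,
) -> Tuple[Predictor, List[str], List[int]]:
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    predict = fit(texts, labels)
    for _ in range(rounds):
        if not pool:
            break
        probs = [predict(text) for text, _, _ in pool]
        scored = user_consistency_confidence(probs, [u for _, u, _ in pool])
        picked = select_by_topic(scored, [k for _, _, k in pool], per_topic, threshold)
        if not picked:
            break  # nothing passed the filters: stop early
        for i in picked:  # move pseudo-labeled texts into the training set
            texts.append(pool[i][0])
            labels.append(scored[i][0])
        picked_set = set(picked)
        pool = [x for i, x in enumerate(pool) if i not in picked_set]
        predict = fit(texts, labels)  # retrain on the expanded set
    return predict, texts, labels


def fit_word_votes(texts: List[str], labels: List[int], k: int = 2) -> Predictor:
    """Toy stand-in classifier: add-one-smoothed word votes per standpoint."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for text, label in zip(texts, labels):
        for word in text.split():
            votes[word][label] += 1

    def predict(text: str) -> List[float]:
        score = [1.0] * k
        for word in text.split():
            for label, count in votes.get(word, Counter()).items():
                score[label] += count
        total = sum(score)
        return [s / total for s in score]

    return predict


if __name__ == "__main__":
    seed = [("support bill yes", 0), ("oppose bill no", 1)]
    pool = [("yes support", "u1", 0), ("support again", "u1", 0),
            ("no oppose", "u2", 1), ("oppose strongly", "u2", 1)]
    _, _, pseudo = self_train(fit_word_votes, seed, pool)
    print(pseudo)  # [0, 1, 0, 0, 1, 1]
```

Multiplying the two factors means a text is promoted only when the classifier is confident and the same user's other texts point to the same standpoint; capping selections per topic is meant to keep one dominant topic from flooding the expanded training set. Any real classifier can replace `fit_word_votes` as long as it returns a function mapping a text to per-standpoint probabilities.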
