Multiple performances identification for car review texts based on multi-label learning

Original title (Chinese): 基于多标记学习的汽车评论文本多性能识别
Authors: ZHANG Jing (张晶); LI De-yu (李德玉); WANG Su-ge (王素格)
Keywords: multi-label learning; text processing; car reviews; multi-aspect recognition
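Journal abbreviation: JSJK
Journal: Computer Engineering & Science
Affiliation: School of Computer and Information Technology, Shanxi University
Publication date: 2016-01-15
Publisher: Computer Engineering & Science (计算机工程与科学)
Year: 2016
Volume/Issue: v.38; No.253
Funding: National Natural Science Foundation of China (61272095, 61175067); Shanxi Province Science and Technology Key Research Project (20110321027-02); Shanxi Province Research Project for Returned Overseas Scholars (2013-014); Shanxi Province Science and Technology Infrastructure Platform Construction Project (2015091001-0102)
Language: Chinese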
Record ID: JSJK201601031
Page count: 7
Issue: 01
CN: 43-1258/TP
Pages: 192-198

Abstract

Car product reviews often discuss several aspects of a vehicle's performance in a single text. This paper proposes a method, based on multi-label learning, for identifying the multiple performance aspects mentioned in car review texts. First, drawing on text mining techniques, a multi-label text feature selection method is used to select features, and the unstructured review texts are transformed into a structured multi-label dataset. On this basis, four multi-label classification methods are used to assign one or more aspect labels to each review document to be identified. Finally, eight multi-label evaluation metrics are used to assess aspect identification performance. Experiments on the Sina car review corpus show that the subset accuracy of aspect identification reaches 95%, which supports the feasibility of the method.
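
To make the evaluation step concrete, the following is a minimal Python sketch of how review texts might be structured as binary feature vectors and how subset accuracy could be computed over predicted aspect label sets. The aspect names, vocabulary, and toy predictions are hypothetical and chosen only for illustration; this record does not list them. Hamming loss is included as a common multi-label metric, on the assumption (not confirmed by this record) that it is among the eight metrics used.

```python
from typing import List, Set

# Hypothetical aspect labels for illustration; the source record does not list them.
ASPECTS = ["power", "comfort", "fuel_economy", "appearance"]


def to_features(text: str, vocabulary: List[str]) -> List[int]:
    """Binary bag-of-words vector: 1 if the vocabulary word occurs in the text."""
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]


def subset_accuracy(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Fraction of documents whose predicted label set equals the true set exactly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def hamming_loss(y_true: List[Set[str]], y_pred: List[Set[str]], n_labels: int) -> float:
    """Average fraction of labels per document that are predicted incorrectly."""
    if len(y_true) != len(y_pred) or not y_true or n_labels <= 0:
        raise ValueError("inputs must be non-empty, equally long, with n_labels > 0")
    # The symmetric difference counts both missed labels and spurious labels.
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * n_labels)


# Toy data: four reviews, one of which is missing a label in the prediction.
y_true = [{"power"}, {"comfort", "fuel_economy"}, {"appearance"}, {"power", "comfort"}]
y_pred = [{"power"}, {"comfort"}, {"appearance"}, {"power", "comfort"}]

print(to_features("The engine power is strong", ["power", "seat", "engine"]))  # [1, 0, 1]
print(subset_accuracy(y_true, y_pred))             # 0.75
print(hamming_loss(y_true, y_pred, len(ASPECTS)))  # 0.0625
```

Subset accuracy is the strictest of the common multi-label metrics: the second toy review counts as fully wrong even though one of its two labels was predicted correctly, whereas Hamming loss penalizes only the single missing label.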

