Abstract
To identify the multiple performance aspects discussed in automobile product reviews, this paper proposes a multi-aspect recognition method for car review text based on multi-label learning. First, text mining techniques are combined with a multi-label text feature selection method to select features, converting the unstructured review text into a structured multi-label dataset. Next, four multi-label classification methods are used to assign one or more aspect labels to each review document to be recognized. Finally, aspect recognition performance is evaluated with eight multi-label evaluation metrics. Experiments on the Sina Auto review corpus show that the subset accuracy of aspect recognition reaches 95%, which supports the feasibility of the method.
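Subset accuracy, the metric reported above, counts a review as correct only when its predicted set of aspect labels matches the true set exactly. The following is a minimal sketch of that computation, not the authors' implementation. The aspect names and the four toy reviews are hypothetical. Hamming loss is included only as a contrast; the abstract does not state which eight metrics were used.

```python
from typing import List, Set


def subset_accuracy(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Fraction of reviews whose predicted label set equals the true set exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one review is required")
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def hamming_loss(y_true: List[Set[str]], y_pred: List[Set[str]],
                 labels: List[str]) -> float:
    """Fraction of individual (review, label) decisions that are wrong."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true or not labels:
        raise ValueError("reviews and labels must be non-empty")
    # The symmetric difference holds labels that are missing or spurious.
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(labels))


# Hypothetical aspect labels and toy data, for illustration only.
ASPECTS = ["power", "comfort", "fuel_economy", "appearance"]
y_true = [{"power", "comfort"}, {"fuel_economy"}, {"appearance", "comfort"}, {"power"}]
y_pred = [{"power", "comfort"}, {"fuel_economy"}, {"appearance"}, {"power"}]

print(subset_accuracy(y_true, y_pred))        # 0.75
print(hamming_loss(y_true, y_pred, ASPECTS))  # 0.0625
```

In the toy data the third review misses one of its two aspects, so it counts as fully wrong under subset accuracy but contributes only one wrong decision out of sixteen under Hamming loss. This strictness is why subset accuracy is usually the hardest multi-label metric to score well on.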