Abstract
A product usually has multiple attribute dimensions, and accurately identifying which attribute dimensions a product review addresses is a foundation of text mining. Random k-Labelsets (RAKEL) is one implementation of the problem-transformation approach to multi-label classification. Because RAKEL chooses its label subsets at random, however, it neither fully discovers nor exploits the correlations among labels, which limits its classification accuracy. To address this, an improved algorithm, FI-RAKEL, is proposed. First, the frequent itemsets of labels are obtained with the FP-Growth algorithm. Then, labels are selected from the frequent itemsets and from the original label set to form each new label subset (k-labelset), so that the base classifiers are trained with label correlations taken into account. Experiments show that the improved FI-RAKEL algorithm achieves better multi-label classification performance on review text.
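The label-subset construction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the support threshold, the example labels, and the rule of padding each frequent itemset with randomly drawn labels are all hypothetical, and a brute-force co-occurrence count stands in for FP-Growth so that the sketch needs only the standard library. Training the base classifiers is omitted.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_label_itemsets(label_rows, min_support, max_size):
    """Brute-force stand-in for FP-Growth (assumption): count every label
    combination of size 2..max_size and keep those whose relative
    frequency is at least min_support."""
    counts = Counter()
    for labels in label_rows:
        ordered = sorted(set(labels))
        for size in range(2, max_size + 1):
            counts.update(combinations(ordered, size))
    threshold = min_support * len(label_rows)
    return [set(items) for items, n in counts.items() if n >= threshold]


def build_labelsets(all_labels, frequent_sets, k, m, seed=0):
    """Build m label subsets of size k (hypothetical rule): start each one
    from a frequent itemset, then pad it with labels drawn at random from
    the original label set. Requires k <= len(all_labels)."""
    rng = random.Random(seed)
    pool = sorted(all_labels)
    labelsets = []
    for i in range(m):
        chosen = set()
        if frequent_sets:
            # Cycle through the frequent itemsets; truncate if larger than k.
            chosen = set(sorted(frequent_sets[i % len(frequent_sets)])[:k])
        remaining = [label for label in pool if label not in chosen]
        chosen.update(rng.sample(remaining, k - len(chosen)))
        labelsets.append(frozenset(chosen))
    return labelsets


# Toy review annotations: each row is the set of attribute labels of one review.
rows = [
    {"price", "quality"},
    {"price", "quality", "service"},
    {"logistics", "packaging"},
    {"price", "quality"},
    {"logistics", "packaging", "service"},
    {"service"},
]
frequent = frequent_label_itemsets(rows, min_support=0.3, max_size=2)
labelsets = build_labelsets(set().union(*rows), frequent, k=3, m=4)
print(frequent)
print(labelsets)
```

Each resulting label subset contains at least one frequently co-occurring label pair, whereas purely random sampling could separate such labels. In this simplified sketch the same subset may be generated more than once; a fuller version would deduplicate the subsets before training one base classifier per subset.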