Research and Implementation of Multi-Label Classification of Product Reviews Based on the RAKEL Algorithm
  • English title: Research and Implementation of RAKEL Algorithm Based Multi-Label Classification for Online Commodity Reviews
  • Authors: 梁睿博; 王思远; 李壮; 刘亚松
  • Authors (English): LIANG Ruibo; WANG Siyuan; LI Zhuang; LIU Yasong; School of Computer Science and Engineering, Northeastern University
  • Keywords: multi-label classification; RAKEL; frequent itemsets; label correlation
  • Keywords (English): multi-label classification; RAKEL; item-frequency set; label correlation
  • Journal (Chinese): ZGGC
  • Journal (English): Software Engineering
  • Institution: School of Computer Science and Engineering, Northeastern University
  • Publication date: 2019-01-05
  • Publisher: 软件工程 (Software Engineering)
  • Year: 2019
  • Issue: v.22; No.235
  • Funding: Supported by the National Key R&D Program of China under grant 2018YFB1004700
  • Language: Chinese
  • Pages: ZGGC201901002
  • Page count: 4
  • CN: 01
  • ISSN: 21-1603/TP
  • Classification number: 12-15
Abstract
A product typically has multiple attribute dimensions, and accurately identifying which of them a product review addresses is the foundation of text mining on reviews. Random k-Labelsets (RAKEL) is one implementation of the problem-transformation approach to multi-label classification. In previous work, because the label subsets are chosen at random, correlations among labels are not fully discovered or exploited, which limits classification accuracy. To address this, an improved algorithm, FI-RAKEL, is proposed. It first obtains the frequent itemsets of labels with the FP-Growth algorithm, then selects labels from both the frequent itemsets and the original label set to form each new k-labelset, so that the base classifiers are trained with label correlations taken into account. Experiments show that FI-RAKEL achieves better multi-label classification performance on review text.
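The labelset construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact selection rule, so the code assumes each k-labelset is seeded with one frequent itemset and padded with labels drawn at random from the full label set. Brute-force co-occurrence counting stands in for FP-Growth (it yields the same frequent itemsets on small data), and the function names, label names, and parameters `min_support`, `k`, and `m` are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_label_itemsets(label_rows, min_support, max_size):
    """Return label combinations (size 2..max_size) whose relative
    frequency is at least min_support. Brute-force stand-in for FP-Growth."""
    counts = Counter()
    for labels in label_rows:
        ordered = sorted(labels)
        for size in range(2, max_size + 1):
            counts.update(combinations(ordered, size))
    threshold = min_support * len(label_rows)
    return [set(items) for items, n in counts.items() if n >= threshold]


def build_labelsets(all_labels, frequent_sets, k, m, seed=0):
    """Build m labelsets of size k (assumed rule): seed each with a frequent
    itemset, then pad with random labels from the full label set."""
    rng = random.Random(seed)
    labelsets = []
    for i in range(m):
        # Cycle through the frequent itemsets; use an empty seed if none exist.
        base = frequent_sets[i % len(frequent_sets)] if frequent_sets else set()
        seed_labels = sorted(base)[:k]
        remaining = sorted(set(all_labels) - set(seed_labels))
        rng.shuffle(remaining)
        labelsets.append(frozenset(seed_labels + remaining[: k - len(seed_labels)]))
    return labelsets


def powerset_targets(label_rows, labelset):
    """Label-powerset view for one base classifier: each example's class
    is the tuple of its labels that fall inside the labelset."""
    return [tuple(sorted(set(labels) & labelset)) for labels in label_rows]


# Made-up review annotations: each set lists the attributes a review mentions.
review_labels = [
    {"price", "quality"},
    {"price", "quality", "logistics"},
    {"logistics", "service"},
    {"price", "quality", "service"},
    {"appearance"},
]
all_labels = set().union(*review_labels)
frequent = frequent_label_itemsets(review_labels, min_support=0.5, max_size=2)
labelsets = build_labelsets(all_labels, frequent, k=3, m=2)
print(frequent)
print(labelsets)
```

Each resulting labelset would then define one single-label problem (via `powerset_targets`) for a base classifier, with the ensemble's votes combined per label as in RAKEL. Seeding from frequent itemsets keeps co-occurring labels in the same subproblem, which is the property the abstract attributes to FI-RAKEL.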
