Research on Association Rule Mining Considering Multi-Granularity Attribute Reduction

English title: Research on Mining Association Rules Based on Multi-Granularity Attribute Reduction
Authors: YANG Zhen; GENG Xiuli (Business School, University of Shanghai for Science and Technology)
Keywords: multi-granularity rough set; attribute reduction; binary; weighted Apriori algorithm
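Journal code: JSGG
Journal: Computer Engineering and Applications
Institution: Business School, University of Shanghai for Science and Technology
Publication date: 2018-03-16 17:30
Publisher: Computer Engineering and Applications
Year: 2019
Issue: v.55; No.925
Funding: National Natural Science Foundation of China (No.71301104, No.71271138); Specialized Research Fund for the Doctoral Program of Higher Education (No.20133120120002); Innovation Program of Shanghai Municipal Education Commission (No.14YZ088); Shanghai First-Class Discipline Project (No.S1201YLXK); Hujiang Foundation (No.A14006)
Language: Chinese
Page: JSGG201906021
Page count: 7
CN: 06
Classification number: 139-145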

Abstract

In the era of big data, it has become increasingly difficult for people to obtain the information they need, and data mining is currently a key technology for addressing this problem. The Apriori algorithm, a commonly used data mining algorithm, works by uncovering the potential association rules hidden in the data. To address the frequent data scans and cumbersome candidate set generation of the traditional Apriori algorithm, a weighted Apriori algorithm is proposed: duplicate records are stored only once, and the ratio of a record's number of repetitions to the total number of records is used as its weight, which compresses storage space. A binary Boolean matrix replaces the original data set, and the maximum frequent itemset is obtained through "AND" operations within the matrix, which reduces time complexity. Considering the redundancy of the original data and the imprecision of rough set attribute reduction, an attribute reduction algorithm based on multi-granularity rough sets is applied before the association rules are extracted. It refines attribute values by knowledge granularity to improve reduction precision and reduce space complexity. Finally, the proposed method is compared with the Apriori algorithm based on a frequent matrix and with the original Apriori algorithm to verify its practicability and effectiveness.
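The weighting and Boolean-matrix ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`compress`, `item_columns`, `support`, `frequent_itemsets`) and the sample transactions are hypothetical, the Boolean matrix is stored column by column as integer bitmasks, and the multi-granularity attribute reduction step is left out entirely. Weights are kept as exact fractions so that support comparisons are not affected by floating-point rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def compress(transactions):
    """Store each distinct record once; its weight is its share of all records."""
    counts = Counter(frozenset(t) for t in transactions)
    total = len(transactions)
    records = list(counts)
    weights = [Fraction(counts[r], total) for r in records]
    return records, weights


def item_columns(records):
    """Boolean matrix by column: bit r of columns[item] is 1 if record r has item."""
    columns = {}
    for r, record in enumerate(records):
        for item in record:
            columns[item] = columns.get(item, 0) | (1 << r)
    return columns


def support(itemset, columns, weights):
    """AND the item columns, then add up the weights of the surviving records."""
    mask = (1 << len(weights)) - 1
    for item in itemset:
        mask &= columns[item]
    return sum(w for r, w in enumerate(weights) if (mask >> r) & 1)


def frequent_itemsets(transactions, min_support):
    """Level-wise search; every candidate is checked by AND, not by rescanning data."""
    records, weights = compress(transactions)
    columns = item_columns(records)
    level = [frozenset([i]) for i in sorted(columns)
             if support([i], columns, weights) >= min_support]
    result = {}
    while level:
        for itemset in level:
            result[itemset] = support(itemset, columns, weights)
        # Join pairs that differ in exactly one item, then keep the frequent ones.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if support(c, columns, weights) >= min_support]
    return result


# Hypothetical sample data: the first record appears twice, so it is stored
# once with weight 2/5.
transactions = [
    ["bread", "milk"],
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter"],
    ["bread", "milk", "butter"],
]
result = frequent_itemsets(transactions, Fraction(3, 5))
```

With a minimum support of 3/5, the search stops after the two-item level, because only one frequent pair remains and no larger candidate can be joined from it.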
