A Frequent Item-Set Mining Method Based on Random Meeting

Original title (Chinese): 基于随机相遇的频繁项集挖掘方法
Authors: ZHAO Wentao (赵文涛); FU Kankan (付侃侃); LI Suqing (李素青); ZHANG Xiaohong (张霄宏)
Keywords: data mining; frequent item-set mining; random meeting algorithm; random meeting; minimum support
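Chinese journal title: JGXB
English journal title: Journal of Henan Polytechnic University (Natural Science)
Institution: School of Computer Science and Technology, Henan Polytechnic University
Publication date: 2015-01-21 14:32
Publisher: Journal of Henan Polytechnic University (Natural Science) (河南理工大学学报(自然科学版))
Year: 2015
Issue: v.34; No.162
Funding: National Natural Science Foundation of China (51274088); Henan Province Key Science and Technology Research Program (112102210004); Fund of the Henan Provincial Key Laboratory of Mine Informatization for Institutions of Higher Education (J1202)
Language: Chinese
Pages: JGXB201501016
Page count: 4
CN: 01
ISSN: 41-1384/N
Classification number: 86-89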

Abstract

Frequent item-set mining is an important part of association rule mining, but existing frequent item-set mining algorithms spend too much time scanning the database and building complex data structures, which makes them inefficient. To overcome these shortcomings, a frequent item-set mining algorithm based on random meeting is proposed. During random meeting, two transactions are repeatedly selected at random from the original transaction set, and their intersection is added as an element of a new transaction set. By computing the relationship between the minimum support on the new transaction set and the minimum support on the original transaction set, frequent item-set mining on the original set is converted into frequent item-set mining on the new set, which greatly reduces the time and space complexity of the algorithm. Because a random sample carries the main statistical properties of the original data set, the new transaction set retains the statistical properties of the original one, and the accuracy of the results mined on the new transaction set can be ensured by tuning the algorithm's parameters. The algorithm was evaluated on the transaction data of a retail supermarket. The results show that it makes mining tens of times faster while its accuracy differs little from that of other algorithms.
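
The random-meeting step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula that maps the minimum support between the two sets, so the sketch assumes that an item set with support s in the original set appears in the intersection of two randomly chosen transactions with probability of roughly s squared, and uses s_min squared as the threshold on the new set. The function names, the number of meetings, and the brute-force counting of small item sets are all hypothetical choices made for the example.

```python
import random
from collections import Counter
from itertools import combinations


def random_meeting(transactions, n_meetings, rng):
    """Build the new transaction set from intersections of random pairs."""
    meetings = []
    for _ in range(n_meetings):
        a, b = rng.sample(transactions, 2)  # two distinct positions in the list
        common = a & b
        if common:  # an empty intersection carries no item sets
            meetings.append(common)
    return meetings


def mine_frequent(transactions, min_support, n_meetings=2000, max_size=3, seed=0):
    """Estimate the frequent item sets of `transactions` from random meetings.

    Assumption (not taken from the paper): support s in the original set
    corresponds to support of about s**2 in the meeting set.
    """
    rng = random.Random(seed)
    meetings = random_meeting(transactions, n_meetings, rng)

    # Count every item set of up to max_size items inside each intersection.
    counts = Counter()
    for common in meetings:
        items = sorted(common)
        for k in range(1, min(max_size, len(items)) + 1):
            counts.update(combinations(items, k))

    # Threshold is relative to all meetings, including the empty ones.
    threshold = (min_support ** 2) * n_meetings
    return {frozenset(c) for c, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Toy data: 60% of baskets hold bread and milk, 40% hold only beer.
    baskets = [frozenset({"bread", "milk"})] * 60 + [frozenset({"beer"})] * 40
    for itemset in sorted(mine_frequent(baskets, min_support=0.5), key=sorted):
        print(sorted(itemset))
```

Because intersections are usually much shorter than the transactions they come from, counting item sets in the meeting set is cheap. The trade-off is that the result is a statistical estimate whose accuracy depends on the number of meetings, which matches the abstract's remark that accuracy is controlled by tuning parameters.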
