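Research on the Application of Association Rule Mining in Braille Software (关联规则挖掘在盲文软件中的应用研究)

English title: Research on Association Rules Mining Applied into Braille Software
Author: 李重周
Degree level: Master's
Discipline: Computer Application Technology
Chinese keywords: 数据挖掘 ; 关联规则 ; 盲文软件 ; 兴趣度 ; 紧密度
English keywords: Data mining ; Association rule ; Braille software ; Interestingness ; Impaction
Degree year: 2008
Advisor: 杨君锐
Discipline code: 081203
Degree-granting institution: Xi'an University of Science and Technology (西安科技大学)
Thesis submission date: 2008-01-08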

Abstract

Data mining is a technology devoted to analyzing and understanding data and to revealing the knowledge contained in it, and it is one of the important methods for future applications of information technology. Association rule mining is an important research area within data mining, and mining algorithms are its main subject. Many efficient association rule mining algorithms have been proposed to date.
     This thesis first discusses the basic concepts, the basic process, and the research hotspots of data mining, and analyzes the classical association rule mining algorithm Apriori in some detail. On this basis it proposes SI_Tree (Super-Items Tree), a new superset-tree algorithm that mines frequent itemsets without generating candidate itemsets and with only a few database scans. The algorithm searches the database to find all supersets of the current item in a single pass and obtains the frequent itemsets from them. Experiments show that it performs well.
     Next, after studying Braille software systems and the problems of traditional Braille software, and taking full account of the characteristics of association rule mining algorithms, the thesis applies the superset-tree mining method to a Braille software system. The approach repeatedly scans the mining objects to build a metadata database of Web information, finds the mutually associated parts, and classifies them, so that the Braille software can quickly reach related content when accessing websites.
     Finally, to address the problem that association rule mining may produce many invalid rules, the thesis studies interestingness measures and proposes a new measure, closeness (紧密度, rendered as "impaction" in the English keywords), intended to reflect the closeness, rarity, and conciseness among itemsets. Using this measure it gives a closeness-based interestingness mining algorithm and applies it to website access in Braille software. Experiments show that, for website access in Braille software, the closeness-based interesting-rule mining method achieves better access efficiency than the superset-tree association rule mining method.
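
The abstract names Apriori as the classical baseline and describes filtering rules by an interestingness measure, but it does not give the details of SI_Tree or of the closeness measure. The following is therefore only a minimal sketch of the classical level-wise Apriori procedure, with the standard confidence and lift measures standing in as an example of an interestingness filter. It is not the SI_Tree algorithm and not the closeness measure; the function names `apriori` and `rules`, the sample Web sessions, and the thresholds are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: size-k candidates built from frequent (k-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: one pass over the data per level.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence, lift) for rules that pass."""
    found = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                confidence = s / frequent[antecedent]
                if confidence >= min_confidence:
                    # Lift > 1 suggests the two sides co-occur more than by chance.
                    lift = confidence / frequent[consequent]
                    found.append((antecedent, consequent, confidence, lift))
    return found


# Hypothetical Web sessions: the set of pages visited in each session.
sessions = [
    {"home", "news", "sports"},
    {"home", "news"},
    {"home", "weather"},
    {"news", "sports"},
    {"home", "news", "weather"},
]

for antecedent, consequent, confidence, lift in rules(apriori(sessions, 0.4), 0.7):
    print(sorted(antecedent), "->", sorted(consequent),
          round(confidence, 2), round(lift, 2))
```

Apriori rescans the data once per itemset size and enumerates candidates at every level, which is the cost that candidate-free approaches such as the one described in the abstract aim to avoid.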

