A Novel Predictive Modeling Framework: Combining Association Rule Discovery With EM Algorithm.
Details
  • Author: Jiang, Zhonghua
  • Degree: Doctor
  • Year: 2013
  • Advisor and committee: Karypis, George (advisor); Roumeliotis, Stergios (committee member); Kuang, Rui (committee member); Geyer, Charles (committee member)
  • Degree-granting institution: University of Minnesota
  • Department: Computer Science
  • ISBN: 9781267978172
  • CBH: 3556092
  • Country: USA
  • Language: English
  • File size: 924721
  • Pages: 125
Abstract
Building predictive models and finding patterns are two fundamental problems in data mining, and this thesis contributes to both. In recent years, there have been increasing efforts to apply association rule mining to build predictive models, giving rise to the areas of Associative Classification (AC) and Associative Regression (AR). The first major contribution of this thesis is a novel predictive modeling framework that can be applied to build both AC and AR models; the resulting classification and regression models are called ACEM and AREM, respectively. ACEM/AREM derives a set of classification/regression rules by (i) applying an instance-based approach to mine the itemsets that form the rules' left-hand sides, and (ii) developing a probabilistic model that determines, for each mined itemset, the corresponding rule's parameters. The key contributions of ACEM/AREM are a probabilistic model that captures interactions among itemsets and an expectation-maximization (EM) algorithm derived to learn the rule parameters. An extensive experimental evaluation shows that the EM optimization can improve predictive performance dramatically. We also show that ACEM/AREM can outperform some state-of-the-art classification/regression models.

The second major contribution of this thesis is the development of effective pruning methods that lead to efficient algorithms for two pattern mining problems. The first is the instance-based itemset mining step of ACEM/AREM. ACEM/AREM uses an Instance-Based Itemset Miner (IBIMiner) algorithm to discover the best itemsets for each training instance. IBIMiner incorporates various methods to bound the quality of any future extension of the itemset under consideration, and our experiments show that these bounds allow IBIMiner to prune the search space considerably. The second pattern mining problem is the extension of association rule mining to dyadic datasets.
These are datasets in which the features are naturally partitioned into two groups of distinct types. Traditional association rule mining methods employ metrics (e.g., confidence) that fail to distinguish between the two types of features. We address this problem by proposing a new metric, called dual-lift, that captures the interaction between features. Based on this metric, we formulate a constrained pattern mining problem, which is solved by an efficient algorithm that pushes various constraints deep into the rule mining process. We apply the dual-lift mining formulation to some real-world applications and report some interesting results.
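The abstract does not spell out the ACEM/AREM probabilistic model, so the following is only a minimal sketch of how EM can fit rule parameters when several mined itemsets cover the same training instance. It assumes a hypothetical generative model that is not taken from the thesis: each instance's label is emitted by exactly one of the rules that match it, chosen uniformly at random, and rule r emits class c with probability theta[r][c]. The function names `em_rule_parameters`, `predict`, and `log_likelihood` and the toy rule IDs "A" and "B" are illustrative only.

```python
import math
from collections import defaultdict


def em_rule_parameters(instances, classes, n_iter=50):
    """Fit per-rule class distributions theta[r][c] with EM.

    instances: list of (matched_rule_ids, label) pairs, where matched_rule_ids
    lists the mined itemsets (rules) whose left-hand side covers the instance.
    Hypothetical model (an assumption, not the thesis's model): the label is
    emitted by ONE matching rule chosen uniformly at random, and rule r emits
    class c with probability theta[r][c].
    """
    rules = {r for matched, _ in instances for r in matched}
    # Initialise every rule with a uniform class distribution.
    theta = {r: {c: 1.0 / len(classes) for c in classes} for r in rules}
    for _ in range(n_iter):
        soft_counts = {r: defaultdict(float) for r in rules}
        for matched, y in instances:
            # E-step: posterior probability that each matching rule emitted y.
            scores = {r: theta[r][y] for r in matched}
            total = sum(scores.values())
            if total == 0:
                continue  # no matching rule can explain this label
            for r, s in scores.items():
                soft_counts[r][y] += s / total
        # M-step: re-estimate each rule's class distribution from soft counts.
        for r in rules:
            mass = sum(soft_counts[r].values())
            if mass > 0:
                theta[r] = {c: soft_counts[r][c] / mass for c in classes}
    return theta


def predict(theta, matched, classes):
    """P(c | x): average class distribution of the rules covering x."""
    return {c: sum(theta[r][c] for r in matched) / len(matched) for c in classes}


def log_likelihood(theta, instances):
    """Log-likelihood of the observed labels under the hypothetical model."""
    return sum(
        math.log(sum(theta[r][y] for r in matched) / len(matched))
        for matched, y in instances
        if matched
    )


if __name__ == "__main__":
    # Toy data: rule "A" only covers positives; rule "B" covers mostly negatives.
    data = [(["A"], "pos"), (["A", "B"], "pos"), (["B"], "neg"), (["B"], "neg")]
    theta = em_rule_parameters(data, ["pos", "neg"])
    print(theta)
    print(predict(theta, ["A", "B"], ["pos", "neg"]))
```

Because the rules that cover an instance share credit for its label in the E-step, a rule that co-occurs with a stronger predictor receives less credit: in the toy data, rule B ends up assigning almost all of its probability to "neg", since rule A fully explains the shared positive instance. This is one simple way a model can account for interactions among itemsets; the uniform choice among matching rules is an assumption made to keep the sketch short, and a fuller model might also learn per-rule weights.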
