Study on Theory and Method of Manufacturing Procurement DSS Based on Data Mining

Original title: 基于数据挖掘的制造业采购DSS理论及方法研究
Author: Zhou Ming (周明)
Degree level: Doctoral
Discipline: Mechanical Manufacturing and Automation
Keywords: Procurement management; Data warehouse; OLAP; Data mining; Association rules; Incremental mining; DSS
Degree year: 2009
Supervisor: Wang Taiyong (王太勇)
Discipline code: 080201
Degree-granting institution: Tianjin University
Submission date: 2009-12-01

Abstract

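With the widespread adoption of ERP, CRM, and MRP-II management information systems in manufacturing enterprises, companies have accumulated large volumes of historical records but lack effective means to organize, analyze, and integrate this information, so the data cannot support management decisions. Procurement, meanwhile, requires managers to respond promptly to current conditions and therefore has an especially strong need for a decision support system (DSS).
     This thesis analyzes in detail the requirements of a procurement decision support system in light of the procurement characteristics of Chinese manufacturing enterprises. Building on the traditional DSS architecture and incorporating data mining techniques, it proposes a structural framework for a new procurement DSS that integrates data warehouse (DW), OLAP, and data mining (DM) technologies. The framework retains traditional decision analysis functions and adds multidimensional data presentation and deeper mining of association information, giving decision support several complementary methods.
     On the basis of a detailed analysis of data warehouse technology, a procurement decision data warehouse is implemented through conceptual, logical, and physical model design, and methods for integrating heterogeneous data are discussed. The content and methods of OLAP analysis for procurement decisions are studied, and multidimensional analyses on the subjects of sales, purchasing, and supply are designed and implemented on Analysis Services, helping enterprises understand their information intuitively and accurately.
     For the incremental maintenance of association rules, an improved fast updated frequent pattern (IFUFP) algorithm is proposed. It addresses the maintenance of association rules when the transaction database grows incrementally while the minimum support threshold stays unchanged. Based on each item's support in the newly inserted transactions and in the original database, the algorithm distinguishes four cases, inserts the nodes that satisfy the conditions into the IFUFP-tree, and then invokes frequent pattern growth to mine association rules. Parent and child nodes in the IFUFP-tree are linked bidirectionally, which speeds up node updates, reuses existing mining results as far as possible, and improves the running efficiency of the decision support system.
     For incremental mining, a prefix-tree-based frequent pattern algorithm (IFP) is proposed. While the transaction data set is scanned, items are added to the IFP-tree in a specified canonical order and their counts in the item header table are updated. After a certain number of transactions have been inserted, the header table is sorted in descending order of current item support and the IFP-tree is restructured in that order. Insertion then continues using the current item order, and the restructuring step is performed again. By alternating insertion and restructuring steps, the algorithm obtains all frequent itemsets with a single scan of the database, meeting the practical needs of enterprises.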
With the wide use of ERP, CRM, and MRP-II systems in manufacturing enterprises, a large number of historical transaction records have accumulated in databases without analysis or integration. In that form they are of little use to managers, let alone a help in decision making. The procurement department, however, must respond quickly to a challenging market every day. It is therefore necessary to build a decision support system that helps it make sound choices using information extracted from historical transaction records.
     A new decision support system is introduced that integrates data warehouse, OLAP, and data mining modules and was developed from a requirements analysis of procurement in manufacturing enterprises. It provides managers with traditional decision-making functions, multidimensional data analysis, and association rule mining from transaction records.
     A procurement data warehouse is built through conceptual, logical, and physical model design. Following DTS and ETL processing, OLAP analysis cubes for sales, purchasing, and supply are implemented using the Analysis Services tools provided by MS SQL Server, which enable visual analysis for enterprises.
     Transaction databases usually grow over time, so the association rules mined from them must be re-evaluated: new rules may be generated and old ones may become invalid. An incremental IFUFP-tree maintenance algorithm is presented that modifies the FP-tree construction algorithm to handle new transactions efficiently. Items are considered in four cases according to their status in the original database and in the newly inserted transactions, and the results are then put into the IFUFP-tree, so that association rules can be maintained efficiently. As in the FP-tree algorithm, the counts of the sorted frequent items are kept in the header table. Bidirectional links speed up the maintenance of association rules, which makes for a more efficient DSS.
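
The four-case split described above can be illustrated with a minimal Python sketch. It follows the general idea used by FUP-style incremental methods and is not the thesis's IFUFP implementation; the function name `classify_items`, its parameters, and the assumption that counts are stored only for items that were frequent in the original database are all hypothetical choices made for illustration.

```python
from collections import Counter


def classify_items(orig_frequent_counts, orig_size, new_transactions, min_sup):
    """Assign each item to one of four cases after new transactions arrive.

    orig_frequent_counts: item -> count, kept ONLY for items that were
        frequent in the original database (an assumption of this sketch).
    orig_size: number of transactions in the original database.
    min_sup: minimum support as a fraction, assumed unchanged.
    Returns item -> (case number, status).
    """
    new_counts = Counter(item for t in new_transactions for item in set(t))
    new_size = len(new_transactions)
    total_size = orig_size + new_size
    result = {}
    for item in set(orig_frequent_counts) | set(new_counts):
        was_frequent = item in orig_frequent_counts
        frequent_in_new = new_counts[item] >= min_sup * new_size
        if was_frequent and frequent_in_new:
            # Case 1: frequent in both parts, so frequent overall.
            result[item] = (1, "frequent")
        elif was_frequent:
            # Case 2: the stored count decides without touching old data.
            merged = orig_frequent_counts[item] + new_counts[item]
            ok = merged >= min_sup * total_size
            result[item] = (2, "frequent" if ok else "infrequent")
        elif frequent_in_new:
            # Case 3: old count unknown, the original data must be rescanned.
            result[item] = (3, "rescan")
        else:
            # Case 4: infrequent in both parts, so infrequent overall.
            result[item] = (4, "infrequent")
    return result
```

Only items in case 3 force a look at the original database, which is why this kind of case analysis lets most of the earlier mining result be reused.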
     A novel tree structure called the IFP-tree (improved pattern tree) is also proposed; it captures database information with one database scan and provides the same mining performance as the FP-growth method. The IFP-tree introduces dynamic tree restructuring to produce a highly compact frequency-descending tree at runtime by alternating an insertion phase and a restructuring phase. An efficient method that restructures a prefix tree branch by branch is also proposed. Extensive experimental results show that the IFP-tree is efficient for incremental mining with a single database scan, which improves the DSS function.
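
The alternation of insertion and restructuring phases can be sketched in Python as follows. This is a simplified illustration, not the thesis's IFP-tree: the class and method names (`PrefixTree`, `insert`, `restructure`, `size`) are hypothetical, lexicographic order is assumed as the initial canonical order, and restructuring is done by extracting every branch and reinserting it in the new order rather than by the branch-by-branch method the abstract mentions.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


class PrefixTree:
    def __init__(self):
        self.root = Node(None)
        self.header = Counter()  # item -> total count (the header table)
        self.order = {}          # item -> rank from the last restructuring

    def _key(self, item):
        # Items unseen at the last restructuring sort after ranked ones.
        return (self.order.get(item, len(self.order)), item)

    def insert(self, transaction, count=1):
        """Insertion phase: add one transaction in the current item order."""
        node = self.root
        for item in sorted(set(transaction), key=self._key):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
            child.count += count
            self.header[item] += count
            node = child

    def _branches(self):
        """Yield (items, count) for transactions ending at each node."""
        stack = [(self.root, [])]
        while stack:
            node, prefix = stack.pop()
            below = sum(c.count for c in node.children.values())
            if node is not self.root and node.count > below:
                yield prefix, node.count - below
            for child in node.children.values():
                stack.append((child, prefix + [child.item]))

    def restructure(self):
        """Restructuring phase: rebuild in frequency-descending order."""
        ranked = sorted(self.header, key=lambda i: (-self.header[i], i))
        branches = list(self._branches())
        self.root, self.header = Node(None), Counter()
        self.order = {item: rank for rank, item in enumerate(ranked)}
        for items, count in branches:
            self.insert(items, count)

    def size(self):
        """Number of nodes, excluding the root."""
        n, stack = 0, list(self.root.children.values())
        while stack:
            node = stack.pop()
            n += 1
            stack.extend(node.children.values())
        return n
```

A caller would insert a batch of transactions, call `restructure()`, and then keep inserting under the new order; placing the most frequent items near the root lets more transactions share prefixes, which is what keeps the tree compact.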

