Research and Application of Building Data Mining Solutions Based on SQL Server
Abstract
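DM (data mining) has been a hot topic of discussion and research in the information industry in recent years, and most current DM research concentrates on algorithms. Most DM systems cannot be seamlessly integrated with databases, the main medium of data storage, and because no standard data mining language exists, the application of DM technology is confined to domain experts. Tightly coupling DM with databases and developing standard data mining languages have therefore become new research hot spots in the DM field.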
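    Against this background, and in connection with the Sino-French cooperation project "The Application of Data Mining in GIS", funded under the Shanghai Municipal Education Commission Key Discipline program (Hu Jiao Wei Ke (2001) No. 71), this thesis studies methods of building data mining solutions based on SQL Server and the application of DM technology in GIS. The main work of the thesis is as follows: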
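    First, the thesis studies two standard DM languages, OLE DB For DM and PMML, and examines the basic architecture of SQL Server that supports them. On this basis it studies a method of building data mining solutions on SQL Server Analysis Services, constructs the corresponding system architecture, and gives a software development example in which the DDL defined in OLE DB For DM and DSO (Decision Support Objects) are used to create, train, and store DM models from the client side and the server side respectively, achieving the integration of DM, database, and application program.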
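    Second, the thesis studies a method of building data mining solutions by attaching self-developed DM algorithms to SQL Server as external modules, constructs the system architecture, and gives a software development example.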
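    Third, the thesis focuses on how to integrate self-developed DM algorithms into SQL Server. It presents an overall implementation framework and implements the integration of a DM algorithm in VC++ 7.0, so that the algorithm is seamlessly integrated with the SQL Server database and conforms to OLE DB For DM. A DM model is then built with this algorithm and used to answer prediction queries.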
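    Fourth, the thesis studies the application of DM in GIS, discusses the integration of DM and GIS, and proposes an architecture for integrating the two. It constructs an intelligent route design system, applies a self-developed composite clustering analysis algorithm to it as a module attached externally to SQL Server, and completes the route design. The thesis also builds a GIS data mining solution on SQL Server Analysis Services, creates a GIS data mining model, stores the model in standard PMML form, and presents detailed statistics on ship distribution.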
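    Fifth, the thesis compares the advantages and disadvantages of the three approaches to building data mining solutions on SQL Server, identifies the preferred approach, and thereby offers a new direction for the wider application of data mining. It also analyzes and compares the routes designed with the composite clustering analysis algorithm and with the Microsoft clustering algorithm.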
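The abstract mentions storing mining models in PMML form. As a rough illustration of what such a document contains, the sketch below serializes a one-variable regression model (y = intercept + slope * x) as a minimal PMML RegressionModel with Python's xml.etree.ElementTree. It omits the PMML namespace, has not been validated against a particular PMML schema version, and is not the exact output SQL Server Analysis Services produces; the function name, field names, and version attribute are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def regression_pmml(model_name, x, y, intercept, slope):
    """Serialize y = intercept + slope * x as a minimal PMML RegressionModel.

    Hypothetical helper: namespace and schema validation are deliberately omitted.
    """
    pmml = ET.Element("PMML", version="3.0")
    ET.SubElement(pmml, "Header", copyright="example")
    # Data dictionary: one input field and one target field.
    dd = ET.SubElement(pmml, "DataDictionary", numberOfFields="2")
    for name in (x, y):
        ET.SubElement(dd, "DataField", name=name, optype="continuous", dataType="double")
    model = ET.SubElement(pmml, "RegressionModel", modelName=model_name,
                          functionName="regression")
    # Mining schema: mark which field the model predicts.
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", name=x)
    ET.SubElement(schema, "MiningField", name=y, usageType="predicted")
    # Regression table: intercept plus one linear term for x.
    table = ET.SubElement(model, "RegressionTable", intercept=str(intercept))
    ET.SubElement(table, "NumericPredictor", name=x, exponent="1",
                  coefficient=str(slope))
    return ET.tostring(pmml, encoding="unicode")


if __name__ == "__main__":
    print(regression_pmml("ExampleModel", "x", "y", 1.0, 2.0))
```

Because the model is plain XML, any application that understands the PMML element names can read the coefficients back without access to the database that trained the model.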
Data Mining has become one of the most popular research topics in the IT industry in recent years, but the research is mainly concentrated on algorithms. At present the application of DM is confined to domain experts, because almost no Data Mining systems are seamlessly integrated with relational databases and because no standard Data Mining language exists. Therefore the integration of Data Mining with databases, together with the development of a standard Data Mining language, has become a hot spot of current research in the DM field.
    Against this background, and in connection with the Sino-French cooperation project "The Application of Data Mining in GIS", funded by the Shanghai Education Commission as a Key Discipline project (Hu Jiao Wei Ke (2001) No. 71), the thesis studies how to build Data Mining solutions based on SQL Server and how to apply DM technology in GIS.
    SQL Server and Data Mining form the main thread of the thesis. The author's objective is to integrate Data Mining with relational databases and application programs on the basis of OLE DB For DM. To this end, the thesis discusses in particular three ways of building Data Mining solutions based on SQL Server.
    The first way is to use the Data Mining algorithms provided by SQL Server Analysis Services. These algorithms are implemented in accordance with OLE DB For DM, so they can be used directly to build Data Mining models from a relational database. The models are stored in PMML format and can be used by any application program. In this part, the author presents the system architecture and gives an example of this kind of Data Mining solution.
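The first approach drives the built-in algorithms through the OLE DB For DM language: a model is declared, trained from a source rowset, and then queried. The sketch below only assembles the text of those three kinds of statements (CREATE MINING MODEL, INSERT INTO, and SELECT ... PREDICTION JOIN); it does not connect to Analysis Services, and a real client would submit the statements over an OLE DB or ADO connection. The helper names, the ShipModel model, its columns, and the connection string placeholder are all hypothetical.

```python
# Hypothetical helpers that assemble OLE DB For DM statements as plain text.
# Model, column, and data source names passed to them are illustrative only.


def create_model_stmt(model, columns, algorithm):
    """Build a CREATE MINING MODEL statement.

    columns: list of (name, data_type, content) tuples, for example
    ("ShipType", "TEXT", "DISCRETE PREDICT").
    """
    body = ",\n".join(f"    [{name}] {dtype} {content}" for name, dtype, content in columns)
    return f"CREATE MINING MODEL [{model}] (\n{body}\n) USING {algorithm}"


def train_stmt(model, column_names, source):
    """Build an INSERT INTO statement that trains the model from a source rowset."""
    cols = ", ".join(f"[{c}]" for c in column_names)
    return f"INSERT INTO [{model}] ({cols})\n{source}"


def predict_stmt(model, predicted, source, input_columns):
    """Build a SELECT ... PREDICTION JOIN that maps source columns (alias t) to model inputs."""
    inputs = ", ".join(f"t.[{c}]" for c in input_columns)
    on = " AND ".join(f"[{model}].[{c}] = t.[{c}]" for c in input_columns)
    return (f"SELECT {inputs}, [{model}].[{predicted}]\n"
            f"FROM [{model}] PREDICTION JOIN {source} AS t\n"
            f"ON {on}")


if __name__ == "__main__":
    # Placeholder rowset; a real connection string and table would go here.
    src = ("OPENROWSET('SQLOLEDB', '<connection string>', "
           "'SELECT ShipID, Longitude, Latitude, ShipType FROM Ships')")
    cols = [("ShipID", "LONG", "KEY"),
            ("Longitude", "DOUBLE", "CONTINUOUS"),
            ("Latitude", "DOUBLE", "CONTINUOUS"),
            ("ShipType", "TEXT", "DISCRETE PREDICT")]
    print(create_model_stmt("ShipModel", cols, "Microsoft_Decision_Trees"))
    print(train_stmt("ShipModel", [c[0] for c in cols], src))
    print(predict_stmt("ShipModel", "ShipType", src, ["Longitude", "Latitude"]))
```

Keeping the three stages as separate statements mirrors the model lifecycle the first approach relies on: the same declared model can be retrained or queried repeatedly without redefining it.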
    The second way of building a Data Mining solution based on SQL Server is to attach Data Mining algorithms designed by the author or by others to SQL Server as independent external program modules. These modules can then be used to discover knowledge in the data warehouse.
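A minimal sketch of the external-module pattern, assuming the module simply reads rows with SQL, runs its own algorithm outside the database engine, and writes the results back to a table. Python's built-in sqlite3 stands in for SQL Server so the example runs anywhere; run_external_module, speed_band, and the table names are hypothetical, and speed_band is a placeholder rule rather than any algorithm from the thesis.

```python
import sqlite3


def run_external_module(conn, query, algorithm, result_table):
    """Hypothetical external mining module: pull rows, mine them, store results.

    result_table is interpolated into SQL, so it must be a trusted identifier.
    """
    rows = conn.execute(query).fetchall()
    results = algorithm(rows)  # expected: list of (key, label) pairs
    conn.execute(f"CREATE TABLE IF NOT EXISTS {result_table} "
                 "(id INTEGER PRIMARY KEY, label TEXT)")
    conn.executemany(f"INSERT OR REPLACE INTO {result_table} VALUES (?, ?)", results)
    conn.commit()
    return len(results)


def speed_band(rows):
    """Placeholder 'algorithm': label each ship by speed (threshold is arbitrary)."""
    return [(ship_id, "fast" if speed >= 12.0 else "slow") for ship_id, speed in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ships (id INTEGER PRIMARY KEY, speed REAL)")
    conn.executemany("INSERT INTO ships VALUES (?, ?)", [(1, 8.5), (2, 14.0), (3, 12.0)])
    n = run_external_module(conn, "SELECT id, speed FROM ships", speed_band, "ship_labels")
    print(n, conn.execute("SELECT * FROM ship_labels ORDER BY id").fetchall())
```

The database only supplies and stores data here; all mining logic lives in the module, which is what makes this approach easy to build but loosely coupled compared with an integrated provider.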
    The third way is to use the interfaces provided by SQL Server Analysis Services to integrate Data Mining algorithms from third-party providers. Such algorithms must conform to OLE DB For DM so that they can communicate with Analysis Services through a series of COM interfaces. In this part, the author develops a DLL based on a linear regression algorithm in the VC++ development environment. Once the DLL is compiled, the algorithm becomes available, and users can build and train Data Mining models with it in a relational database.
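The abstract does not describe the provider's internals, so the following is only a hypothetical stand-in that mirrors the create, train, and predict-query lifecycle an OLE DB For DM model goes through, using simple (one-variable) linear regression fitted by ordinary least squares. A real SQL Server provider would expose this behavior through the COM interfaces Analysis Services expects, which are not reproduced here; the class name, method names, and the least-squares fitting method are assumptions.

```python
class LinearRegressionModel:
    """Hypothetical stand-in for a third-party mining algorithm.

    Mirrors the create -> train -> predict lifecycle of a mining model; it does
    not implement the COM interfaces a real Analysis Services provider exposes.
    """

    def __init__(self, input_column, predict_column):
        # "Create": declare which column is the input and which is predicted.
        self.input_column = input_column
        self.predict_column = predict_column
        self.slope = None
        self.intercept = None

    def train(self, cases):
        """Fit y = a + b*x by least squares on a list of dict-like cases.

        b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
        a = mean_y - b * mean_x
        """
        xs = [c[self.input_column] for c in cases]
        ys = [c[self.predict_column] for c in cases]
        n = len(xs)
        if n < 2:
            raise ValueError("need at least two training cases")
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("input column has zero variance")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x

    def predict(self, case):
        """Answer a prediction query for one input case."""
        if self.slope is None:
            raise RuntimeError("model has not been trained")
        return self.intercept + self.slope * case[self.input_column]


if __name__ == "__main__":
    model = LinearRegressionModel("x", "y")
    model.train([{"x": 0, "y": 1}, {"x": 1, "y": 3}, {"x": 2, "y": 5}])
    print(model.slope, model.intercept, model.predict({"x": 4}))
```

For the three training cases above the fit is exact (slope 2, intercept 1), so the prediction for x = 4 is 9.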
    Finally, the author studies how to integrate Data Mining into GIS and builds two Data Mining solutions for GIS based on SQL Server. In the first solution, a new, improved integrated clustering analysis algorithm is used to carry out route design. In the second solution, the author builds Data Mining models from the server side and the client side respectively in the GIS application program; in this way the author also designs routes with the Microsoft clustering algorithm and obtains some other valuable results.
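Neither the improved integrated clustering algorithm nor the internals of the Microsoft clustering algorithm are given in this abstract, so plain k-means is used below purely as a stand-in, to show how historical ship positions might be grouped into clusters whose centroids could serve as candidate waypoints. Treating (longitude, latitude) as planar coordinates is a simplifying assumption that is only reasonable over small sea areas; the function name and parameters are illustrative.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on (x, y) points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append((sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's centroid in place
        if new == centroids:
            break  # converged: no centroid moved
        centroids = new
    return centroids, labels


if __name__ == "__main__":
    positions = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = kmeans(positions, 2)
    print(centers, labels)
```

k-means needs the number of clusters up front and is sensitive to its starting points, which is one reason a route design system might prefer a more elaborate clustering method.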
    In addition, the author compares the three Data Mining solutions based on SQL Server discussed in this thesis, and analyzes the routes obtained with the two different clustering methods in order to optimize route design.
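The abstract does not state the criteria used to compare the two sets of routes. One simple, assumed metric is total great-circle length, sketched here with the haversine formula and a mean Earth radius of 3440.065 nautical miles. The helpers haversine_nm and route_length_nm are hypothetical, and waypoints are (latitude, longitude) pairs in degrees.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius (6371 km) expressed in nautical miles


def haversine_nm(p, q):
    """Great-circle distance in nautical miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def route_length_nm(waypoints):
    """Total length of a route given as an ordered list of (lat, lon) waypoints."""
    return sum(haversine_nm(a, b) for a, b in zip(waypoints, waypoints[1:]))


if __name__ == "__main__":
    route_a = [(31.0, 122.0), (30.5, 123.0), (30.0, 124.0)]
    route_b = [(31.0, 122.0), (30.0, 124.0)]
    print(route_length_nm(route_a), route_length_nm(route_b))
```

Length alone ignores traffic density, water depth, and collision-avoidance constraints, so in practice it would be only one of several criteria when comparing routes.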
    This thesis facilitates building Data Mining solutions based on SQL Server and bridges the gap between Data Mining and relational databases, so that Data Mining applications on relational databases or data warehouses can be developed and run directly. The thesis also shows how to build Data Mining solutions in GIS.
    Hao Ruiji (Control Theory and Control Engineering) Directed by Prof. Tang Tianhao and Prof. Shi Weifeng