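Study on the Implementation of the Waybill Information Synthetic Application System and the Application of Data Mining Technology (货票信息综合应用系统实现与数据挖掘技术应用的研究)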

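English title: Study on Realization of Waybill Information Synthetic System and Application of Data Mining Technology
Author: Zuo Jianli (左建丽)
Thesis level: Master's
Discipline: Transportation Planning and Management
Chinese keywords: 数据库 ; 数据仓库 ; 数据挖掘 ; 货票 ; 综合应用
English keywords: database ; data warehouse ; data mining ; waybill ; synthetic application
Degree year: 2003
Advisor: Liu Lan (刘澜)
Discipline code: 082303
Degree-granting institution: Southwest Jiaotong University (西南交通大学)
Thesis submission date: 2003-02-01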

Abstract

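In recent years the information industry has developed rapidly, and information technology has been widely adopted across industries. This has brought standardization and higher efficiency to production and office work, and has also driven major changes in enterprise management systems and modes of production. In the course of informatizing its management, China's railway sector has combined its own characteristics with advanced information technology to develop a Transportation Management Information System that meets current railway transportation needs. Within it, the Waybill Management Information System, centered on freight processing, has been widely deployed across the national railway network and has delivered good economic and social benefits. However, the massive waybill data collected has not been fully used or exploited. The waybill database, rich in information and content, has become a neglected data tomb, which is an enormous waste of resources.
    The rise and rapid development of data mining technology and data warehouse solutions provide a theoretical basis and technical support for the practical use of massive data, and open a new opportunity for the synthetic application of waybill information. From the standpoint of modern railway operations, data mining technology can be applied, with reference to data warehouse solutions, to mine the massive data accumulated over time by the waybill system using mathematical methods. This extends the focus of data processing from traditional transaction processing to online analytical processing (OLAP) of transportation data, and builds a decision support system with data mining at its core, the Waybill Information Synthetic Application System, to serve transportation management and decision-making.
    This thesis first surveys current data mining theory and technology in China and abroad and discusses the relevant theory and techniques, including the concept of data mining, data preprocessing techniques, concept description, online analytical processing, association rules, and classification and prediction techniques. Then, based on an investigation of the development and current state of railway informatization in China, in particular the Waybill Management Information System, one of the TMIS projects, it describes the information composition, data relationships, and extended "U"-shaped structure of the railway Waybill Information Synthetic Application System. Building on the study and analysis of existing waybill information and data mining technology, and drawing on data warehouse solutions, it proposes an overall architecture for applying data mining technology in the system and discusses the use of the related data mining and data warehouse techniques and methods. Finally, the thesis discusses the design and implementation of the front-end Waybill Information Synthetic Application System. The system was designed in detail according to user requirements, the data patterns and expert knowledge in the knowledge base, and the relevant software engineering standards, and was implemented with development tools. It integrates query, statistics, data mining, and analysis, and is the final result of this applied research on data mining and data warehouse technology.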
The information industry has developed rapidly in recent years, and information technology has been widely applied in one field after another. This has not only brought standardization and high efficiency to production and office work, but has also led to major changes in enterprise management systems and working methods. In the course of informatizing industry management, China's railways developed the Transportation Management Information System, which meets the demands of present-day railway transportation, by applying advanced information technology and taking the industry's own characteristics into account. One of its subsystems is the Waybill Management Information System, which mainly handles railway freight business. It has been widely deployed at railway freight stations across the country and has achieved good economic and social benefits. However, the massive waybill data is not fully used or exploited after being collected, and the waybill database, although full of useful information, has become a data tomb that nobody queries any more, which is an enormous waste of resources.
    The rise and development of data mining technology and data warehouse solutions provide a theoretical basis and technical support for the practical application of massive data, and at the same time bring a new opportunity for the synthetic application of waybill information. From the standpoint of modern railway management, we can use mathematical methods to mine useful information from the massive waybill data and, by applying data mining technology and drawing on data warehouse solutions, develop the Waybill Information Synthetic Application System, which serves transportation management and decision-making.
    First, through a study of current data mining theory and technology, the author examines the relevant theories and techniques of data mining, including its concept, preprocessing techniques, concept description, online analytical processing, association rules, and classification and prediction techniques. Then, based on research into the development and current state of railway informatization in China, especially the Waybill Management Information System, the author describes the data composition, the data relationships, and the extended "U"-type structure. After studying and analyzing the waybill information and data mining technology, the author proposes the framework within which data mining technology is applied in the Waybill Information Synthetic Application System, and discusses the application of selected data mining and data warehouse techniques and methods. Finally, the author discusses the design and implementation of the client side of the Waybill Information Synthetic Application System. The system is designed in detail according to user requirements, the data patterns and expert knowledge in the knowledge base, and relevant software engineering standards. Implemented with development tools, it is an application that combines query, statistics, data mining, and analysis functions, and it is the final product of this study on the application of data mining and data warehouse technology.
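
    The abstract names association rules as one of the data mining techniques studied but does not say which algorithm or record layout the system uses. As an illustration only, the Python sketch below computes support and confidence for single-item rules over a handful of made-up waybill records. The attribute names (`cargo`, `origin`, `dest`), the sample values, and the thresholds are all hypothetical, and the function `pair_rules` is not taken from the thesis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical waybill records: each record is the set of
# attribute=value items found on one waybill (invented data).
waybills = [
    {"cargo=coal", "origin=A", "dest=B"},
    {"cargo=coal", "origin=A", "dest=C"},
    {"cargo=steel", "origin=A", "dest=B"},
    {"cargo=coal", "origin=A", "dest=B"},
    {"cargo=grain", "origin=D", "dest=B"},
]


def pair_rules(records, min_support=0.4, min_confidence=0.7):
    """Return rules (x, y, support, confidence) meaning x -> y.

    support    = share of records containing both x and y
    confidence = share of records containing x that also contain y
    Only rules with one item on each side are considered.
    """
    n = len(records)
    item_counts = Counter(item for r in records for item in r)
    pair_counts = Counter(
        pair for r in records for pair in combinations(sorted(r), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is not frequent enough
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for x, y, s, c in pair_rules(waybills):
        print(f"{x} -> {y}  support={s:.2f}  confidence={c:.2f}")
```

    On this toy data, every record carrying `cargo=coal` also carries `origin=A`, so that rule has confidence 1.0. A production system over a full waybill database would need a scalable algorithm such as Apriori rather than this exhaustive pair count.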

