Application of Data Mining in E-Government Office Systems
Abstract
E-government has developed rapidly in China in recent years. Government departments at all levels have built large numbers of databases, and the volume of data is growing exponentially. Using new data analysis techniques to extract useful information from e-government systems efficiently and accurately has therefore become a problem of practical significance.
     This paper applies data mining techniques to iGRP, a municipal e-government office automation (OA) system, with the goal of discovering which attributes influence user activity.
     The analysis proceeds in six steps. First, suitable target attributes and predictor attributes are selected according to the goal of the analysis. Second, the selected attributes are extracted from the iGRP databases, integrated, and cleaned. Third, the numeric predictor attributes undergo noise handling and discretization. Fourth, the "Attribute Importance" function of ODM (Oracle Data Mining) is used to assess the importance of each predictor attribute to the target attributes, and irrelevant predictors are excluded to reduce the dimensionality of the data. Fifth, ODM's O-Cluster algorithm is applied to the target attribute and its relevant predictors to find a suitable split point for the numeric target attribute, and the target is converted into a binary attribute at that point. Finally, ODM's decision tree algorithm is used to build a classification model for the target attribute, which is then tested and evaluated.
     A total of 7,827 records, with 30 predictor attributes and 2 target attributes, were extracted from five databases of the iGRP system deployed in one city. Mining this dataset with the method above yields the following conclusions. The attribute with the greatest influence on user activity is "Total Favorites" (the number of favorites a user has saved), followed by the roles of document-sending clerk and document-receiving clerk. Based on this finding, users' requirements for, and feedback on, the "Favorite Folder" module should be studied further so that the function can be improved and users better served. In addition, user training and user feedback surveys should pay closer attention to users who hold the document-sending or document-receiving clerk role.
     By applying data mining to real e-government system data, this work analyzes a large volume of data efficiently and accurately and provides a basis for improving the iGRP product and raising user satisfaction.
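The analysis steps described above can be illustrated with a small Python sketch. It is not the author's ODM workflow: equal-width binning stands in for ODM's discretization, information gain stands in for the score computed by ODM's Attribute Importance function, and a one-dimensional histogram-valley search stands in for the O-Cluster split point (O-Cluster itself, by Milenova and Campos, partitions data along attribute axes at low-density valleys of one-dimensional histograms). The records, the attribute names `favorites`, `sender_role` and `dept_size`, the bin counts, and the `min_gain` threshold are all hypothetical, and the sketch stops before the decision-tree step.

```python
import math
from collections import Counter

# Hypothetical toy records: (favorites, sender_role, dept_size, activity).
# Invented for illustration only; these are NOT the paper's iGRP data.
RECORDS = [
    (2, 0, 30, 3), (1, 0, 12, 4), (3, 0, 25, 5), (0, 0, 40, 2),
    (4, 1, 18, 6), (2, 0, 33, 4), (25, 1, 22, 41), (30, 1, 35, 45),
    (22, 0, 15, 38), (28, 1, 27, 44), (35, 1, 40, 47), (27, 0, 20, 40),
]


def equal_width_bins(values, n_bins):
    """Map numeric values to bin indices 0..n_bins-1 of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant column: everything lands in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction obtained by partitioning labels on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def valley_split(values, n_bins):
    """Cut point at the emptiest bin strictly between the two fullest bins.

    A simplified 1-D stand-in for O-Cluster's valley-seeking splits,
    not O-Cluster itself. Returns None if the two peaks are adjacent.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = Counter(equal_width_bins(values, n_bins))
    hist = [counts.get(i, 0) for i in range(n_bins)]
    # Indices of the two most populated bins, in left-to-right order.
    left, right = sorted(sorted(range(n_bins), key=hist.__getitem__)[-2:])
    if right - left < 2:
        return None
    valley = min(range(left + 1, right), key=hist.__getitem__)
    return lo + (valley + 0.5) * width  # centre of the valley bin


def run_pipeline(records, n_bins=3, min_gain=0.05):
    """Discretize, rank and filter predictors, then binarize the target."""
    favorites, sender_role, dept_size, activity = (list(c) for c in zip(*records))
    features = {
        "favorites": equal_width_bins(favorites, n_bins),
        "sender_role": sender_role,  # already a 0/1 role flag
        "dept_size": equal_width_bins(dept_size, n_bins),
    }
    # Rank predictors against a binned copy of the numeric target.
    binned_target = equal_width_bins(activity, 5)
    ranking = sorted(
        ((name, information_gain(col, binned_target)) for name, col in features.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    kept = [name for name, gain in ranking if gain >= min_gain]
    # Split the numeric target at a histogram valley to get a binary target.
    cut = valley_split(activity, 5)
    if cut is None:
        raise ValueError("no histogram valley found in the target attribute")
    active = [int(a > cut) for a in activity]
    return ranking, kept, cut, active


ranking, kept, cut, active = run_pipeline(RECORDS)
for name, gain in ranking:
    print(f"{name:12s} information gain = {gain:.3f}")
print("kept predictors:", kept)
print("split point:", cut, "-> active users:", sum(active))
```

On this invented data, `dept_size` carries no information about the binned target, so it falls below the hypothetical `min_gain` threshold and is dropped, while the target splits at 15.5 into six inactive and six active users. Training and evaluating the decision tree on the kept predictors and the binary target is left out of this sketch.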
