Application of Data Mining Technology in Building the JSBAS System and in Key Customer Analysis
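Abstract
The domestic telecom industry is gradually opening up and liberalizing, and market competition is growing increasingly fierce. How to apply advanced technology to improve business management and deliver high-quality service, and thereby attract and retain customers, expand market share, reduce costs, and increase revenue, has become a shared concern of telecom decision makers.
     This thesis first introduces business intelligence systems and their basic concepts. It then takes the business intelligence system under construction at China Mobile, the Jiangsu Mobile Business Analysis System (JSBAS), and describes in detail how the system is built and how data warehouse, OLAP, and data mining technologies are applied in it. The thesis focuses on general methods for applying data mining to telecom business data in order to discover underlying patterns and, taking the key customer analysis topic as an example, presents a data mining application development process that follows the CRISP-DM methodology. It also analyzes existing data mining algorithms, including neural networks, decision trees, and logistic regression, discusses in detail their practical application, improvement, and parameter tuning, and has several algorithms cooperate to improve model performance. Finally, for the skewed data commonly encountered in data mining, the thesis presents handling methods based on oversampling and error weighting. Practice shows that the data mining subsystem meets the mobile operator's needs well, and it has been put into commercial operation.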
With the gradual opening of the domestic telecom market and the continuous evolution of telecom services, market competition is becoming more intense. Increasing competitive pressure requires every telecom organization to develop new and innovative ways to satisfy the growing demands of its customers and to improve its own management.
    Based on research into a real telecom business intelligence system, the "JiangSu Business Analysis System", the thesis first introduces the basic aspects of business intelligence systems and their related technologies. The author then elaborates the development process of the data mining subsystem, which can greatly enhance the management of VIP clients. The thesis also gives an in-depth analysis of several data mining algorithms, including neural networks, decision trees, and regression. Finally, it studies the advantage of combining these techniques in the real project, and uses oversampling and misclassification costs to improve the performance of the predictive models. Evaluation with actual data demonstrates that the data mining subsystem satisfies the requirements of commercial application well.
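The two imbalance-handling ideas named in the abstract, oversampling and misclassification costs, can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which the abstract does not describe. The function names, the toy labels (0 for the common class, 1 for the rare class), and the 4:1 cost ratio are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def oversample(records, labels, seed=0):
    """Randomly duplicate minority-class records until every class
    has as many records as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_records, out_labels = list(records), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(records, labels) if y == cls]
        for _ in range(target - n):
            out_records.append(rng.choice(pool))
            out_labels.append(cls)
    return out_records, out_labels


def cost_weights(labels, costs):
    """Give each record a weight equal to the assumed cost of
    misclassifying its class (hypothetical helper)."""
    return [costs[y] for y in labels]


def weighted_error(labels, predictions, weights):
    """Share of the total weight that falls on misclassified records."""
    wrong = sum(w for y, p, w in zip(labels, predictions, weights) if y != p)
    return wrong / sum(weights)


# Toy data: 8 records of the common class, 2 of the rare class.
records = list(range(10))
labels = [0] * 8 + [1] * 2

balanced_records, balanced_labels = oversample(records, labels)
print(Counter(balanced_labels))

# Assumed costs: missing a rare-class record is 4 times as costly.
weights = cost_weights(labels, {0: 1.0, 1: 4.0})
always_majority = [0] * len(labels)
print(weighted_error(labels, always_majority, weights))  # 0.5 (plain error rate would be 0.2)
```

A model that always predicts the common class looks acceptable under the plain error rate but is penalized heavily once errors are weighted by cost, which is the motivation for both techniques on skewed data.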
