Research on Customer Churn Early-Warning Models for Fixed-Line Operators (固网运营商客户流失预警模型研究)
Abstract
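As mobile services draw away a growing share of traffic and competition in the fixed-line market intensifies, domestic fixed-line operators face enormous challenges, and customer churn is becoming increasingly serious. The losses caused by departing existing customers, together with the difficulty of acquiring new ones, have made fixed-line operators aware of the importance of churn early warning and customer retention. In response to operators' urgent need for churn early warning, and to the scarcity of related research and applications in China, this thesis studies churn early-warning models for fixed-line operators.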
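     Applying the CRISP-DM data-mining process methodology in light of the business characteristics of fixed-line operators, this thesis describes in detail each step of building a churn early-warning model: business understanding, data understanding, data preparation, modeling, and model evaluation. Based on a summary of the characteristics of fixed-line churn data, it also identifies the key issues in fixed-line churn early warning.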
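     The construction and selection of feature variables strongly affect the learning efficiency of a churn early-warning model as well as the accuracy and stability of the final model. After analyzing and comparing a range of theories for analyzing relationships between variables, this thesis introduces the receiver operating characteristic (ROC) curve and the information-theoretic concept of mutual information to establish a feature-selection mechanism and a concrete procedure: feature variables with no classification predictive power (those whose area under the ROC curve, AUC, is less than or equal to 0.5) are removed; among highly correlated feature variables, those with higher predictive power are kept preferentially, and the less predictive, redundant ones are removed.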
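The two-stage selection procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names (`auc`, `mutual_information`, `select_features`), the greedy visiting order, and the redundancy cut-off `mi_threshold` are hypothetical, and features are assumed to be already discretized so that mutual information can be estimated from counts. AUC is computed in its Mann-Whitney form, which equals the area under the empirical ROC curve.

```python
import numpy as np


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen churner has a higher value than a randomly chosen non-churner,
    with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    # All-pairs comparison is O(n_pos * n_neg): fine for a sketch, not for large tables.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def mutual_information(x, y):
    """Mutual information I(X;Y), in nats, between two discrete variables."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)  # contingency table of value pairs
    joint /= joint.sum()              # joint probabilities p(x, y)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                    # empty cells contribute 0 (0 * log 0 := 0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())


def select_features(features, labels, mi_threshold=0.1):
    """features: dict mapping name -> 1-D array of discrete values.
    mi_threshold is a hypothetical redundancy cut-off (in nats)."""
    aucs = {name: auc(values, labels) for name, values in features.items()}
    # Stage 1: drop features with no predictive power (AUC <= 0.5).
    candidates = [name for name in features if aucs[name] > 0.5]
    # Stage 2: visit features from most to least predictive; keep a feature only if
    # it is not strongly dependent (high MI) on a more predictive feature already kept.
    kept = []
    for name in sorted(candidates, key=lambda n: aucs[n], reverse=True):
        if all(mutual_information(features[name], features[k]) < mi_threshold
               for k in kept):
            kept.append(name)
    return kept, aucs


# Toy usage: 1 = churned. Feature names and values are made up for illustration.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a": [0, 0, 0, 1, 1, 1, 1, 1],
    "f_b": [0, 0, 1, 1, 1, 1, 1, 1],  # differs from f_a in one row: redundant
    "f_c": [1, 0, 1, 0, 1, 0, 1, 0],  # AUC = 0.5: no predictive power
    "f_d": [0, 1, 0, 0, 1, 0, 1, 1],
}
kept, aucs = select_features(features, labels)
print(kept)  # ['f_a', 'f_d']
```

Visiting features in decreasing order of AUC means that whenever two features are redundant, the one met first, and therefore kept, is the more predictive one, which matches the rule of removing the less predictive redundant variable. The AUC ≤ 0.5 rule is applied literally here; a feature with AUC well below 0.5 is predictive in the reverse direction, and the abstract does not say whether the thesis orients variables before filtering.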
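     The modeling method is the key to whether the predictions are valid. Building on the novel TreeLogit model, this thesis proposes the mSTree-Logistic model, which obtains its final prediction function by applying logistic regression to the prediction functions of multiple decision trees, each trained on a separate sample set.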
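The structure of mSTree-Logistic, several trees trained on separate sample sets whose prediction functions become the inputs of a logistic regression, can be sketched as follows. The abstract does not say how the sample sets are drawn, how the trees are grown, or what each tree outputs, so this sketch assumes bootstrap resampling, a tiny depth-limited Gini tree written in NumPy, and the leaf churn rate as each tree's output; the class `MSTreeLogisticSketch` and its helper functions are hypothetical names. For brevity the logistic regression is fitted on the same rows used to grow the trees; in a stacked model a held-out set is usually reserved for that stage to avoid optimistic weights.

```python
import numpy as np


def gini(y):
    p = y.mean()
    return 2.0 * p * (1.0 - p)


def fit_tree(X, y, depth):
    """Tiny CART-style classification tree (Gini impurity, axis-aligned splits).
    Returns ("leaf", churn_rate) or ("split", feature, threshold, left, right)."""
    p = float(y.mean())
    if depth == 0 or p in (0.0, 1.0):
        return ("leaf", p)
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # both sides stay non-empty
            left = X[:, j] <= thr
            score = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if best is None or score < best[0]:
                best = (score, j, thr, left)
    if best is None:  # every row has identical features: nothing to split on
        return ("leaf", p)
    _, j, thr, left = best
    return ("split", j, thr,
            fit_tree(X[left], y[left], depth - 1),
            fit_tree(X[~left], y[~left], depth - 1))


def predict_tree(node, x):
    """Walk one row down the tree and return the churn rate of its leaf."""
    while node[0] == "split":
        _, j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node[1]


def fit_logistic(Z, y, lr=0.5, steps=3000):
    """Logistic regression by batch gradient descent on the mean log-loss.
    Returns [intercept, coefficient_1, ..., coefficient_m]."""
    Z1 = np.hstack([np.ones((len(Z), 1)), Z])
    w = np.zeros(Z1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z1 @ w)))
        w -= lr * Z1.T @ (p - y) / len(y)
    return w


class MSTreeLogisticSketch:
    """Hypothetical sketch: m trees on m sample sets, combined by logistic regression."""

    def __init__(self, n_trees=5, depth=2, seed=0):
        self.n_trees, self.depth = n_trees, depth
        self.rng = np.random.default_rng(seed)

    def _tree_outputs(self, X):
        # Column k holds T_k(x), the churn rate predicted by tree k for each row.
        return np.array([[predict_tree(t, x) for t in self.trees] for x in X])

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        self.trees = []
        for _ in range(self.n_trees):
            # Assumption: each sample set is a bootstrap resample of the training rows.
            idx = self.rng.integers(0, len(X), size=len(X))
            self.trees.append(fit_tree(X[idx], y[idx], self.depth))
        # Simplification: the combiner is fitted on the same rows the trees saw.
        self.w = fit_logistic(self._tree_outputs(X), y)
        return self

    def predict_proba(self, X):
        Z = self._tree_outputs(np.asarray(X, dtype=float))
        return 1.0 / (1.0 + np.exp(-(self.w[0] + Z @ self.w[1:])))


# Toy usage on synthetic data (not the thesis's data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
model = MSTreeLogisticSketch().fit(X, y)
accuracy = ((model.predict_proba(X) > 0.5) == y).mean()
```

After fitting, `model.w[1:]` holds one logistic coefficient per tree, showing how strongly each tree's prediction function contributes to the final churn score.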
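     The methods above were applied to customer data from a city-level branch of a fixed-line operator, and the results demonstrate their feasibility and effectiveness.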
With mobile services increasingly substituting for fixed-line ones and competition in the fixed-line market intensifying, domestic fixed-line operators face great challenges, and the growing loss of subscribers is among the biggest of them. The heavy losses caused by departing subscribers and the difficulty of winning new ones have made fixed-line operators recognize the importance of churn prediction and subscriber retention. In response to the operators' strong demand for churn prediction and the lack of research and practice in the fixed-line market, this thesis studies how to apply data-mining theory and technology to churn prediction for fixed-line subscribers.
     Applying the CRISP-DM methodology together with an understanding of the fixed-line business, this thesis details the steps of building a churn prediction model for fixed-line subscribers: business understanding, data understanding, data preparation, modeling, and evaluation. After summarizing the problems with the data available for fixed-line churn prediction, it also identifies the key issues in this task.
     The construction and selection of features have a great impact on the learning efficiency, accuracy, and stability of the final model. After analyzing various theories of variable correlation, the thesis introduces ROC curves and mutual information to devise a feature-selection method. ROC curves are first used to detect and remove ineffective features; mutual information is then used to detect strongly correlated features, among which those with superior predictive performance are kept.
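The two quantities this method relies on have standard definitions, written below for a single feature with values s_i, a set P of churners, and a set N of non-churners. This is the textbook form, assumed here for illustration; the abstract does not specify which estimators the thesis uses. An AUC of 0.5 means a random churner is no more likely than a random non-churner to have the higher value, and I(X;Y) = 0 exactly when X and Y are independent, so a high mutual information between two features flags them as a redundant pair.

```latex
\begin{aligned}
\mathrm{AUC} &= \frac{1}{|P|\,|N|} \sum_{i \in P} \sum_{j \in N}
  \Big( \mathbb{1}[s_i > s_j] + \tfrac{1}{2}\,\mathbb{1}[s_i = s_j] \Big) \\
I(X;Y) &= \sum_{x} \sum_{y} p(x,y)\, \log \frac{p(x,y)}{p(x)\,p(y)}
\end{aligned}
```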
     The modeling method is key to the effectiveness of the predictions. Inspired by the TreeLogit model, this thesis proposes the mSTree-Logistic model, in which a logistic regression function is induced from multiple decision trees, each built on a different training sample set.
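Written as a formula, the final predictor described above takes the standard logistic form below, where T_k(x) is the prediction function of the k-th of m trees, each grown on its own training sample set S_k, and the coefficients β_0 to β_m are fitted by logistic regression (for example by maximum likelihood). The notation is introduced here for illustration, and treating T_k(x) as a churn score or probability is an assumption; the abstract does not state what each tree outputs.

```latex
P(\text{churn} \mid \mathbf{x})
  = \frac{1}{1 + \exp\!\Big( -\big( \beta_0 + \sum_{k=1}^{m} \beta_k \, T_k(\mathbf{x}) \big) \Big)}
```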
     The methods were applied to churn prediction at a branch of a fixed-line operator, and the results show that the theories and methods proposed in this thesis are feasible and effective.
