Research on Customer Churn Prediction in the Mobile Communications Industry Based on Support Vector Machines
Abstract
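After a period of rapid growth in subscriber numbers, China's mobile communications market is now adding new customers more slowly. Under the influence of various promotional offers, and with sign-up procedures being simple, customer behavior has become far less predictable, and every mobile operator faces large-scale customer churn. Predicting which customers are inclined to churn has therefore become an urgent need for mobile communications companies. This thesis studies the application of the Support Vector Machine (SVM) algorithm to customer churn prediction in the mobile communications industry.
     To address the multi-class nature of mobile customers and the imbalanced distribution of data across customer classes, this thesis proposes the DAG-SVM+CW-SVM algorithm and builds a mobile customer churn prediction model. Churn is predicted with four algorithms: one-versus-rest (1-V-R SVM), one-versus-one (1-V-1 SVM), directed acyclic graph (DAG-SVM), and DAG-SVM+CW-SVM. The four algorithms are compared using the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), and lift. Experiments show that DAG-SVM+CW-SVM not only handles multi-class classification but also effectively mitigates the effect of dataset imbalance on the predictions, giving good predictive performance.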
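The two scalar evaluation measures named above, AUC and lift, can be sketched in a few lines. This is a minimal illustration in plain Python, not the thesis's evaluation code: the function names, the 20% cutoff, and the toy labels and scores are all assumptions made for the example. It treats churn as a binary outcome (1 = churner, 0 = non-churner).

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random churner outscores a random non-churner.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def lift(labels: Sequence[int], scores: Sequence[float], fraction: float = 0.1) -> float:
    """Churn rate in the top-scored `fraction` of customers divided by the overall churn rate."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no churners")
    # Rank customers from highest to lowest predicted churn score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    return top_rate / base_rate


# Toy data, invented for illustration only.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.05, 0.5]

print(auc(y_true, y_score))        # 19 of 21 churner/non-churner pairs are ranked correctly
print(lift(y_true, y_score, 0.2))  # top 20% churn rate 0.5 against a base rate of 0.3
```

A lift above 1 at a given cutoff means that contacting only the top-scored customers reaches churners more efficiently than contacting customers at random.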
China's mobile communications market has gone through a period of rapid expansion, and the number of new customers is now growing slowly. Under the influence of various marketing programs, coupled with simple sign-up procedures, customer behavior has become substantially less predictable, and mobile operators face a serious customer churn problem. Predicting churn has therefore become an urgent need for mobile communications enterprises. To meet this need, this thesis investigates a data-mining-based churn prediction system for China Mobile.
     The main subject of this thesis is the application of the Support Vector Machine (SVM) algorithm to churn prediction for mobile communications enterprises. Because the churn data are imbalanced and contain three classes, the class-weighted SVM model (CW-SVM) is applied, and an improved SVM, DAG-SVM+CW-SVM, is presented to predict churners; it performs better than 1-V-R SVM, 1-V-1 SVM, and DAG-SVM. The results demonstrate that this algorithm is suitable for multi-class and imbalanced data and achieves higher precision.
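The two ingredients of DAG-SVM+CW-SVM can be illustrated with a short sketch. It makes several assumptions that do not come from the thesis: the three class names are hypothetical, the pairwise classifiers are stand-in threshold rules rather than trained SVMs, and the weighting formula is the common "balanced" heuristic (total samples divided by the number of classes times the class count), which may differ from the weights the author used. In a class-weighted SVM, each class's misclassification penalty would be the base penalty C multiplied by such a weight.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# A pairwise decider takes a feature vector and returns the winning label of its pair.
PairwiseDecider = Callable[[Sequence[float]], str]


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count); rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def dag_predict(
    x: Sequence[float],
    classes: List[str],
    deciders: Dict[Tuple[str, str], PairwiseDecider],
) -> str:
    """Walk the decision DAG: each pairwise test eliminates one class, so k classes need k - 1 tests."""
    remaining = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        winner = deciders[(first, last)](x)
        if winner == first:
            remaining.pop()    # `last` lost this test and is eliminated
        else:
            remaining.pop(0)   # `first` lost this test and is eliminated
    return remaining[0]


# Hypothetical three-class setup; labels and thresholds are invented for illustration.
classes = ["stay", "at_risk", "churn"]
deciders: Dict[Tuple[str, str], PairwiseDecider] = {
    ("stay", "at_risk"): lambda x: "stay" if x[0] < 0.3 else "at_risk",
    ("stay", "churn"): lambda x: "stay" if x[0] < 0.5 else "churn",
    ("at_risk", "churn"): lambda x: "at_risk" if x[0] < 0.7 else "churn",
}

# Imbalanced toy label distribution: 80 / 15 / 5.
train_labels = ["stay"] * 80 + ["at_risk"] * 15 + ["churn"] * 5
weights = balanced_class_weights(train_labels)

print(weights)
print(dag_predict([0.6], classes, deciders))
```

For the input `[0.6]`, the first test (stay against churn) eliminates "stay" and the second test (at_risk against churn) selects "at_risk", so only two of the three pairwise classifiers are evaluated.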
