Research on Customer Churn Prediction Based on Data Mining

English title: The Research of Customer Churn Prediction Based on Data Mining Methods
Author: Wang Fang
Degree level: Master's
Major: Computer Applications
Keywords: Data Mining; Rough Set; Customer Relationship Management; Customer Profile; Attribute Reduction; Association Rule Mining; Sequential Pattern Discovery; Customer Churn Model
Degree year: 2003
Supervisor: Qiu Yuhui
Discipline code: 081203
Degree-granting institution: Southwest China Normal University
Submission date: 2003-04-01

Abstract

With the globalization of the economy, market competition in every industry is becoming increasingly fierce. To seize opportunities and strengthen their core competitiveness, enterprises in the information age must exploit the knowledge hidden in large volumes of data.
    For most enterprises, customers are a critical success factor and the main source of profit. Applying data mining techniques to Customer Relationship Management (CRM) gives an enterprise a quantitative basis for operations and decision-making. It also helps the enterprise use its limited resources more effectively and widens its room for profit growth.
    Predicting and controlling customer churn is a difficult problem for every enterprise today. Large-scale, frequent churn prolongs the enterprise's cost-recovery cycle and causes heavy losses. Existing research, both in China and abroad, usually tries to reduce churn by offering personalized services or by analyzing customer satisfaction and loyalty. The effectiveness of these methods is hard to verify, and they do not address the root of the problem.
    This thesis applies several data analysis techniques to the study of customer churn. It addresses the shortcomings of related research by proposing solutions to the main problems involved: customer profiling, attribute reduction, churn model discovery, analysis of churn causes, and churn prediction and control strategies. The focus is on building the churn model.
    The customer churn model is obtained by analyzing data on churned customers and consists of a basic model and a behavioral model. Applying association rule mining to churned customers' basic profile data reveals the set of key attributes that characterize them. For this purpose the thesis develops CAARM (Constrained Adaptive Association Rule Mining) by adjusting and improving algorithms proposed in earlier research. CAARM finds rules with a specified head and adjusts the minimum support automatically instead of requiring it as input. The behavioral churn model is built with sequential pattern discovery, which identifies the typical behavior sequences of churned customers and is used to predict the churn tendency of current customers.
    The thesis also gives a preliminary discussion of customer value (customer lifetime value). It argues that the higher-value subgroups among the customers predicted to churn should be the targets of market strategies. It also briefly analyzes the causes of churn using concepts from consumer psychology.
    Finally, detailed descriptions and analyses of several key algorithms are given.
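The abstract describes CAARM only in outline. The Python sketch below is a hedged illustration of the two properties it states: a fixed rule head (churn) and a minimum support that is adjusted automatically rather than supplied by the user. It mines rules of the form body -> churn over customer records, with each record encoded as a set of hypothetical `attribute=value` items. Several details are assumptions made for this example: the stopping criterion (lower the support threshold until at least `min_rules` rules reach `min_conf`), the function names and the encoding. This is not the thesis's CAARM algorithm.

```python
# Illustrative sketch only: the adaptive-support criterion below is an
# assumption, not the CAARM algorithm from the thesis.
from collections import Counter
from itertools import combinations


def frequent_bodies(churned, min_count, max_len):
    """Apriori-style level-wise search over churned customers' records.

    Returns {item set: count} for every item set (candidate rule body) of at
    most max_len items that occurs in at least min_count churned records.
    """
    counts = Counter(frozenset([item]) for rec in churned for item in rec)
    level = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(level)
    k = 1
    while level and k < max_len:
        k += 1
        prev = list(level)
        # Join step: merge two frequent (k-1)-sets into a k-set.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {}
        for cand in candidates:
            n = sum(1 for rec in churned if cand <= rec)
            if n >= min_count:
                level[cand] = n
        result.update(level)
    return result


def rules_for_head(records, labels, min_count, min_conf, max_len):
    """Rules body -> churn whose support (number of churned records that
    contain the body) is >= min_count and whose confidence is >= min_conf."""
    churned = [rec for rec, churn in zip(records, labels) if churn]
    rules = []
    for body, support in frequent_bodies(churned, min_count, max_len).items():
        body_count = sum(1 for rec in records if body <= rec)
        confidence = support / body_count
        if confidence >= min_conf:
            rules.append((body, support, confidence))
    # Strongest rules first; ties broken deterministically.
    rules.sort(key=lambda r: (-r[2], -r[1], len(r[0]), sorted(r[0])))
    return rules


def adaptive_rules(records, labels, min_rules, min_conf=0.8, max_len=3):
    """Lower the support threshold step by step (instead of taking it as an
    input) until at least min_rules rules with head 'churn' are found."""
    n_churned = sum(1 for churn in labels if churn)
    rules = []
    for min_count in range(n_churned, 0, -1):
        rules = rules_for_head(records, labels, min_count, min_conf, max_len)
        if len(rules) >= min_rules:
            return min_count, rules
    # No threshold yielded enough rules: return the most permissive result.
    return 1, rules
```

Raising the support threshold while keeping the confidence threshold fixed can only remove rules, never add them. Scanning thresholds from the strictest downward therefore returns the highest support that still yields `min_rules` rules. A binary search over the same range would find that threshold with fewer mining passes.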

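For the behavioral model, the abstract names sequential pattern discovery but does not fix a particular algorithm. The sketch below assumes a PrefixSpan-style miner over sequences of single behavior events. It finds event orders shared by at least `min_support` churned customers, then reports which of those orders appear, in sequence, in a current customer's history. The event names and the `min_len` cut-off for raising a warning are hypothetical.

```python
# Sketch under assumptions: PrefixSpan for single-event sequences,
# not the sequence-mining algorithm used in the thesis.
from collections import Counter


def prefixspan(sequences, min_support, max_len):
    """Return {pattern tuple: support}, where support is the number of input
    sequences containing the pattern as an ordered (not necessarily
    contiguous) subsequence."""
    patterns = {}

    def grow(prefix, projected):
        if len(prefix) == max_len:
            return
        # Count each event at most once per projected sequence.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for event, support in counts.items():
            if support < min_support:
                continue
            new_prefix = prefix + (event,)
            patterns[new_prefix] = support
            # Project each suffix on the first occurrence of the event.
            new_projected = [s[s.index(event) + 1:] for s in projected if event in s]
            grow(new_prefix, new_projected)

    grow((), [tuple(s) for s in sequences])
    return patterns


def is_subsequence(pattern, history):
    """True if the events of pattern occur in history in the same order."""
    it = iter(history)
    return all(event in it for event in pattern)


def churn_warning(history, patterns, min_len=2):
    """Churn patterns of at least min_len events found in a customer's
    history, longest first."""
    hits = [p for p in patterns if len(p) >= min_len and is_subsequence(p, history)]
    return sorted(hits, key=lambda p: (-len(p), p))
```

Projecting on the first occurrence keeps the support counts exact. Any later occurrence of the same event leaves a suffix that is contained in the first-occurrence suffix, so no supporting sequence is missed.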