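Research on the Application of Data Mining in Telecom Product Life Cycle Management

English title: Research of Data Mining in Telecom Product Life Cycle Management
Author: Liu Yong (刘永)
Degree level: Master's
Discipline: Computer Systems Architecture
Keywords: Data Mining; Telecom Product Life Cycle; Customer Segmentation; Product Forecast; Similarity Computing; Distance
Degree year: 2008
Supervisor: Chen Zhiping (陈治平)
Discipline code: 081201
Degree-granting institution: Hunan University
Thesis submission date: 2008-04-17
Chair of the defense committee: Wen Shuangchun (文双春)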

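Abstract

As China's telecom market gradually matures, competition grows increasingly fierce, and telecom operators keep launching new products to retain existing customers and attract new ones. The whole course of a new product, from conception and development through launch to withdrawal from the market, constitutes the telecom product life cycle (TPLC). Within this cycle, positioning the target customer group of a new product and forecasting its revenue before launch are difficult problems for TPLC management. This thesis analyzes a TPLC management model and, using data mining techniques, focuses on two parts of that model: customer segmentation and product forecasting.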
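Given the current state of data mining technology and the data mining knowledge this research requires, the thesis first analyzes the data mining process model and surveys clustering algorithms. On this basis, drawing on the specifications of domestic telecom operators and an actual project, it proposes a TPLC management model and analyzes each stage of the model in detail. By categorizing customer information and applying attribute reduction and data normalization, it builds a telecom customer segmentation model together with an evaluation system for the segmentation. However, a new telecom product has no customers or consumption records before it is formally launched, so its market revenue can be forecast only by reference to existing products, which requires computing the similarity between products. The thesis therefore analyzes telecom product attributes, builds a product similarity computation model through attribute feature extraction, and, by merging attributes according to the strength of the correlations between them, proposes a similarity measure based on complex object decomposition.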
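The abstract does not give the formulas behind the decomposed similarity measure, so the following standard-library Python sketch only illustrates the general idea under stated assumptions: product attributes are numeric and already normalized to [0, 1]; attributes whose absolute Pearson correlation with a group's first member reaches a hypothetical threshold of 0.8 are merged into one group; each group's similarity is 1 / (1 + Euclidean distance) over its attributes; the overall similarity is the unweighted mean over groups; and a new product's outcome is forecast as the similarity-weighted mean over its k most similar existing products. All function names, the threshold, the forecasting rule, and the sample data are hypothetical, not the thesis's method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def group_attributes(rows, threshold=0.8):
    """Greedily merge attribute indices whose |r| with a group's first member >= threshold (hypothetical rule)."""
    columns = [list(col) for col in zip(*rows)]
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            if abs(pearson(columns[group[0]], col)) >= threshold:
                group.append(j)
                break
        else:  # no sufficiently correlated group found: start a new one
            groups.append([j])
    return groups


def group_similarity(a, b, idx):
    """Similarity in (0, 1] derived from the Euclidean distance over one attribute group."""
    d = math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))
    return 1.0 / (1.0 + d)


def decomposed_similarity(a, b, groups):
    """Unweighted mean of per-group similarities, so each group counts once."""
    return sum(group_similarity(a, b, g) for g in groups) / len(groups)


def forecast(new_product, existing, outcomes, groups, k=2):
    """Similarity-weighted mean outcome of the k existing products most similar to new_product."""
    scored = sorted(
        ((decomposed_similarity(new_product, p, groups), y) for p, y in zip(existing, outcomes)),
        reverse=True,
    )[:k]
    total = sum(s for s, _ in scored)  # always > 0 because every similarity is > 0
    return sum(s * y for s, y in scored) / total


# Hypothetical normalized products: [monthly fee, included minutes, data quota]
products = [
    [0.1, 0.2, 0.5],
    [0.4, 0.8, 0.3],
    [0.3, 0.6, 0.9],
    [0.2, 0.4, 0.1],
]
revenues = [120.0, 300.0, 260.0, 180.0]  # hypothetical outcomes of the existing products
groups = group_attributes(products)
print(groups)  # -> [[0, 1], [2]]
print(forecast([0.25, 0.5, 0.4], products, revenues, groups))
```

Because fee and minutes move together in this toy data, they form one group and count once in the average, whereas a plain Euclidean distance over all three attributes would weight that shared factor twice. This is one plausible reading of why attribute merging can help, not a result taken from the thesis.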
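Simulation experiments on real data from a telecom operator show two things. First, the K-Means algorithm, applied within the customer segmentation model, achieves good overall performance in customer segmentation and provides a basis for positioning the target customer group of a new product. Second, the complex-object-decomposition similarity measure, built on the product similarity computation model, forecasts the market performance of a new product more accurately than traditional distance measures.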
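The abstract names K-Means and data normalization but not the exact preprocessing, so the sketch below assumes min-max scaling of each attribute to [0, 1] and plain Lloyd's K-Means with randomly chosen initial centroids, written in standard-library Python. The customer fields and values are hypothetical; attribute reduction and the thesis's segmentation evaluation system are not shown.

```python
import random


def min_max_normalize(rows):
    """Scale each attribute column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's K-Means with random initial centroids; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each customer joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return centroids, labels


# Hypothetical customers: [monthly spend (yuan), call minutes, data usage (MB)]
customers = [
    [20, 100, 50],
    [25, 120, 60],
    [22, 110, 55],
    [200, 900, 3000],
    [210, 950, 3200],
    [190, 880, 2900],
]
centroids, labels = k_means(min_max_normalize(customers), k=2)
print(labels)
```

Normalizing first keeps the large-valued data-usage column from dominating the distance; with raw values, usage in MB would outweigh spend and minutes. The cluster IDs themselves are arbitrary, so only the grouping (low-usage versus high-usage customers here) is meaningful.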

