Telecom Customer Churn Prediction Based on Feature Selection and SVM
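  • English title: Prediction on customer leaving the telecom network based on feature selection and SVM
  • Authors: LU Guangyue (卢光跃); ZHANG Hongjian (张宏建); YAN Zhenguang (闫真光); WU Yang (吴洋)
  • English authors: LU Guangyue; ZHANG Hongjian; YAN Zhenguang; WU Yang; Shaanxi Key Laboratory of Information Communication Network and Security, Xi'an University of Posts and Telecommunications
  • Keywords: telecom customers; churn prediction; feature selection; support vector machine
  • English keywords: telecommunications customer; leaving the telecom network prediction; feature selection; support vector machine
  • Chinese journal title: XAYD
  • English journal title: Journal of Xi'an University of Posts and Telecommunications
  • Affiliation: Shaanxi Key Laboratory of Information Communication Network and Security, Xi'an University of Posts and Telecommunications
  • Publication date: 2019-03-10
  • Publisher: Journal of Xi'an University of Posts and Telecommunications (西安邮电大学学报)
  • Year: 2019
  • Issue: v.24; No.137
  • Funding: Shaanxi Province Industrial Science and Technology Research Program (2015GY-013, 2016GY-113)
  • Language: Chinese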
  • Page: XAYD201902005
  • Number of pages: 5
  • CN: 02
  • ISSN: 61-1493/TN
  • Classification number: 25-29
Abstract
To address the overfitting that data mining algorithms exhibit when predicting whether telecom customers will leave the network (churn), this paper proposes a churn prediction algorithm based on feature selection and a support vector machine (SVM). The raw telecom data are preprocessed through missing-value imputation, redundancy identification, data structuring, and normalization to obtain well-formed data suited to analysis. Information gain is then used for feature selection: it extracts the main factors that influence churn, reduces the data dimensionality, and helps prevent overfitting. The selected features serve as the input to the SVM, which classifies each customer as churning or not and thereby predicts whether the customer will leave the network. Test results show that the algorithm predicts churning customers with an accuracy of 86%, an improvement in churn prediction accuracy.
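The information-gain step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the toy records, and the helper functions `entropy`, `information_gain`, and `select_features` are all hypothetical. It also assumes every feature has already been discretized. The SVM classification stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(rows, labels, names, k):
    """Return the k feature names with the highest information gain, plus all gains."""
    gains = {name: information_gain([row[i] for row in rows], labels)
             for i, name in enumerate(names)}
    ranked = sorted(names, key=lambda name: gains[name], reverse=True)
    return ranked[:k], gains


# Hypothetical toy records (not data from the paper); label 1 means the customer left.
names = ["contract", "arrears", "region"]
rows = [
    ("monthly", "yes", "north"),
    ("monthly", "yes", "south"),
    ("monthly", "no", "north"),
    ("yearly", "no", "south"),
    ("yearly", "no", "north"),
    ("yearly", "no", "south"),
]
labels = [1, 1, 0, 0, 0, 0]

selected, gains = select_features(rows, labels, names, k=2)
print(selected)
```

In this toy set the `arrears` feature separates the two classes perfectly, so its gain equals the full label entropy, while `region` carries no information about the label and is dropped. Only the retained columns would then be passed to the classifier.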
