A Customer Identification Method Based on the KNN Algorithm (一种基于KNN算法的客户身份识别方法)
  • English title: A customer identification method based on KNN algorithm
  • Authors: YANG Jing (杨菁); LIU Kunpeng (刘鲲鹏); JIN Peng (金鹏)
  • Keywords: unified identity recognition; text mining; big data; KNN model
  • Journal code: GZDJ
  • Journal: Power Systems and Big Data (电力大数据)
  • Institution: Service Evaluation Department, State Grid Customer Service Center (国网客服中心服务考评部)
  • Publication date: 2019-04-21
  • Publisher: Power Systems and Big Data (电力大数据)
  • Year: 2019
  • Issue: v.22; No.238
  • Language: Chinese
  • Page: GZDJ201904011
  • Page count: 7
  • CN: 04
  • ISSN: 52-1170/TK
  • Classification number: 73-79
Abstract
Customer labels in the power industry give an unclear picture of the customer. Customer service business is mostly conducted with natural persons, so its labels are attached to telephone numbers, whereas traditional power business is conducted with households, so its labels are attached to account numbers (户号). As a result, label information cannot be shared between the two. To address this, a unified identity recognition model is built on the 95598 service using big data analysis and text mining, in order to identify the correspondence between a customer's calling number and account number. Word segmentation is used to parse the electricity service address, the customer name and related fields, and an address similarity score and a name similarity score are computed as the key factors for verifying correspondences and for identifying suspected account numbers. For correspondences that can be obtained, a weight division model is built to compute a matching score, and the reliability of each correspondence is verified according to that score. For calling numbers with no known account number, a KNN model is built on the text similarity scores to compute a matching score, and suspected account numbers are identified according to that score.
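The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: character bigrams stand in for real word segmentation, Jaccard overlap stands in for the similarity scores, the weights `(0.6, 0.4)` are arbitrary, and the names `jaccard`, `weighted_score`, `knn_match_score` and `suspect_account` are hypothetical.

```python
def bigrams(text):
    """Character bigrams; an assumed stand-in for real word segmentation."""
    text = text.replace(" ", "")
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard overlap of bigram sets, used here as the similarity score."""
    sa, sb = bigrams(a), bigrams(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def features(call, account):
    """(address similarity, name similarity) for one call/account pair."""
    return (jaccard(call["address"], account["address"]),
            jaccard(call["name"], account["name"]))


def weighted_score(feat, weights=(0.6, 0.4)):
    """Matching score for a known correspondence; weights are illustrative."""
    return sum(w * f for w, f in zip(weights, feat))


def knn_match_score(feat, labelled, k=3):
    """Share of the k nearest labelled pairs that are confirmed matches."""
    def sq_dist(item):
        return sum((x - y) ** 2 for x, y in zip(feat, item[0]))
    nearest = sorted(labelled, key=sq_dist)[:k]
    return sum(label for _, label in nearest) / len(nearest)


def suspect_account(call, candidates, labelled, k=3):
    """Return (account id, score) of the best-scoring candidate account."""
    scored = [(acc["id"], knn_match_score(features(call, acc), labelled, k))
              for acc in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical labelled pairs: (feature vector, 1 = confirmed match, 0 = not).
labelled = [((0.9, 1.0), 1), ((0.8, 0.5), 1), ((0.85, 0.9), 1),
            ((0.1, 0.0), 0), ((0.2, 0.3), 0)]
call = {"name": "Li Lei", "address": "12 Xueyuan Road Haidian"}
candidates = [
    {"id": "A1", "name": "Li Lei", "address": "12 Xueyuan Road Haidian"},
    {"id": "A2", "name": "Wang Fang", "address": "88 Renmin Street"},
]
best_id, best_score = suspect_account(call, candidates, labelled)
```

In this sketch `weighted_score` plays the role of the reliability check for correspondences that are already known, while `suspect_account` ranks candidate accounts for a calling number that has none. A real system would presumably use a proper Chinese word segmenter and weights derived from data.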