Application of Machine Learning to Fake User Identification in the IoT
  • English title: Application of machine learning in the fake user identification of IoT
  • Authors: ZHANG Rongfang; XU Dandan; WANG Yuanguang; PAN Siyu; LI Zhengmao
  • Affiliations: Research Institute of China United Network Communication Co., Ltd.; E-Commerce Center, China United Network Communication Co., Ltd.; Chongqing Branch of China United Network Communication Co., Ltd.
  • Keywords: IoT; positive and unlabeled semi-supervised model; Naïve Bayesian classifier; random forest; support vector machine; SPY classifier
  • Journal code (Chinese title): DXKX
  • Journal (English title): Telecommunications Science
  • Publication date: 2019-07-20
  • Publisher: Telecommunications Science (电信科学)
  • Year: 2019
  • Volume: 35
  • Issue: 07
  • Language: Chinese
  • Article ID: DXKX201907017
  • Pages: 142-150
  • Page count: 9
  • CN: 11-2103/TN
Abstract
With the development of communication technology, IoT SIM cards and 5G will be deployed on a large scale. However, some enterprises exploit the low tariffs of IoT cards and the absence of real-name registration to profit illegally and undermine social stability, which harms the healthy development of the industry. Identifying fake users has therefore become an important research topic in the IoT industry. This paper studies how to use machine learning models to efficiently identify suspected fake users in massive, real-time IoT terminal data. Specifically, by studying the characteristics of the relevant data, a semi-supervised learning model based on positive and unlabeled samples is used to build a model that monitors abnormal behavior in real time and thereby identifies potential fake users in the IoT industry. The approach saves considerable manpower and material resources, and it helps the relevant departments and personnel detect abnormal user behavior promptly and take measures to avoid large losses, so it has broad prospects for industry application.
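The abstract names learning from positive and unlabeled samples, and the keywords mention a SPY classifier and a naive Bayes classifier. The record does not give the authors' implementation, so the following is only a minimal sketch of the first step of the spy technique for positive-unlabeled learning, under stated assumptions: the two numeric features, the synthetic clusters, and the names `fit_gaussian_nb`, `positive_probability`, `spy_reliable_negatives`, `spy_ratio`, and `noise` are all hypothetical, and a small Gaussian naive Bayes model stands in for whatever classifier the paper actually used.

```python
import math
import random
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels):
    """Fit Gaussian naive Bayes: {label: (log_prior, [(mean, var), ...])}."""
    model = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        # Per-feature mean and variance; a small constant avoids zero variance.
        stats = [(mean(col), pvariance(col) + 1e-6) for col in zip(*members)]
        model[label] = (math.log(len(members) / len(rows)), stats)
    return model


def positive_probability(model, row):
    """Posterior probability that `row` belongs to class 1."""
    log_joint = {}
    for label, (log_prior, stats) in model.items():
        total = log_prior
        for x, (mu, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        log_joint[label] = total
    top = max(log_joint.values())  # subtract the max for numerical stability
    norm = sum(math.exp(v - top) for v in log_joint.values())
    return math.exp(log_joint[1] - top) / norm


def spy_reliable_negatives(positives, unlabeled, spy_ratio=0.15, noise=0.05, seed=0):
    """Return unlabeled rows that score below the threshold set by the spies."""
    rng = random.Random(seed)
    shuffled = positives[:]
    rng.shuffle(shuffled)
    n_spies = max(1, int(len(shuffled) * spy_ratio))
    spies, kept = shuffled[:n_spies], shuffled[n_spies:]
    # Spies are known positives that are trained as if they were unlabeled.
    rows = kept + unlabeled + spies
    labels = [1] * len(kept) + [0] * (len(unlabeled) + len(spies))
    model = fit_gaussian_nb(rows, labels)
    # Threshold: the score that all but a `noise` fraction of spies reach.
    spy_scores = sorted(positive_probability(model, s) for s in spies)
    threshold = spy_scores[int(noise * len(spy_scores))]
    return [u for u in unlabeled if positive_probability(model, u) < threshold]


# Synthetic demo data (assumption: two made-up behavioral features per user).
rng = random.Random(42)


def cluster(center, n):
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]


known_fake = cluster(10.0, 40)   # labeled positives
hidden_fake = cluster(10.0, 20)  # fake users hidden in the unlabeled pool
normal = cluster(0.0, 100)       # ordinary users, also unlabeled
unlabeled = hidden_fake + normal

reliable_negatives = spy_reliable_negatives(known_fake, unlabeled)
print(len(reliable_negatives), "reliable negatives out of", len(unlabeled))
```

In the usual two-step scheme, a final classifier (for example a support vector machine or a random forest, both listed in the keywords) would then be trained on the known positives against these reliable negatives. That second step is omitted here.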
References
[1] LIU N, LIU L, ZHANG D, et al. Reflections on operators' competition in big data of internet of things[J]. Telecommunications Science, 2018, 34(3): 132-137.
[2] CAO X, XIE Y H. Parallel algorithm of collaborative filtering based on Hadoop[J]. Application of Computer System, 2018(5).
[3] HU J, HU X D, CHENG J X. Big data hybrid computing mode based on Spark[J]. Application of Computer System, 2015, 24(4): 214-218.
[4] LIU Y Z, ZHANG X J, LI X Y. Design of Web log analysis system based on Hadoop/Hive[J]. Journal of Guangxi University (Natural Science Edition), 2011, 36(S1): 314-317.
[5] FAN J Y, LONG M, XIONG W. Research of vector spatial data distributed storage based on HBase[J]. Geography and Geo-information Science, 2012, 28(5): 39-42.
[6] JIANG S Z. Construction of large data analysis platform based on Apache Kylin[M]. Beijing: Tsinghua University Press, 2016.
[7] WANG Z Y, ZHANG C H. Design and implementation of a data visualization analysis component based on ECharts[J]. Microcomputer & Its Applications, 2016, 35(14): 46-48, 51.
[8] NIU X L, WU L. Study on DevOps development status[J]. Telecommunication Network Technology, 2017(10).
[9] HAO S K. Brief analysis of the architecture of Hadoop HDFS and MapReduce[J]. Designing Techniques of Posts and Telecommunications, 2012(7): 37-42.
[10] ZHONG Q L, CAI Z X. Semi-supervised learning algorithm based on SVM and by gradual approach[J]. Computer Engineering and Applications, 2006(25): 19-22.
[11] LIU B, DAI Y, LI X, et al. Building text classifiers using positive and unlabeled examples[C]//2003 IEEE International Conference on Data Mining, Nov 11, 2003, Melbourne, USA. Piscataway: IEEE Press, 2003: 179.
[12] ZHOU Z H. Machine learning[M]. Beijing: Tsinghua University Press, 2016.
[13] BREIMAN L. Random forests[J]. Machine Learning, 2001, 45(1): 5-32.
[14] ADANKON M M, CHERIET M. Support vector machine[J]. Computer Science, 2002, 1(4): 1-28.
[15] WANG H Y, FAN H K, YAO Z A. Imbalance data set classification using SMOTE and Biased-SVM[J]. Computer Science, 2008, 35(5): 174-176.
[16] CHEN Y Q, ZHOU X S, HUANG T S. One-class SVM for learning in image retrieval[C]//International Conference on Image Processing, Oct 7-10, 2001, Thessaloniki, Greece. Piscataway: IEEE Press, 2001: 34-37.
[17] WANG J F, ZHANG L, CHEN G X, et al. A parameter optimization method for an SVM based on improved grid search algorithm[J]. Applied Science and Technology, 2012, 39(3): 28-31.
