Dense subgraph based telecommunication fraud detection approach in bank

Original title (Chinese): 基于密集子图的银行电信诈骗检测方法
Authors: LIU Xiao (刘枭); WANG Xiaoguo (王晓国)
Keywords: telecommunication fraud; unsupervised learning; fraud detection; dense subgraph; greedy algorithm
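Journal code: JSJY
Journal: Journal of Computer Applications (计算机应用)
Affiliation: College of Electronics and Information Engineering, Tongji University
Publication date: 2018-12-06 11:25
Publisher: Journal of Computer Applications (计算机应用)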
Year: 2019
Volume and issue: v.39; No.344 (issue 04)
Language: Chinese
Record ID: JSJY201904045
Page count: 6
Pages: 286-291
CN: 51-1307/TP

Abstract

Banks have accumulated little labeled data on telecommunication fraud, and manual labeling is costly, so supervised learning methods for telecommunication fraud detection lack sufficient labeled data. To address this problem, an unsupervised learning method based on dense subgraphs is proposed for detecting telecommunication fraud. First, fraudulent accounts are identified by searching the account-resource network (IP addresses and MAC addresses are collectively called resources) for subgraphs with a high degree of suspiciousness. Then, a subgraph suspiciousness metric that matches the characteristics of telecommunication fraud is designed. Finally, a disk-resident suspicious-subgraph search algorithm with linear memory consumption and a theoretical guarantee is proposed. On two synthetic datasets, the F1-scores of the proposed method are 0.921 and 0.861, higher than those of the CrossSpot, fBox and EvilCohort algorithms and close to those of the M-Zoom algorithm (0.899 and 0.898), while its average running time and peak memory consumption are both lower than those of M-Zoom. On a real-world dataset, the F1-score of the proposed method is 0.550, higher than those of fBox and EvilCohort and close to that of M-Zoom (0.529). The experimental results show that the proposed method is well suited to banks' current anti-telecommunication-fraud operations and to the large-scale datasets found in practice.
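The idea of searching an account-resource network for a dense, suspicious subgraph can be illustrated with a minimal sketch. It assumes the classic greedy peeling heuristic with average-degree density (edges per node), in the style of the greedy approximation cited as [16]. This record does not give the paper's own suspiciousness metric or its disk-resident search procedure, so the function name `greedy_densest_subgraph`, the density definition, and the toy data are illustrative assumptions, not the authors' method.

```python
import heapq
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling on a bipartite account-resource graph.

    edges: iterable of (account, resource) pairs.
    Returns (best_nodes, best_density), where density = edges / nodes.
    Assumption: average-degree density, not the paper's metric.
    """
    adj = defaultdict(set)
    for account, resource in edges:
        a, r = ("acct", account), ("res", resource)
        adj[a].add(r)
        adj[r].add(a)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    n_edges = sum(degree.values()) // 2
    # The integer in the middle breaks ties so nodes are never compared.
    heap = [(d, i, node) for i, (node, d) in enumerate(degree.items())]
    heapq.heapify(heap)
    counter = len(heap)

    alive = set(adj)
    removal_order = []
    best_density = n_edges / len(alive) if alive else 0.0
    best_removed = 0  # nodes removed when the best density was seen

    while heap:
        d, _, node = heapq.heappop(heap)
        if node not in alive or d != degree[node]:
            continue  # stale heap entry, skip it
        # Remove the node with the smallest current degree.
        alive.remove(node)
        removal_order.append(node)
        n_edges -= d
        for nbr in adj[node]:
            if nbr in alive:
                degree[nbr] -= 1
                heapq.heappush(heap, (degree[nbr], counter, nbr))
                counter += 1
        if alive:
            density = n_edges / len(alive)
            if density > best_density:
                best_density = density
                best_removed = len(removal_order)

    best_nodes = set(adj) - set(removal_order[:best_removed])
    return best_nodes, best_density


if __name__ == "__main__":
    # Toy data: three accounts sharing the same three resources,
    # plus four ordinary accounts that each use a private resource.
    block = [(a, r) for a in ("a1", "a2", "a3")
             for r in ("ip1", "ip2", "mac1")]
    normal = [("u%d" % i, "ip_u%d" % i) for i in range(4)]
    nodes, density = greedy_densest_subgraph(block + normal)
    print(sorted(n for kind, n in nodes if kind == "acct"), density)
```

In this toy graph the accounts that share the same IP and MAC addresses form the densest block, so they are the ones returned as suspicious. A production variant would need a fraud-specific suspiciousness metric and out-of-core storage, neither of which this sketch attempts.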

References

[1] JHA S, GUILLEN M, CHRISTOPHER W J. Employing transaction aggregation strategy to detect credit card fraud[J]. Expert Systems with Applications, 2012, 39(16):12650-12657.
[2] van VLASSELAER V, BRAVO C, CAELEN O, et al. APATE:a novel approach for automated credit card transaction fraud detection using network-based extensions[J]. Decision Support Systems, 2015, 75:38-48.
[3] BAHNSEN A C, AOUADA D, STOJANOVIC A, et al. Detecting credit card fraud using periodic features[C]//ICMLA 2015:Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications. Piscataway, NJ:IEEE, 2015:208-213.
[4] SAVAGE D, WANG Q M, CHOU P L, et al. Detection of money laundering groups using supervised learning in networks[EB/OL].[2018-05-10]. https://arxiv.org/pdf/1608.00708.
[5] KHAC N A L, MARKOS S, KECHADI M. A data mining-based solution for detecting suspicious money laundering cases in an investment bank[C]//DBKDA 2010:Proceedings of the 2010 Second International Conference on Advances in Databases, Knowledge, and Data Applications. Piscataway, NJ:IEEE, 2010:235-240.
[6] NEDA H, ALI H, MEHDI S. An intelligent anti-money laundering method for detecting risky users in the banking systems[J]. International Journal of Computer Applications, 2014, 97(22):35-39.
[7] MICHALAK K, KORCZAK J. Graph mining approach to suspicious transaction detection[C]//FedCSIS 2011:Proceedings of the 2011 Federated Conference on Computer Science and Information Systems. Piscataway, NJ:IEEE, 2011:69-75.
[8] YU W, WANG J D. Suspicious money laundering detection system based on eigenvector centrality measure of transaction network[J]. Journal of Computer Applications, 2009, 29(9):2581-2585. (Chinese original: 喻炜, 王建东. 基于交易网络特征向量中心度量的可疑洗钱识别系统[J]. 计算机应用, 2009, 29(9):2581-2585.)
[9] SOLTANI R, NGUYEN U, YANG Y, et al. A new algorithm for money laundering detection based on structural similarity[C]//UEMCON 2016:Proceedings of the 2016 IEEE 7th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference. Piscataway, NJ:IEEE, 2016:1-7.
[10] HUANG J, SUN H, HAN J, et al. SHRINK:a structural clustering algorithm for detecting hierarchical communities in networks[C]//Proceedings of the 19th ACM International Conference on Information and Knowledge Management. New York:ACM, 2010:219-228.
[11] GIANLUCA S, PIERRE M, GREGOIRE J, et al. EVILCOHORT:detecting communities of malicious accounts on online services[C]//SEC 2015:Proceedings of the 24th USENIX Conference on Security Symposium. Berkeley:USENIX Association, 2015:563-578.
[12] BLONDEL V D, GUILLAUME J L, LAMBIOTTE R, et al. Fast unfolding of communities in large networks[J]. Journal of Statistical Mechanics:Theory and Experiment, 2008, 2008(10):P10008.
[13] PRAKASH B A, SRIDHARAN A, SESHADRI M, et al. EigenSpokes:surprising patterns and scalable community chipping in large graphs[C]//PAKDD 2010:Proceedings of the 2010 Pacific-Asia Conference on Knowledge Discovery and Data Mining. Berlin:Springer, 2010:435-448.
[14] SHAH N, BEUTEL A, GALLAGHER B, et al. Spotting suspicious link behavior with fBox:an adversarial perspective[C]//ICDM 2014:Proceedings of the 2014 IEEE International Conference on Data Mining. Piscataway, NJ:IEEE, 2014:959-964.
[15] JIANG M, BEUTEL A, CUI P, et al. Spotting suspicious behaviors in multimodal data:a general metric and algorithms[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(8):2187-2200.
[16] CHARIKAR M. Greedy approximation algorithms for finding dense components in a graph[C]//APPROX 2000:Proceedings of the Third International Workshop on Approximation Algorithms for Combinatorial Optimization. Berlin:Springer, 2000:84-95.
[17] SHIN K, HOOI B, FALOUTSOS C. M-Zoom:fast dense-block detection in tensors with quality guarantees[C]//Proceedings of the 2016 Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin:Springer, 2016:264-280.
[18] SHIN K, HOOI B, FALOUTSOS C. Fast, accurate and flexible algorithms for dense subtensor mining[J]. ACM Transactions on Knowledge Discovery from Data, 2018, 12(3):1-30.
[19] JIN R, XIANG Y, RUAN N, et al. 3-HOP:a high-compression indexing scheme for reachability query[C]//SIGMOD 2009:Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. New York:ACM, 2009:813-826.
[20] BATAGELJ V, BRANDES U. Efficient generation of large random networks[J]. Physical Review E:Covering Statistical, Nonlinear, Biological, and Soft Matter Physics, 2005, 71(3):036113.
