Android malware detection method based on data-flow deep learning algorithm

Authors: ZHU Dali (朱大立); JIN Hao (金昊); WU Di (吴荻); JING Pengfei (荆鹏飞); YANG Ying (杨莹)
Affiliations: The 4th Laboratory, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Keywords: machine learning; Android malware detection; deep learning; data flow feature
Chinese journal name: XAXB
English journal name: Journal of Cyber Security
Publication date: 2019-03-15
Publisher: Journal of Cyber Security (信息安全学报)
Year: 2019
Issue: v.4
Funding: National Natural Science Foundation of China (No. 61701494); Youth Star program of the Institute of Information Engineering, Chinese Academy of Sciences (No. Y8YS016104); Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA06010703)
Language: Chinese
Page: XAXB201902006
Page count: 16
CN: 02
ISSN: 10-1380/TN
Classification number: 57-72

Abstract

Machine learning algorithms are currently used to detect unknown malicious Android applications. However, traditional machine learning algorithms have fewer than three layers of computational units and therefore cannot fully mine the deep representations of Android application features. To address this problem, this paper proposes, for the first time, a deep learning algorithm named DDBN (Data-flow Deep Belief Network), which analyzes the data flow features of Android applications in order to detect unknown Android malware. First, the analysis tools FlowDroid and SUSI are combined to extract static data flow features that reflect the malicious behavior of an Android application. Then, DDBN is designed for these features: it builds a deep model and applies layer-by-layer feature transformations that map the data flow features from the original representation space into a new feature space, which makes classification more accurate. Finally, an automated Android malware detection tool named Flowdect is implemented on top of DDBN and used to examine a large number of real-world benign and malicious applications. The experimental results show that Flowdect can fully learn the data flow features of Android applications and use them to detect unknown Android malware. Compared with detection schemes based on traditional machine learning algorithms, DDBN achieves better accuracy and efficiency.
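The pipeline described in the abstract (data flow features, then layer-by-layer transformation in a deep belief network) can be illustrated with a minimal sketch. This is not the authors' DDBN or Flowdect implementation: the category names below are illustrative placeholders in the style of SUSI source/sink categories, the layer sizes and learning rate are arbitrary assumptions, and the `RBM`, `encode_flows`, `pretrain_stack`, and `transform` names are hypothetical. The sketch only shows the generic technique of encoding source-to-sink flows as a binary vector and greedily pretraining stacked restricted Boltzmann machines with one-step contrastive divergence (CD-1); the supervised fine-tuning and classification stage is omitted.

```python
import math
import random

# Illustrative source/sink categories (assumed, not taken from the paper).
SOURCES = ["UNIQUE_IDENTIFIER", "LOCATION_INFORMATION", "CONTACT_INFORMATION"]
SINKS = ["NETWORK", "SMS_MMS", "LOG"]
# One feature per (source category, sink category) pair.
FEATURES = [(s, t) for s in SOURCES for t in SINKS]


def encode_flows(flows):
    """Map an app's (source, sink) category pairs to a binary feature vector."""
    present = set(flows)
    return [1.0 if pair in present else 0.0 for pair in FEATURES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """A tiny restricted Boltzmann machine trained with CD-1 (hypothetical helper)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_step(self, v0, lr=0.1):
        # Positive phase, one Gibbs step, then negative phase.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_h[j] += lr * (h0[j] - h1[j])


def pretrain_stack(data, layer_sizes, epochs=20, seed=0):
    """Greedy layer-wise pretraining: each RBM trains on the previous layer's output."""
    rng = random.Random(seed)
    stack, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_step(v)
        stack.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return stack


def transform(stack, v):
    """Push one feature vector through every layer to get its new representation."""
    for rbm in stack:
        v = rbm.hidden_probs(v)
    return v


# Two made-up apps, described only by their detected flows.
apps = [
    encode_flows([("UNIQUE_IDENTIFIER", "NETWORK"), ("CONTACT_INFORMATION", "SMS_MMS")]),
    encode_flows([("LOCATION_INFORMATION", "LOG")]),
]
stack = pretrain_stack(apps, layer_sizes=[6, 3])
print(transform(stack, apps[0]))
```

In a full detector, the output of `transform` would feed a supervised classifier that is fine-tuned on labeled benign and malicious samples; that step, and the actual flow extraction with FlowDroid and SUSI, are outside this sketch.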
