A method for discovering optimal features in Android malware detection using static analysis and genetic search (in English)
  • English title: Discovering optimal features using static analysis and a genetic search based method for Android malware detection
  • Authors: Ahmad FIRDAUS; Nor Badrul ANUAR; Ahmad KARIM; Mohd Faizal Ab RAZAK
  • Keywords: Genetic algorithm; Static analysis; Android; Malware; Machine learning
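  • Journal title (Chinese): JZUS
  • Journal title (English): Frontiers of Information Technology & Electronic Engineering (English edition)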
  • Affiliations: Department of Computer System and Technology, University of Malaya; Faculty of Computer System & Software Engineering, University Malaysia Pahang; Department of Information Technology, Bahauddin Zakariya University
  • Publication date: 2018-06-03
  • Publisher: Frontiers of Information Technology & Electronic Engineering
  • Year: 2018
  • Issue: v.19
  • Funding: Supported by the Ministry of Science, Technology and Innovation of Malaysia under the eScience Fund grant (No. 01-01-03-SF0914)
  • Language: English
  • Page: JZUS201806002
  • Number of pages: 26
  • CN:06
  • ISSN:33-1389/TP
  • Classification number: 19-44
Abstract
Mobile device manufacturers are rapidly releasing a wide variety of Android versions worldwide. At the same time, cyber criminals are carrying out malicious actions such as tracking user activities, stealing personal data, and committing bank fraud. Because so many people rely on Android for their daily routines, including important communications, these criminals reap substantial illicit gains. In response, security practitioners use static and dynamic analyses to identify malware. This study used static analysis because of its overall code coverage, low resource consumption, and rapid processing. However, static analysis needs a minimal set of features to classify malware efficiently. We therefore used genetic search (GS), a search based on a genetic algorithm (GA), to select features from among 106 strings. To evaluate the best features determined by GS, we used five machine learning classifiers: Naïve Bayes (NB), functional trees (FT), J48, random forest (RF), and multilayer perceptron (MLP). Among these classifiers, FT achieved the highest accuracy (95%) and true positive rate (TPR) (96.7%) using only six features.
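The general idea of a genetic search over feature subsets can be sketched as follows. This is an illustrative toy, not the authors' implementation: the abstract does not give the GS settings, so the synthetic data (20 binary features instead of the paper's 106 strings), the majority-vote rule standing in for a real classifier, the size penalty, and every function and parameter name below are assumptions made for the example.

```python
import random


def make_toy_data(n_samples=200, n_features=20, informative=(0, 3, 7), seed=0):
    """Synthetic binary data (hypothetical): informative bits follow the label 90% of the time."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)  # 1 = malware, 0 = benign
        row = [rng.randint(0, 1) for _ in range(n_features)]
        for j in informative:
            row[j] = label if rng.random() < 0.9 else 1 - label
        X.append(row)
        y.append(label)
    return X, y


def fitness(mask, X, y, penalty=0.01):
    """Accuracy of a majority vote over the selected bits, minus a penalty per selected feature."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    correct = 0
    for row, label in zip(X, y):
        votes = sum(row[j] for j in selected)
        pred = 1 if 2 * votes > len(selected) else 0
        correct += pred == label
    return correct / len(X) - penalty * len(selected)


def genetic_search(X, y, pop_size=30, generations=40, p_mut=0.05, seed=1):
    """Evolve bit masks over features; returns the best mask and the best score per generation."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(m, X, y) for m in pop]
        elite = max(range(pop_size), key=scores.__getitem__)
        history.append(scores[elite])

        def pick():
            # Tournament selection: best of three randomly drawn individuals.
            i = max(rng.sample(range(pop_size), 3), key=scores.__getitem__)
            return pop[i]

        children = [pop[elite][:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = pick(), pick()
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per position.
            children.append([bit ^ 1 if rng.random() < p_mut else bit for bit in child])
        pop = children
    scores = [fitness(m, X, y) for m in pop]
    elite = max(range(pop_size), key=scores.__getitem__)
    history.append(scores[elite])
    return pop[elite], history


X, y = make_toy_data()
best_mask, history = genetic_search(X, y)
selected = [j for j, bit in enumerate(best_mask) if bit]
print("selected features:", selected, "fitness:", round(history[-1], 3))
```

Because the best mask is copied into each new generation, the best score never decreases from one generation to the next. The size penalty is one simple way to push the search toward small subsets, in the spirit of the six-feature result reported above; a real evaluation would replace the majority-vote rule with a trained classifier and cross-validation.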
