Defending unknown attacks on cyber-physical systems by semi-supervised approach and available unlabeled data


Authors: Shamsul Huda^a (shamsul.huda@deakin.edu.au, nn.bsh2010@gmail.com); Suruz Miah^b (smiah@bradley.edu); Mohammad Mehedi Hassan^d (mmhassan@ksu.edu.sa); Rafiqul Islam^c (mislam@csu.edu.au); John Yearwood^a (john.yearwood@deakin.edu.au); Majed Alrubaian^d (mrubaian.c@ksu.edu.sa); Ahmad Almogren^d (ahalmogren@ksu.edu.sa)
Keywords: Dynamic analysis; Malware behavior selection; API parameters; String feature
Journal: Information Sciences
Publication year: 2017
Publication date: 10 February 2017
Volume: 379
Issue: Complete
Pages: 211-228
Full-text size: 2636 K

Abstract

Cyber-physical systems (CPS) are increasingly used in modern industrial systems. Because their software is tightly integrated with hardware and network components, these systems face a significant threat from malicious software that exploits this integration. Malicious code continuously changes its internal structure and attack patterns using obfuscation techniques, such as polymorphism and metamorphism, in order to evade conventional malware detection engines. Keeping up requires continuous updates to the database of the malware detection engine, which in turn demand periodic manual effort from human experts and can limit the real-time protection of CPS. This also makes it challenging to preserve the availability and integrity of the services provided by CPS against malicious code, and it creates a demand for specialized malware detection techniques for CPS.

In this paper, we propose a semi-supervised approach that automatically integrates knowledge about unknown malware from cheap, already available unlabeled data into the detection system. The novelty of the proposed approach is that it requires no expert effort to update the database of the detection engine. Instead, dynamic changes in malware attack patterns are extracted from the unlabeled data by unsupervised clustering. The extracted geometric information about the intrinsic attack characteristics of the clusters is then integrated into the classification system of the detection engine, which updates the detection system automatically. The proposed approach uses global K-means clustering with term frequency (TF) and inverse document frequency (IDF) weighting, and cosine similarity as the distance measure, to extract the cluster information and add it to a support vector machine (SVM) classification system. The approach has been tested extensively on a real malware data set with both static and dynamic malware features. The experimental results show that the proposed semi-supervised approach achieves higher accuracy than the existing supervised approaches for all classifiers. The static feature-based semi-supervised approach on its own improves detection accuracy significantly. When the proposed approach is applied to the run-time characteristics obtained by dynamic feature analysis, the combined effect further increases the detection accuracy of all classifiers, reaching up to 100% for the SVM and random forest classifiers and exceeding the existing supervised approaches with similar features.
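
The pipeline outlined in the abstract can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation, and several choices here are assumptions: the API-call traces are made-up toy data, plain K-means with a fixed initialization stands in for global K-means, and "integrating cluster information" is interpreted as describing each labeled sample by its cosine similarity to every cluster centroid. The resulting feature vectors would then be passed to an SVM, which is left out.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return unit-length TF-IDF vectors (sparse dicts) for tokenised documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF; the exact weighting used in the paper is not stated in the abstract.
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def centroid(members):
    """Sum the member vectors and rescale the result to unit length."""
    total = Counter()
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}

def kmeans_cosine(vectors, k, iters=20):
    """Plain K-means under cosine similarity (a simplification of global K-means)."""
    centroids = [dict(v) for v in vectors[:k]]  # assumption: seed with the first k samples
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return centroids, labels

def cluster_features(vec, centroids):
    """Describe a sample by its similarity to each cluster centroid."""
    return [cosine(vec, c) for c in centroids]

# Hypothetical API-call traces; real inputs would come from static or dynamic analysis.
unlabeled = [
    ["CreateFile", "WriteFile", "CloseHandle"],
    ["RegSetValue", "CreateRemoteThread", "WriteProcessMemory"],
    ["CreateFile", "ReadFile", "CloseHandle"],
    ["RegSetValue", "WriteProcessMemory", "VirtualAllocEx"],
]
labeled = [
    ["CreateFile", "ReadFile", "WriteFile"],                  # assumed benign
    ["CreateRemoteThread", "VirtualAllocEx", "RegSetValue"],  # assumed malicious
]

# Vectorise everything together so labeled and unlabeled samples share IDF weights.
vectors = tfidf(unlabeled + labeled)
centroids, labels = kmeans_cosine(vectors[:len(unlabeled)], k=2)
features = [cluster_features(v, centroids) for v in vectors[len(unlabeled):]]
print(labels)
print(features)
```

In this sketch the unlabeled traces only shape the centroids, so new unlabeled data can change the feature space seen by the classifier without any additional labeling effort, which is the property the abstract emphasizes.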
