Clustering Algorithm for Binary Protocol Data Frames Combining Feature Dimensionality Reduction and Density Peaks Clustering
  • Authors: YAN Xiao-yong (闫小勇); LI Qing (李青)
  • Affiliation: School of Information System Engineering, Information Engineering University
  • Keywords: protocol identification; binary protocol; feature dimensionality reduction; density peaks; frame clustering
  • Journal code: XXWX
  • Journal: Journal of Chinese Computer Systems (小型微型计算机系统)
  • Publication date: 2018-12-11
  • Year: 2018
  • Volume: 39
  • Issue: 12
  • Language: Chinese
  • Pages: 104-110
  • Page count: 7
  • CN: 21-1106/TP
  • Record ID: XXWX201812019
Abstract
Binary protocol session flows lack distinguishing features, and frequent patterns are hard to extract from them. To address both problems, this paper combines feature dimensionality reduction with an improved density peaks clustering algorithm to cluster binary protocol data at the granularity of individual data frames under unsupervised conditions. First, a frequent-item-based dimensionality reduction algorithm uses the frequent items found in protocol data to construct feature vectors that represent the original data frames, which reduces their dimensionality. Second, a density peaks clustering algorithm based on distance index weighting selects cluster centers automatically and effectively improves the separation between cluster centers and the other data frames. Tests on three data sets built from five protocols (AIS, ARP, DNS, ICMP, and SMB) show that the proposed algorithm clusters binary protocol data frames well.
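The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the frequent-item definition nor the distance index weighting, so the sketch assumes byte n-grams with a support threshold as "frequent items" and uses the baseline density peaks procedure (local density, distance to the nearest denser point, centers ranked by their product). The names `frequent_items`, `to_vector`, `density_peaks`, `min_support`, `d_c`, and `n_clusters` are hypothetical, and the number of clusters is passed in by hand rather than selected automatically as the paper proposes.

```python
from collections import Counter
from math import dist


def ngrams(frame, n):
    """Set of distinct byte n-grams in one frame."""
    return {frame[i:i + n] for i in range(len(frame) - n + 1)}


def frequent_items(frames, n=2, min_support=0.4):
    """n-grams present in at least min_support of all frames (assumed criterion)."""
    counts = Counter()
    for frame in frames:
        counts.update(ngrams(frame, n))  # each n-gram counted once per frame
    return sorted(g for g, c in counts.items() if c >= min_support * len(frames))


def to_vector(frame, items, n=2):
    """Low-dimensional binary vector: 1.0 where a frequent item occurs in the frame."""
    present = ngrams(frame, n)
    return [1.0 if g in present else 0.0 for g in items]


def density_peaks(points, d_c, n_clusters):
    """Baseline density peaks clustering on a non-empty list of vectors."""
    m = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(m) if j != i and d[i][j] < d_c) for i in range(m)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(m), key=lambda i: (-rho[i], i))
    delta = [0.0] * m
    parent = [-1] * m
    delta[order[0]] = max(max(row) for row in d)  # densest point gets the largest distance
    for rank in range(1, m):
        i = order[rank]
        j = min(order[:rank], key=lambda k: d[i][k])  # nearest point visited earlier
        delta[i], parent[i] = d[i][j], j
    # Centers: points with the largest rho * delta.
    centers = sorted(range(m), key=lambda i: rho[i] * delta[i], reverse=True)[:n_clusters]
    labels = [-1] * m
    for c, i in enumerate(centers):
        labels[i] = c
    # Every remaining point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Each frame is mapped with `to_vector` over the items returned by `frequent_items`, and the resulting vectors are passed to `density_peaks`. On real traffic, `min_support` and `d_c` would need tuning per data set.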
