用户名: 密码: 验证码:
基于等差隐私预算分配的大数据决策树算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Big Data Decision Tree Algorithm Based on Equal-arrival Privacy Budget Allocation
  • 作者:尚涛 ; 赵铮 ; 舒王伟 ; 刘建伟
  • 英文作者:SHANG Tao;ZHAO Zheng;SHU Wangwei;LIU Jianwei;School of Cyber Sci.and Technol.,Beihang Univ.;School of Electronic and Info.Eng.,Beihang Univ.;
  • 关键词:分类 ; 决策树 ; 差分隐私 ; 大数据
  • 英文关键词:classification;;decision tree;;differential privacy;;big data
  • 中文刊名:SCLH
  • 英文刊名:Advanced Engineering Sciences
  • 机构:北京航空航天大学网络空间安全学院;北京航空航天大学电子信息工程学院;
  • 出版日期:2019-03-13 10:54
  • 出版单位:工程科学与技术
  • 年:2019
  • 期:v.51
  • 基金:国家重点研发计划资助项目(2016YFC1000307)
  • 语种:中文;
  • 页:SCLH201902017
  • 页数:7
  • CN:02
  • ISSN:51-1773/TB
  • 分类号:134-140
摘要
针对传统差分隐私保护方案以剩余隐私预算的一半逐层分配,即等比分配隐私预算,被应用于决策树时,随着决策树高度的增加,分配至顶层的隐私预算过小,随机噪声过大,分类准确率受到影响的问题,作者提出以差分隐私保护结合主流决策树C4.5分类方法为基本思路,依据决策树高度等差分配隐私预算的方案。差分隐私中的Laplace机制和指数机制确保决策树分类的安全性。作者利用大数据Hadoop平台的MapReduce框架,主程序进行MapReduce参数配置以及外层循环。在执行到每一个节点时,主程序将数据集属性的统计任务交给Mapper类,Reducer类接收Mapper类的统计结果并利用Laplace机制添加随机噪声,加噪结果返回主程序中作为计算信息增益率的参数。主程序利用指数机制选择最佳细分方案,递归过程直至样本数为0时停止。实验采用UCI数据库的car数据集进行测试,在不同隐私预算下将等比分配与等差分配两种方案得到的分类结果准确率进行对比。实验结果表明:本文算法在可接受的分类准确率降低的情况下满足差分隐私保护;与传统隐私预算分配相比,本文算法在相同隐私预算下提高了分类准确率;对于car数据集,本文算法在隐私预算为0.7或0.8时可较好兼顾数据集的安全性和有效性。因此,在一定程度上依据决策树高度等差分配隐私预算的方案可改善分类准确率,可实际应用于决策树分类算法。
        In order to address the problem that the traditional differential privacy preservation scheme is distributed layer by layer by half of the remaining privacy budget, i.e., equal ratio allocation of privacy budget, and when it is applied to the decision tree, the privacy budget allocated to the top layer is too small, the random noise is too large, and the classification accuracy is affected with the increase of the height of decision tree, a scheme of equal-arrival privacy budget allocation based on decision tree height difference was proposed, which combined differential privacy protection preservation with mainstream decision tree C4.5 classification algorithm. The Laplace mechanism and the exponential mechanism can ensure the security of the decision tree. This scheme utilized the MapReduce framework of the big data Hadoop platform, and the main program performed parameter configuration of MapReduce and outer loop. When executed to each node, the main program passed the statistical task of the dataset attribute to the Mapper class. The Reducer class received the statistical result of the Mapper class. The Laplace mechanism was used to add random noise. The noise-added result was returned to the main program for calculation the information gain rate. The main program used the exponential mechanism to select the best subdivision scheme. The recursion process stopped until the number of samples was 0. The experiment used the car data set of UCI database to test, and compared classification results of two schemes under different privacy budgets. Experiment showed that this scheme can satisfy differential privacy with acceptable classification accuracy reduction, and improve the classification accuracy under the same privacy budget compared to the traditional privacy budget allocation. For the car data set, the algorithm can balance the security and effectiveness of the data set when the privacy budget was 0.7 or 0.8. Therefore, the scheme of equal-arrival privacy budget allocation based on decision tree height difference can improve classification accuracy to a certain extent. It can be practically applied to decision tree classification.
引文
[1]Cao Hua.Research on decision tree algorithm for privacy preserving[D].Lanzhou:Lanzhou University of Technology,2008.[曹华.保护隐私的决策树算法的研究[D].兰州:兰州理工大学,2008.]
    [2]Shao Minhui.A review of typical decision tree algorithm[J].Computer Knowledge and Technology,2018,14(8):175-177.[邵旻晖.决策树典型算法研究综述[J].电脑知识与技术,2018,14(8):175-177.]
    [3]Du Wenliang,Zhan Zhijun.Using randomized response techniques for privacy-preserving data mining[C]//Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining.New York:ACM,2003:505-510.
    [4]Sz?cs G.Random response forest for privacy-preserving classification[J].Journal of Computational Engineering,2013,2013(309):1-6.
    [5]Tai R K H,Ma J P K,Zhao Yongjun,et al.Privacy-preserving decision trees evaluation via linear functions[C]//Proceedings of the 22nd European Symposium on Research in Computer Security.Berlin:Springer,2017:494-512.
    [6]Dwork C.Differential privacy[C]//Proceedings of the 33rd International Colloquium on Automata,Languages and Programming.Berlin:Springer,2006:1-12.
    [7]Chen Yu,Li Lingjuan.A decision tree-based privacy pre-serving classification mining algorithm for data streams[J].Computer Technology and Development,2017,27(7):111-119 .[陈煜,李玲娟.一种基于决策树的隐私保护数据流分类算法[J].计算机技术与发展,2017,27(7):111-119.]
    [8]Dwork C.The promise of differential privacy:A tutorial on algorithmic techniques[C]//Proceedings of the 52nd Annual IEEE Symposium on Foundations of Computer Science.Washington:IEEE,2011:1-2.
    [9]朱明.数据挖掘导论[M].安徽:中国科学技术大学出版社,2012:44-69.
    [10]Gao Shang,Wang Changbao.Personal credit scoring based on decision tree C5.0 algorithm[C]//Proceedings of the 7th International Conference on Education,Management,Computer and Society.Pairs:Atlantis,2017:1729-1734.
    [11]Chen Yang,Yu Shoujian.Research on differential privaty for decision tree release technology[J].Computer and Modernization,2017,10(3):59-64.[陈杨,于守健.基于差分隐私的决策树发布技术研究[J].计算机与现代化,2017,10(3):59-64.]
    [12]Xiong Ping,Zhu Tianqing,Jin Dawei.Differential private data publishing algorithm for building decision tree[J].Application Research of Computers,2014,31(10):3108-3112.[熊平,朱天清,金大卫.一种面向决策树构建的差分隐私保护算法[J].计算机研究与应用,2014,31(10):3108-3112.]
    [13]McSherry F,Talwar K.Mechanism design via differential privacy[C]//Proceedings of the 48th Annual Symposium on Foundations of Computer Science.Washington:IEEE,2007:94-103.
    [14]Lam C,韩冀中.Hadoop实战[M].北京:人民邮电出版社,2011:17-134.
    [15]Rashid M M,Gondal I,Kamruzzaman J.Dependable large scale behavioral patterns mining from sensor data using Hadoop platform[J].Information Sciences,2017,379:128-145.
    [16]Zhu Tianqing,Xiong Ping,Xiang Yang,et al.An effective differentially private data releasing algorithm for decision tree[C]//Proceedings of the 12th IEEE International Conference on Trust Security and Privacy in Computing and Communications.Washington:IEEE,2013:388-395.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700