An Ensemble Feature Selection Algorithm with Differential Privacy Based on Local Learning (基于局部学习的差分隐私集成特征选择算法)

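English title: An Ensemble Feature Selection Algorithm with Differential Privacy Based on Local Learning
Author: LIU Zhong-feng (刘中锋)
Author affiliation: School of Computer Science, Nanjing University of Posts and Telecommunications
Keywords: feature selection; ensemble; differential privacy; privacy degree; sensitivity
Journal code: WJFZ
Journal: Computer Technology and Development (计算机技术与发展)
Institution: School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications
Publication date: 2018-05-28 11:06
Publisher: Computer Technology and Development
Year: 2018
Volume/issue: v.28; No.258
Funding: National Natural Science Foundation of China (61603197, 91646116); Natural Science Foundation of Jiangsu Province (BK20140885)
Language: Chinese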
Record ID: WJFZ201810017
Issue: 10
Pages: 86-89
Page count: 4
CN: 61-1450/TP

Abstract

Feature selection is usually an indispensable step in data mining and machine learning when dealing with massive data. Secure machine learning, and privacy preservation in particular, has attracted growing attention. Privacy-preserving feature selection, however, remains a relatively new topic, especially ensemble feature selection, which draws on ensemble learning. Because differential privacy is a privacy-preservation method with a rigorous theoretical foundation, this paper proposes a differentially private ensemble feature selection algorithm based on local learning. The algorithm uses an output perturbation strategy: it protects privacy by adding noise to the output, and the noise depends on the privacy degree and the sensitivity of the original feature selection algorithm. In addition to a rigorous theoretical proof, experiments demonstrate the algorithm's performance. The experiments use KNN and SVM as classifiers and analyze the effect of the privacy degree and of the number of features. The results show that the level of privacy protection increases as the privacy degree decreases.
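The output perturbation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not name the noise distribution or the ensemble rule, so the sketch assumes Laplace noise with scale sensitivity/epsilon (the standard Laplace mechanism, with `sensitivity` read as an L1 sensitivity) and assumes the ensemble simply averages feature-weight vectors produced elsewhere. The function names, the toy weights, and the sensitivity value are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_ensemble_weights(weight_vectors, epsilon, sensitivity, rng):
    """Average per-feature weights, then perturb the output with Laplace noise.

    weight_vectors: one weight vector per base feature selector (assumed given).
    epsilon: privacy degree; smaller values mean more noise, stronger privacy.
    sensitivity: assumed L1 sensitivity of the averaged weight vector.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    n = len(weight_vectors)
    dim = len(weight_vectors[0])
    # Ensemble step (assumption): plain average of the base weight vectors.
    averaged = [sum(w[j] for w in weight_vectors) / n for j in range(dim)]
    # Output perturbation: the noise scale depends on sensitivity and epsilon.
    scale = sensitivity / epsilon
    return [w + laplace_noise(scale, rng) for w in averaged]


def top_k_features(weights, k):
    """Return the indices of the k largest weights, best first."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy weight vectors standing in for the outputs of three base selectors.
    base = [[0.9, 0.1, 0.5, 0.2], [0.7, 0.3, 0.5, 0.1], [0.8, 0.2, 0.6, 0.3]]
    for eps in (10.0, 1.0, 0.1):
        noisy = private_ensemble_weights(base, eps, sensitivity=0.05, rng=rng)
        print(eps, top_k_features(noisy, 2))
```

Because the noise scale is sensitivity/epsilon, lowering the privacy degree widens the noise, which matches the abstract's statement that a smaller privacy degree gives stronger privacy protection, at the cost of a less stable feature ranking.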

References

[1] LI Yun. Research on stable feature selection[J]. Microcomputer & Its Applications, 2012, 31(15): 1-2. (in Chinese)
[2] WOZNICA A, NGUYEN P, KALOUSIS A. Model mining for robust feature selection[C]//Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. Beijing, China: ACM, 2012: 913-921.
[3] LI Yun, SI J, ZHOU Guojing, et al. FREL: a stable feature selection algorithm[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(7): 1388-1402.
[4] CAI Hongyun, TIAN Junfeng. Research on data privacy protection in cloud computing[J]. Journal of Shandong University (Natural Science), 2014, 49(9): 83-89. (in Chinese)
[5] CHAUDHURI K, MONTELEONI C, SARWATE A D. Differentially private empirical risk minimization[J]. Journal of Machine Learning Research, 2011, 12: 1069-1109.
[6] TAN Mingkui, TSANG I W, WANG Li. Minimax sparse logistic regression for very high-dimensional feature selection[J]. IEEE Transactions on Neural Networks & Learning Systems, 2013, 24(10): 1609-1622.
[7] YANG J, LI Y. Differential privacy feature selection[C]//Proceedings of international joint conference on neural networks. Beijing, China: ACM, 2014: 4182-4189.
[8] SAEYS Y, ABEEL T, PEER Y. Robust feature selection using ensemble feature selection techniques[C]//Proceedings of the European conference on machine learning and knowledge discovery in databases. Antwerp, Belgium: Springer, 2008: 313-325.
[9] SUN Yijun, TODOROVIC S, GOODISON S. Local learning based feature selection for high dimensional data analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1610-1626.
[10] DWORK C. Differential privacy[C]//Proceedings of the 33rd international conference on automata, languages and programming. Venice, Italy: Springer-Verlag, 2006: 1-12.
[11] DWORK C, ROTH A. The algorithmic foundations of differential privacy[J]. Foundations and Trends in Theoretical Computer Science, 2014, 9(3-4): 211-407.
[12] CRAMMER K, BACHRACH R G, NAVOT A, et al. Margin analysis of the LVQ algorithm[C]//Proceedings of advances in neural information processing systems. [s. l.]: [s. n.], 2002: 462-469.
[13] NG A Y. Feature selection, l1 vs. l2 regularization, and rotational invariance[C]//Proceedings of international conference on machine learning. Banff, Alberta, Canada: ACM, 2004.
[14] LI Yun, YANG Jun, JI Wei. Local learning-based feature weighting with privacy preservation[J]. Neurocomputing, 2016, 174: 1107-1115.
[15] DWORK C, MCSHERRY F, NISSIM K, et al. Calibrating noise to sensitivity in private data analysis[C]//Proceedings of the third conference on theory of cryptography. New York, NY: Springer, 2006: 265-284.
[16] XIONG Ping, ZHU Tianqing, WANG Xiaofeng, et al. Differential privacy protection and its applications[J]. Chinese Journal of Computers, 2014, 37(1): 101-122. (in Chinese)
