A Differential Privacy Protection Method Applicable to Mixed-Attribute Data Tables

English title: Differential privacy protection method for mixed data
Authors: Ding Yongshan; Li Lixin
Keywords: mixed data; clustering; differential privacy; sensitivity; privacy protection
Journal: Application Research of Computers (CNKI code: JSYJ)
Institution: The Third College, Information Engineering University
Online-first publication date: 2018-02-08 17:54
Publisher: Application Research of Computers
Year: 2019
Volume: 36 (serial No. 328)
Funding: National Key Research and Development Program of China (2016YFB0501900)
Language: Chinese
CNKI file ID: JSYJ201902044
Page count: 4
Issue: 02
CN: 51-1196/TP
Pages: 201-204

Abstract

To strengthen privacy protection and improve data utility, this paper proposes ICMD-DP, a data protection method that applies differential privacy to data tables with mixed (categorical and numerical) attributes. The method first anonymizes the data set by clustering it with the ICMD (insensitive clustering for mixed data) algorithm and then applies ε-differential privacy to the clustered result. ICMD computes distances and centroids differently for categorical and numerical attributes and introduces a total order function so that the clustering meets the requirements for enforcing differential privacy. Clustering spreads the query sensitivity over a group of records instead of a single record, which reduces both information loss and the risk of information disclosure. Experimental results demonstrate the effectiveness of the method.
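The abstract names two ingredients: a distance and centroid computed differently for numerical and categorical attributes (with a total order on categorical values), and Laplace noise whose sensitivity is spread over a group rather than a single record. The following is a minimal sketch of those two ideas under stated assumptions; it is not the ICMD algorithm itself, which the abstract does not specify. The attribute order `ORDER`, the helper names `mixed_distance`, `centroid`, `laplace` and `noisy_centroids`, the scaling of numeric attributes to [0, 1], the median-under-total-order centroid, and the m/k sensitivity bound are all hypothetical choices made for illustration. The clustering step is assumed to have already produced insensitive groups of at least k records.

```python
import random

# Hypothetical total order on one categorical attribute (an illustrative assumption).
ORDER = ["primary", "secondary", "bachelor", "master", "doctorate"]
RANK = {v: i for i, v in enumerate(ORDER)}


def mixed_distance(a, b):
    """Numeric attributes (assumed pre-scaled to [0, 1]) use absolute difference;
    the categorical attribute uses its rank gap under ORDER, scaled to [0, 1]."""
    num = sum(abs(x - y) for x, y in zip(a["num"], b["num"]))
    cat = abs(RANK[a["cat"]] - RANK[b["cat"]]) / (len(ORDER) - 1)
    return num + cat


def centroid(group):
    """Arithmetic mean for numeric attributes; lower median under ORDER for the categorical one."""
    m = len(group[0]["num"])
    num = [sum(r["num"][j] for r in group) / len(group) for j in range(m)]
    ranks = sorted(RANK[r["cat"]] for r in group)
    return {"num": num, "cat": ORDER[ranks[(len(ranks) - 1) // 2]]}


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_centroids(groups, k, eps, seed=None):
    """Release one noisy centroid per group.

    Assumption (hypothetical): the grouping is insensitive and each group holds at
    least k records, so changing one record moves a group's mean by at most 1/k per
    numeric attribute; with m numeric attributes the L1 sensitivity used is m / k.
    """
    if any(len(g) < k for g in groups):
        raise ValueError("every group must contain at least k records")
    rng = random.Random(seed)
    m = len(groups[0][0]["num"])
    scale = (m / k) / eps  # Laplace scale = sensitivity / epsilon
    released = []
    for g in groups:
        c = centroid(g)
        c["num"] = [v + laplace(scale, rng) for v in c["num"]]
        released.append(c)  # the categorical median is NOT privatized in this sketch
    return released
```

For example, `noisy_centroids(groups, k=3, eps=1.0, seed=0)` with m = 2 numeric attributes perturbs each centroid coordinate with Laplace scale 2/3, whereas releasing each record on its own would need scale 2; this k-fold reduction is the effect the abstract attributes to clustering. A real release would also need a differentially private mechanism for the categorical centroid (for example the exponential mechanism), which this sketch leaves unprotected.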

References

[1] Cao Chunping, Zheng Xia. Research of anonymity model for privacy-preserving in social network[J]. Journal of Chinese Computer Systems, 2016, 37(8): 1821-1825. (in Chinese)
[2] Liu Xiaoqian, Li Qianmu. Differentially private data release based on clustering anonymization[J]. Journal on Communications, 2016, 37(5): 125-129. (in Chinese)
[3] Liu Xiangyu, Wang Bin, Yang Xiaochun. Survey on privacy preserving techniques for publishing social network data[J]. Journal of Software, 2014, 25(3): 576-590. (in Chinese)
[4] Li Hongcheng, Wu Xiaoping, Chen Yan. K-means clustering method preserving differential privacy in MapReduce framework[J]. Journal on Communications, 2016, 37(2): 124-130. (in Chinese)
[5] Dwork C. Differential privacy[C]//Proc of International Colloquium on Automata, Languages, and Programming. Berlin: Springer, 2006: 1-12.
[6] Wang Ding, He Debiao, Wang Ping, et al. Anonymous two-factor authentication in distributed systems: certain goals are beyond attainment[J]. IEEE Trans on Dependable and Secure Computing, 2015, 12(4): 428-442.
[7] Domingo-Ferrer J, Sánchez D, Hajian S. Database privacy[M]. Berlin: Springer International Publishing, 2015.
[8] Gong Weihua, Lan Xuefeng, Pei Xiaobing, et al. Privacy preservation method based on k-degree anonymity in social networks[J]. Acta Electronica Sinica, 2016, 44(6): 1437-1444. (in Chinese)
[9] Jiang Huowen, Zeng Guosun, Ma Haiying. Greedy clustering-anonymity method for privacy preservation of table data-publishing[J]. Journal of Software, 2017, 28(2): 341-351. (in Chinese)
[10] Yang Gaoming, Yang Jing, Zhang Jianpei. Achieving (α,k)-anonymity via clustering in data publishing[J]. Acta Electronica Sinica, 2011, 39(8): 1941-1946. (in Chinese)
[11] Huang Shiping, Gu Jinyuan. An enhanced privacy protection model based on (p+,α)-sensitive k-anonymity[J]. Application Research of Computers, 2014, 31(11): 3465-3468. (in Chinese)
[12] Li Yang, Wen Wen, Xie Guangqiang. Survey of research on differential privacy[J]. Application Research of Computers, 2012, 29(9): 3201-3205, 3211. (in Chinese)
[13] Torra V. Microaggregation for categorical variables: a median based approach[C]//Proc of Privacy in Statistical Databases. Berlin: Springer, 2004: 162-174.
[14] Zhao Xingwang, Liang Jiye. An attribute weighted clustering algorithm for mixed data based on information entropy[J]. Journal of Computer Research and Development, 2016, 53(5): 1018-1028. (in Chinese)
[15] Li Yang, Hao Zhifeng, Wen Wen, et al. Research on differential privacy preserving K-means clustering[J]. Computer Science, 2013, 40(3): 287-290. (in Chinese)
[16] Zhang Xiaojian, Meng Xiaofeng. Differential privacy in data publication and analysis[J]. Chinese Journal of Computers, 2014, 37(4): 927-949. (in Chinese)
[17] Xiong Ping, Zhu Tianqing, Wang Xiaofeng. A survey on differential privacy and applications[J]. Chinese Journal of Computers, 2014, 37(1): 101-122. (in Chinese)
[18] Xia Zanzhu. Research on microdata anonymity algorithms for privacy preservation data publishing[D]. Jinhua: Zhejiang Normal University, 2011. (in Chinese)
[19] Zhang Huizhe, Wang Jian. Improved fuzzy C means clustering algorithm based on selecting initial clustering centers[J]. Computer Science, 2009, 36(6): 206-209. (in Chinese)
[20] Soria-Comas J, Domingo-Ferrer J, Sánchez D, et al. Enhancing data utility in differential privacy via microaggregation-based k-anonymity[J]. VLDB Journal, 2014, 23(5): 771-794.
