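Data Masking Analysis Based on Internet Big Data

Original title (Chinese): 基于互联网大数据的脱敏分析技术研究
Authors: Zhou Qianyi (周倩伊); Wang Yamin (王亚民); Wang Chuang (王闯)
Affiliation: School of Economics and Management, Xidian University (西安电子科技大学经济与管理学院)
Keywords: data masking; k-anonymity model; rounded partition
Author-supplied English keywords: Data Masking; k-anonymity; Integer Division
Journal: Data Analysis and Knowledge Discovery (journal code XDTQ)
Publication date: 2018-02-25
Publisher: 数据分析与知识发现 (Data Analysis and Knowledge Discovery)
Year: 2018
Volume/No.: v.2; No.14
Language: Chinese
Record ID (page field): XDTQ201802022
Page count: 6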
Issue: 02
CN: 10-1478/G2
Classification code: 62-67

Abstract

[Objective] Building on existing data masking techniques, this paper improves the partitioning of anonymous groups to obtain a better masking model and algorithm. [Methods] Based on k-anonymity, we revise the criterion for choosing the partition dimension, use a KD tree as the storage structure, and construct a new algorithm. The algorithm is implemented in Python, and its feasibility and effectiveness are assessed by comparing the number of anonymous groups produced and the NCP (Normalized Certainty Penalty) percentage. [Results] The new algorithm maximizes the number of anonymous groups generated from the masked dataset as a whole, and its NCP percentage is lower than that of comparable algorithms. [Limitations] For datasets in which some attribute is markedly dispersed, the iterative computation of the partition dimension is cumbersome. [Conclusions] Compared with traditional algorithms, the new algorithm produces more anonymous groups; compared with similar algorithms, it loses less information.
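
The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea: a KD-tree-style median partition that keeps every anonymous group at size k or larger, followed by an NCP percentage. The split rule used here (widest value range, as in plain Mondrian-style partitioning) is an assumption and is not the revised criterion proposed in the paper. The function names, the sample (age, income) records, and the normalization of NCP by record count times attribute count are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]  # assumption: numeric quasi-identifiers only


def value_range(records: Sequence[Record], d: int) -> float:
    """Width of the interval covered by attribute d in the given records."""
    values = [r[d] for r in records]
    return max(values) - min(values)


def widest_dimension(records: Sequence[Record]) -> int:
    """Assumed split rule: pick the attribute with the largest value range."""
    return max(range(len(records[0])), key=lambda d: value_range(records, d))


def kd_partition(records: List[Record], k: int) -> List[List[Record]]:
    """Split at the median position, KD-tree style, keeping each group >= k."""
    if len(records) < 2 * k:
        return [records]  # a further split would break k-anonymity
    d = widest_dimension(records)
    ordered = sorted(records, key=lambda r: r[d])
    mid = len(ordered) // 2
    return kd_partition(ordered[:mid], k) + kd_partition(ordered[mid:], k)


def ncp_percent(groups: List[List[Record]], records: List[Record]) -> float:
    """NCP as a percentage: each record is charged its group's interval width
    relative to the full attribute range, averaged over records and attributes."""
    n_dims = len(records[0])
    total = 0.0
    for d in range(n_dims):
        full = value_range(records, d)
        if full == 0:
            continue  # a constant attribute carries no generalization cost
        for group in groups:
            total += len(group) * value_range(group, d) / full
    return 100.0 * total / (len(records) * n_dims)


# Hypothetical (age, income) records, for illustration only.
records = [
    (25, 50000), (27, 52000), (31, 61000), (35, 58000),
    (42, 70000), (47, 72000), (51, 90000), (55, 88000),
]
k = 2
groups = kd_partition(records, k)
print(len(groups), "groups; NCP = %.1f%%" % ncp_percent(groups, records))
```

With these eight sample records and k = 2, the sketch yields four groups of two records each. More groups generally means narrower intervals per group and therefore a lower NCP, which is why the abstract reports both measures together.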
