Semantic adaptive microaggregation of categorical microdata


Authors: Sergio Martínez (sergio.martinezl@urv.cat); David Sánchez; Aida Valls
Keywords: Privacy protection; Anonymity; Microaggregation; MDAV; Ontologies; Semantic similarity
Journal: Computers and Security
Publication year: 2012
Journal code: 38_01674048
Category: cp
Publication date: July, 2012
Volume: 31
Issue: 5
Pages: 653-672
File size: 1321 K

Abstract

In the context of Statistical Disclosure Control, microaggregation is a privacy-preserving method that masks sensitive microdata prior to publication. It iteratively creates clusters of at least k elements and replaces the elements of each cluster with the cluster prototype, so that they become k-indistinguishable (anonymous). This data transformation causes a loss of information with regard to the original dataset, which affects the utility of the masked data; the aim of microaggregation algorithms is therefore to find the partition that minimises the information loss while ensuring a certain level of privacy. Most microaggregation methods, such as the MDAV algorithm, which is the focus of this paper, have been designed for numerical data. Extending them to support non-numerical (categorical) attributes is not straightforward because of the limitations on defining appropriate aggregation operators. Concretely, related works focused on the MDAV algorithm propose grouping data into groups of constrained (or even fixed) size and/or incorporate a basic categorical treatment of non-numerical data. This approach negatively affects the utility of the protected dataset because neither the distributional characteristics of the data nor their underlying semantics are properly considered. In this paper, we propose a set of modifications to the MDAV algorithm focused on categorical microdata. Our approach has been evaluated and compared with related works when protecting real datasets with textual attribute values. Results show that our method produces masked datasets that better minimise the information loss resulting from the data transformation.
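
To make the clustering-and-replacement step described in the abstract concrete, the following is a minimal sketch of the classic fixed-size MDAV procedure for numerical records. It is not the semantic, adaptive variant the paper proposes for categorical data. The function names (`mdav`, `microaggregate`, `take_cluster`), the use of Euclidean distance, and the use of the arithmetic mean as the cluster prototype are all assumptions made for illustration.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def take_cluster(records, remaining, seed, k):
    """Remove and return the k remaining indices closest to records[seed]."""
    ordered = sorted(
        remaining,
        # Tie-break so the seed itself comes first among equidistant records.
        key=lambda i: (math.dist(records[i], records[seed]), i != seed),
    )
    cluster = ordered[:k]
    for i in cluster:
        remaining.remove(i)
    return cluster


def mdav(records, k):
    """Partition record indices into clusters of at least k elements."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = list(range(len(records)))
    clusters = []
    while len(remaining) >= 3 * k:
        c = centroid([records[i] for i in remaining])
        # r: record farthest from the centroid; s: record farthest from r.
        r = max(remaining, key=lambda i: math.dist(records[i], c))
        s = max(remaining, key=lambda i: math.dist(records[i], records[r]))
        clusters.append(take_cluster(records, remaining, r, k))
        clusters.append(take_cluster(records, remaining, s, k))
    if len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        r = max(remaining, key=lambda i: math.dist(records[i], c))
        clusters.append(take_cluster(records, remaining, r, k))
    if remaining:
        # Fewer than 2k records are left: they form the final cluster.
        clusters.append(list(remaining))
    return clusters


def microaggregate(records, k):
    """Replace every record by the prototype (centroid) of its cluster."""
    masked = list(records)
    for cluster in mdav(records, k):
        prototype = centroid([records[i] for i in cluster])
        for i in cluster:
            masked[i] = prototype
    return masked


records = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,),
           (20.0,), (21.0,), (22.0,)]
masked = microaggregate(records, k=3)
```

In this sketch every masked value is shared by at least k records, which is the k-indistinguishability property mentioned in the abstract. For categorical attributes, the distance and the prototype would need to be replaced by operators suited to non-numerical values, which is the difficulty the abstract points out.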
