Research on Protecting Private Information in Data Mining
Abstract
Data mining is one of the most important knowledge discovery tools in today's society. While it reveals hidden patterns in data and creates value, it also demands large amounts of data of all kinds. With the emergence and growth of the Internet, collecting, exchanging and publishing the required data has become increasingly convenient. However, these rich data resources also contain a great deal of personal privacy, business intelligence and government secrets. More worryingly, in the actual use of such data, and especially during the mining process, much of this information can be exploited without restriction; the disclosure of personal privacy and confidential information seriously affects people's daily lives and even social stability. The massive amount of information readily available in the mining process has therefore made the use of mining tools a focal point of public concern over privacy abuse.
     Faced with the urgent need to protect privacy in data mining, traditional protection methods fall short, because while they protect sensitive information they also obstruct the extraction of knowledge from the data. To resolve this difficult conflict between privacy protection and knowledge acquisition in data mining, we study and propose a series of procedures, protocols and methods for transforming the original data. They prevent participants in the mining process from obtaining private information either directly or indirectly, while still allowing mining algorithms to recover from the transformed data the information and knowledge contained in the original data. Extensive simulation results, together with comparisons against existing methods, confirm the effectiveness of our approach. In this way we not only eliminate the privacy disclosure risks of the traditional mining process, but also allow the mining process to produce accurate results. The innovations and main work of this thesis are summarized as follows:
     1. We identify the essence of private information as associations among data, and propose two strategies for protecting privacy. By studying the different data objects in existing privacy preserving models, we find that no single kind of data attribute can accurately represent the private information contained in a data set. Through further examples, theoretical analysis and comparison, we propose that the essential attribute of private information is the association between data items, and accordingly propose two classes of protection strategies, decomposing private information and transforming private information, which serve as the guiding principles of our research. We also describe in detail the motivation and significance of privacy protection and the application scope and scenarios of its models.
     2. We propose a randomization-based method for decomposing private information, together with a tunable mechanism that balances privacy protection against knowledge acquisition and eliminates the threat posed by prior knowledge. For the data publishing problem, following the decomposition strategy, we propose a privacy protection method based on randomization. It uses the distribution of the original data to randomly select and transform a portion of the original values. Compared with anonymity- and diversity-based privacy preserving models, our method not only greatly increases a user's uncertainty about the original data, but also preserves most of the useful knowledge in the data. In addition, to counter the privacy disclosure that a user's prior knowledge may cause, we provide an adjustable method for balancing privacy protection and mining accuracy.
     3. We propose data transformation protocols and a data integration method for transforming private information, achieve privacy protection in the presence of malicious collusion, and provide a way to customize the level of protection on demand. Following the transformation strategy, we give each data owner a way of transforming its original data and a protocol for transmitting it, and give the miner a method for integrating the different data sources. Both the transformation and the protocols are based on transformations of the data matrix; in the semi-honest setting, the orthogonality of the transformation completely avoids the conflict between privacy protection and accurate mining, while under malicious collusion our randomized transformation keeps the risk of privacy disclosure within a limited range. Moreover, since in practice different attributes of a data set usually carry different levels of importance, we also implement a customization method that lets a data owner flexibly protect different attributes according to actual needs.
     4. We propose a scalable privacy preserving method that accommodates a large number of participants, effectively balancing privacy protection, accurate mining and scalability, and further propose a protection method suited to high-dimensional data sets. Scalability has long been a challenge for privacy protection research. We quantitatively analyze the different effects that the number of mining participants has on privacy protection and mining accuracy, and propose a transformation of the original data that adapts to a large number of data providers, so that the performance of the protection method is independent of changes in the number of participants. We also study how the independence of the perturbation affects privacy protection, and on that basis propose a protection method that adapts flexibly to data sets of different dimensionality.
The recent development of networking and storage technologies makes it more and more convenient to collect, process and publish large volumes of data, which also contain a great amount of personal privacy, business secrets and classified information. Once the data is obtained, and especially during the mining process, most of it can be used without any restriction. As a result, once the sensitive part is disclosed, it can seriously invade our privacy, disturb our normal lives or even threaten the security of our society. Data mining, as one of the most powerful technologies for knowledge discovery, reveals to us the hidden information and patterns in ordinary data. Although it brings us knowledge and profits, there are severe problems in the way it deals with data. Concerns over data privacy have grown sharply, since anyone with access to the mining process can obtain the original data records, which further leads to a high risk of data misuse.
     Therefore, in recent years, a number of techniques have been proposed to solve these problems. In our research, we aim to provide a privacy preserving way of data mining by transforming the original data sets before the mining process. We have also developed several novel transformation techniques, so that we can still obtain accurate mining results while privacy is well protected. We summarize our main contributions as follows:
     1. We propose the essence of data privacy and two strategies for its protection. In our research, we analyzed most of the current privacy preserving methods and examined in detail the structure of the objects they treat as private. We found that few of their definitions accurately describe the essence of data privacy, which makes it difficult for the corresponding methods to provide comprehensive protection. Based on this understanding, we redefined data privacy in terms of data associations, which are much closer to the actual concept of privacy in everyday life, and proposed two kinds of strategies to protect it. At the beginning of the thesis, we also introduce in detail the background of privacy protection and its fields of application.
     2. We propose a novel randomized anonymization method to decompose data privacy, together with a mechanism for trading off accuracy against privacy, so that threats from prior knowledge are eliminated. In the data publishing scenario, applying our first strategy, we propose a data randomization method that randomly replaces the values in each record according to the distribution of the original data (an illustrative sketch of this kind of randomization is given after this list). Compared with the well-known k-anonymization techniques, our method not only offers a much higher level of privacy protection but also maintains the useful knowledge in the original data set. Furthermore, a user may exploit prior knowledge to infer sensitive information that he is not allowed to know; we therefore develop a method to counteract such threats in data publishing. While the method introduces more uncertainty into the inference of original values, it also provides a mechanism to balance privacy and accuracy.
     3. We propose data transmission and data integration protocols that transform data privacy, so that threats from malicious adversaries are counteracted, and we also implement customized privacy. Applying the second strategy, we present an efficient clustering method for distributed multi-party data sets based on orthogonal transformation and perturbation techniques (a sketch of the distance-preserving property behind this approach follows the list). The miner, although it receives only the perturbed data, can still obtain accurate clustering results. This method protects data privacy not only in the semi-honest setting but also in the presence of collusion. Moreover, each attribute in a data set usually carries its own level of privacy concern, so it is necessary to give the data owner a mechanism to customize the perturbation of its own data. We implement customized privacy so that each variable in the data set can be perturbed according to an importance level specified by its owner.
     4. We propose a scalable privacy preserving method that adapts to different numbers of participants, as well as a method for generating independent perturbations. One of the main technical challenges in privacy preserving data mining is to make its algorithms scale with the number of participants while still keeping the privacy and accuracy guarantees. We analyzed the influence on accuracy and privacy protection as the number of participants increases under the standard method, and proposed an improved method that handles a large number of participants. Moreover, we demonstrated the importance of independent perturbation and proposed a method that adapts to high data dimensionality (a toy sketch of the effect of independent perturbation is also given after this list).
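To make the randomization idea of contribution 2 concrete, here is a minimal sketch in Python/NumPy. It assumes the replacement values are drawn from each attribute's empirical distribution and that a probability p serves as the tunable privacy/accuracy knob; the function name, the parameter p and the sampling scheme are illustrative assumptions, not the thesis's actual algorithm.

```python
import numpy as np

def randomize(data, p=0.3, rng=None):
    """Illustrative distribution-based randomization (not the thesis's algorithm):
    with probability p, replace a cell by a value drawn from that attribute's
    empirical distribution, so marginal statistics are roughly preserved."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    out = data.copy()
    mask = rng.random(data.shape) < p                  # cells chosen for replacement
    for j in range(data.shape[1]):
        rows = np.flatnonzero(mask[:, j])
        out[rows, j] = rng.choice(data[:, j], size=rows.size)
    return out

if __name__ == "__main__":
    original = np.random.default_rng(0).normal(50, 10, size=(1000, 3))
    published = randomize(original, p=0.3)
    # Column means stay close, while roughly 30% of individual cells changed.
    print(original.mean(axis=0))
    print(published.mean(axis=0))
```

Raising p increases the recipient's uncertainty about any individual value at the cost of more distortion in the mined patterns, which is the kind of trade-off the tunable mechanism described above is meant to control.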
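The property underlying contribution 3 is that an orthogonal transformation preserves pairwise Euclidean distances, so a miner running distance-based clustering (such as k-means) on the transformed data obtains the same partition as on the original data. The sketch below, which generates a random orthogonal matrix by QR decomposition, only demonstrates this property; it is not the thesis's multi-party transmission or integration protocol.

```python
import numpy as np

def random_orthogonal(d, rng=None):
    """Random d x d orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng() if rng is None else rng
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))     # sign fix so the columns are not biased

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 4))      # original records (rows) x attributes
    Q = random_orthogonal(4, rng)      # kept secret from the miner
    Y = X @ Q                          # what the miner would receive
    # Pairwise distances are unchanged: ||(x_i - x_j) Q|| = ||x_i - x_j||,
    # so distance-based clustering of Y reproduces the clustering of X.
    i, j = 10, 42
    print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))
```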
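For contribution 4, the toy sketch below assumes a simple additive, independent, zero-mean Gaussian perturbation applied locally by each participant (a stand-in for the thesis's actual transformation, not the method itself). It illustrates why independence matters for scalability: the error of aggregate statistics computed by the miner depends on the total number of records, not on how many participants contributed them.

```python
import numpy as np

def perturb_locally(local_data, sigma=5.0, rng=None):
    """Each participant independently adds zero-mean Gaussian noise to its
    own records before sharing them (illustrative scheme only)."""
    rng = np.random.default_rng() if rng is None else rng
    return local_data + rng.normal(0.0, sigma, size=local_data.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    for n_parties in (2, 20, 200):
        # The same-sized overall data set, split among more and more owners.
        data = rng.normal(100, 15, size=(20000, 1))
        parties = np.array_split(data, n_parties)
        union = np.vstack([perturb_locally(part, rng=rng) for part in parties])
        # Independent zero-mean noise terms average out, so the estimate of the
        # global mean stays accurate regardless of the number of participants.
        print(n_parties, round(abs(union.mean() - data.mean()), 3))
```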
