On-Average KL-Privacy and Its Equivalence to Generalization for Max-Entropy Mechanisms
详细信息    查看全文
  • 关键词:Differential privacy ; Generalization ; Stability ; Information theory ; Maximum entropy ; Statistical learning theory
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2016
  • 出版时间:2016
  • 年:2016
  • 卷:9867
  • 期:1
  • 页码:121-134
  • 全文大小:456 KB
  • 参考文献:1.Akaike, H.: Likelihood of a model and information criteria. J. Econometrics 16(1), 3–14 (1981)MathSciNet CrossRef MATH
    2.Altun, Y., Smola, A.J.: Unifying divergence minimization and statistical inference via convex duality. In: Lugosi, G., Simon, H.U. (eds.) COLT 2006. LNCS (LNAI), vol. 4005, pp. 139–153. Springer, Heidelberg (2006)CrossRef
    3.Anderson, N.: “anonymized” data really isn’t and here’s why not (2009). http://​arstechnica.​com/​tech-policy/​2009/​09/​your-secrets-live-online-in-databases-of-ruin/​
    4.Barber, R.F., Duchi, J.C.: Privacy and statistical risk: Formalisms and minimax bounds. arXiv preprint arXiv:​1412.​4451 (2014)
    5.Bassily, R., Nissim, K., Smith, A., Steinke, T., Stemmer, U., Ullman, J.: Algorithmic stability for adaptive data analysis. arXiv preprint arXiv:​1511.​02513 (2015)
    6.Berger, A.L., Pietra, V.J.D., Pietra, S.A.D.: A maximum entropy approach to natural language processing. Comput. Linguist. 22(1), 39–71 (1996)
    7.Bousquet, O., Elisseeff, A.: Stability and generalization. J. Mach. Learn. Res. 2, 499–526 (2002)MathSciNet MATH
    8.Duncan, G.T., Elliot, M., Salazar-González, J.J.: Statistical Confidentiality: Principle and Practice. Springer, New York (2011)CrossRef MATH
    9.Duncan, G.T., Fienberg, S.E., Krishnan, R., Padman, R., Roehrig, S.F.: Disclosure limitation methods and information loss for tabular data. In: Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 135–166 (2001)
    10.Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006)CrossRef
    11.Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A.: Generalization in adaptive data analysis and holdout reuse. In: Advances in Neural Information Processing Systems (NIPS 2015), pp. 2341–2349 (2015)
    12.Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A.L.: Preserving statistical validity in adaptive data analysis. In: ACM on Symposium on Theory of Computing (STOC 2015), pp. 117–126. ACM (2015)
    13.Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: privacy via distributed noise generation. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 486–503. Springer, Heidelberg (2006)CrossRef
    14.Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)CrossRef
    15.Ebadi, H., Sands, D., Schneider, G.: Differential privacy: now it’s getting personal. In: ACM Symposium on Principles of Programming Languages, pp. 69–81. ACM (2015)
    16.Fienberg, S.E., Rinaldo, A., Yang, X.: Differential privacy and the risk-utility tradeoff for multi-dimensional contingency tables. In: Domingo-Ferrer, J., Magkos, E. (eds.) PSD 2010. LNCS, vol. 6344, pp. 187–199. Springer, Heidelberg (2010)CrossRef
    17.Hall, R., Rinaldo, A., Wasserman, L.: Random differential privacy. arXiv preprint arXiv:​1112.​2680 (2011)
    18.Hardt, M., Ullman, J.: Preventing false discovery in interactive data analysis is hard. In: IEEE Symposium on Foundations of Computer Science (FOCS 2014), pp. 454–463. IEEE (2014)
    19.Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Nordholt, E.S., Spicer, K., De Wolf, P.P.: Statistical Disclosure Control. Wiley (2012)
    20.Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620 (1957)MathSciNet CrossRef MATH
    21.Kearns, M., Ron, D.: Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 11(6), 1427–1453 (1999)CrossRef
    22.Liu, Z., Wang, Y.X., Smola, A.: Fast differentially private matrix factorization. In: ACM Conference on Recommender Systems (RecSys 2015), pp. 171–178. ACM (2015)
    23.McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: IEEE Symposium on Foundations of Computer Science (FOCS 2007), pp. 94–103 (2007)
    24.Mir, D.J.: Information-theoretic foundations of differential privacy. In: Garcia-Alfaro, J., Cuppens, F., Cuppens-Boulahia, N., Miri, A., Tawbi, N. (eds.) FPS 2012. LNCS, vol. 7743, pp. 374–381. Springer, Heidelberg (2013)CrossRef
    25.Mosteller, F., Tukey, J.W.: Data analysis, including statistics (1968)
    26.Mukherjee, S., Niyogi, P., Poggio, T., Rifkin, R.: Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math. 25(1–3), 161–193 (2006)MathSciNet CrossRef MATH
    27.Narayanan, A., Shmatikov, V.: Robust de-anonymization of large sparse datasets. In: IEEE Symposium on Security and Privacy, pp. 111–125. IEEE, September 2008
    28.Russo, D., Zou, J.: Controlling bias in adaptive data analysis using information theory. In: International Conference on Artificial Intelligence and Statistics (AISTATS 2016) (2016)
    29.Shalev-Shwartz, S., Shamir, O., Srebro, N., Sridharan, K.: Learnability, stability and uniform convergence. J. Mach. Learn. Res. 11, 2635–2670 (2010)MathSciNet MATH
    30.Steinke, T., Ullman, J.: Interactive fingerprinting codes and the hardness of preventing false discovery. arXiv preprint arXiv:​1410.​1228 (2014)
    31.Tishby, N., Pereira, F.C., Bialek, W.: The information bottleneck method. arXiv preprint arXiv:​physics/​0004057 (2000)
    32.Uhlerop, C., Slavković, A., Fienberg, S.E.: Privacy-preserving data sharing for genome-wide association studies. J. Priv. Confidentiality 5(1), 137 (2013)
    33.Van Erven, T., Harremoës, P.: Rényi divergence and kullback-leibler divergence. IEEE Trans. Inf. Theor. 60(7), 3797–3820 (2014)CrossRef
    34.Wang, Y.X., Fienberg, S.E., Smola, A.: Privacy for free: posterior sampling and stochastic gradient monte carlo. In: International Conference on Machine Learning (ICML 2015) (2015)
    35.Wang, Y.X., Lei, J., Fienberg, S.E.: Learning with differential privacy: stability, learnability and the sufficiency and necessity of erm principle. J. Mach. Learn. Res. (to appear, 2016)
    36.Wang, Y.X., Lei, J., Fienberg, S.E.: On-average kl-privacy and its equivalence to generalization for max-entropy mechanisms (2016). preprint http://​www.​cs.​cmu.​edu/​~yuxiangw/​publications.​html
    37.Yau, N.: Lessons from improperly anonymized taxi logs (2014). http://​flowingdata.​com/​2014/​06/​23/​lessons-from-improperly-anonymized-taxi-logs/​
    38.Yu, F., Fienberg, S.E., Slavković, A.B., Uhler, C.: Scalable privacy-preserving data sharing methodology for genome-wide association studies. J. Biomed. Inform. 50, 133–141 (2014)CrossRef
    39.Zhou, S., Lafferty, J., Wasserman, L.: Compressed and privacy-sensitive sparse regression. IEEE Trans. Inf. Theor. 55(2), 846–866 (2009)MathSciNet CrossRef
  • 作者单位:Yu-Xiang Wang (15) (16)
    Jing Lei (16)
    Stephen E. Fienberg (15) (16) (17)

    15. Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
    16. Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
    17. Heinz College, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
  • 丛书名:Privacy in Statistical Databases
  • ISBN:978-3-319-45381-1
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
  • 卷排序:9867
文摘
We define On-Average KL-Privacy and present its properties and connections to differential privacy, generalization and information-theoretic quantities including max-information and mutual information. The new definition significantly weakens differential privacy, while preserving its minimal design features such as composition over small group and multiple queries as well as closeness to post-processing. Moreover, we show that On-Average KL-Privacy is equivalent to generalization for a large class of commonly-used tools in statistics and machine learning that samples from Gibbs distributions—a class of distributions that arises naturally from the maximum entropy principle. In addition, a byproduct of our analysis yields a lower bound for generalization error in terms of mutual information which reveals an interesting interplay with known upper bounds that use the same quantity.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700