一种高效安全的去中心化数据共享模型
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:An Efficient and Secure Decentralizing Data Sharing Model
  • 作者:董祥千 ; 郭兵 ; 沈艳 ; 段旭良 ; 申云成 ; 张洪
  • 英文作者:DONG Xiang-Qian;GUO Bing;SHEN Yan;DUAN Xu-Liang;SHEN Yun-Cheng;ZHANG Hong;College of Computer Science,Sichuan University;Chengdu Neusoft University;College of Control Engineering,Chengdu University of Information Technology;
  • 关键词:数据共享 ; 区块链 ; 域索引 ; 安全多方计算 ; 差分隐私
  • 英文关键词:data sharing;;blockchain;;domain index;;secure multi-party computation;;differential privacy
  • 中文刊名:JSJX
  • 英文刊名:Chinese Journal of Computers
  • 机构:四川大学计算机学院;成都东软学院;成都信息工程大学控制工程学院;
  • 出版日期:2018-03-05 13:49
  • 出版单位:计算机学报
  • 年:2018
  • 期:v.41;No.425
  • 基金:国家自然科学基金(61332001,61772352,61472050);; 四川省科技计划项目(2014JY0257,2015GZ0103);; 成都市科技惠民技术研发项目(2014-HM01-00326-SF)资助~~
  • 语种:中文;
  • 页:JSJX201805004
  • 页数:16
  • CN:05
  • ISSN:11-1826/TP
  • 分类号:55-70
摘要
数据开放共享是推动数据相关产业发展的源动力,然而,现有的数据共享模型,如数据市场,数据提供方将数据上传至数据存储中心,数据需求方下载数据以实现分析.这种模型存在如下缺陷:(1)以关键字为基础的数据检索无法高效发现可连接数据集;(2)数据交易缺乏透明性,无法有效检测及防患交易参与方串谋等舞弊行为;(3)数据所有者失去数据的控制权、所有权,数据安全无法保障.为此,该文借助区块链技术建立一种全新的去中心化数据共享模型.首先从共享数据集中提取多层面元数据信息,通过各共识节点建立域索引,以解决可连接数据集的高效发现问题;其次,从交易记录格式及共识机制入手,建立基于区块链的数据交易,实现交易的透明性及防串谋等舞弊行为;最后,依据数据需求方的计算需求编写计算合约,借助安全多方计算及差分隐私技术保障数据所有者的计算和输出隐私.实验结果表明,该文提出的域索引机制在可接受的召回率范围内,连接数据集查准率平均提高22%.而以时间及交易区块数相结合的共识机制则能兼顾低交易频率与高交易频率双重需求.同时,与加密方式相比,在保证数据安全的前提下,该文提出的安全计算模型平均节省了近6秒的处理时间.
        Data opening and sharing is the source power for driving the development of data-related industries.However,the typical data sharing model available at present,e.g.,data market,in which data providers upload their data to a centralized repository and data demanders download their requested data to carry out analysis,has the following flaws:(1)As only considering the frequency of keyword in each dataset(or dataset name),the keyword-based dataset retrieval method,which widely used nowadays,cannot efficiently find the linkable datasets.(2)Being lack of transparency in the process of data transactions,the current data trading model does not take full account of detecting the transaction collusion or other frauds among the involved parties.(3)The data owners lose the power of controlling their own data,which causes no guarantee ofdata ownership and data security.We found out that these problems exposed in the process of data sharing could be attributed to three factors:linkable dataset discovery,data transaction management,computing security and output security.For the purpose of solving them efficiently and effectively,we proposed a novel blockchain-based decentralization data sharing model,which characterized by followings:(1)It was inspired by restoring data providers greater control over their own data by means of DataSpace(DS).(2)The computation or analysis was completed confidentially among the data providers,instead of in the data demanders,or in the third parties,as the latter two needed to download data into their own spaces which become the source of privacy leak.(3)It obtained computing datasets or tasks through domain indexing and interface mechanisms,and controlled user behavior and data flow by the blockchain technology.Concretely,in this paper,we first introduced the basic conception of the decentralized data sharing model based on the analysis of the traditional data sharing model.Then,we showed the hierarchical structure diagram of decentralized data sharing model,which included interface,transaction,index,and data layers.Finally,we analyzed the related technologies and implementation details of each layer respectively:In the interface layer,we obtained computing datasets through domain searching mechanism,and compiled the computation contract according to the requirements of the data demanders.In the index layer,we extracted multi-aspect metadata information from the shared dataset,and had the consensus nodes set up domain index to search linkable datasets efficiently.In the transaction layer,with the help of consensus mechanism,we implemented data transaction based on blockchain to achieve transparency and to prevent conspiracy.In the data layer,we introduced the computation contract,which assembled the secure multi-party computation and differential privacy,to ensure the computation and output privacy of the data providers.The experimental results show that the domain index mechanism proposed in this paper increases the average precision by 22% without substantially reducing the recall rate.And the modified consensus mechanism,which combines time and transaction block number,takes both low trading frequency and high trading frequency into account.At the same time,on the premise of ensuring data security,comparing with the encryption method,our method saves the processing time of nearly 6 s.
引文
[1]Guo Bing,Li Qiang,Duan Xu-Liang,et al.Personal data bank:A new mode of personal big data asset management and value-added services based on bank architecture.Chinese Journal of Computers,2017,40(1):126-143(in Chinese)(郭兵,李强,段旭良等.个人数据银行---一种基于银行架构的个人大数据资产管理与增值服务的新模式.计算机学报,2017,40(1):126-143)
    [2]Bogdanov D,Kamm L,Kubo B,et al.Students and taxes:Aprivacy-preserving study using secure computation.Proceedings on Privacy Enhancing Technologies,2016,(3):117-135
    [3]Bogdanov D.Sharemind:Programmable Secure Computations with Practical Applications[Ph.D.dissertation].University of Tartu,Estonia,2013
    [4]Talviste R.Applying Secure Multi-Party Computation in Practice[Ph.D.dissertation].University of Tartu,Estonia,2016
    [5]Schwab K,Marcus A,Oyola J O,et al.Personal data:The emergence of a new asset class.Geneva:Forum World Economic,Technical Report,2011
    [6]Zhang Wen-Li,Guo Bing,Shen Yan,et al.Computation offloading on intelligent mobile terminal.Chinese Journal of Computers,2016,39(5):1021-1038(in Chinese)(张文丽,郭兵,沈艳等.智能移动终端计算迁移研究.计算机学报,2016,39(5):1021-1038)
    [7]Franklin M,Halevy A,Maier D.From databases to dataspaces.ACM SIGMOD Record,2005,34(4):27-33
    [8]Zyskind G,Nathan O,Pentland A.Decentralizing privacy:Using blockchain to protect personal data//Proceedings of the IEEE Security and Privacy Workshops.California,USA,2015:180-184
    [9]Zhang H,Wen Y,Xie H,et al.Distributed Hash Table:Theory,Platforms and Applications.New York,USA:Springer-Verlag,2013
    [10]Rao B C,Zhu E.Searching Web data using MinHash LSH//Proceedings of the International Conference on Management of Data.San Francisco,USA,2016:2257-2258
    [11]Zamora J,Mendoza M,Allende H.Hashing-based clustering in high dimensional data.Expert Systems with Applications,2016,62:202-211
    [12]Andrychowicz M,Dziembowski S,Malinowski D,et al.Secure multiparty computations on bitcoin.Communications of the ACM,2016,59(4):76-84
    [13]Blunschi L,Dittrich J P,Girard O R,et al.A dataspace odyssey:The iMeMex personal dataspace management system(Demo)//Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research.Asilomar,USA,2007:114-119
    [14]Li Y,Hu C.Process materials scientific data for intelligent service using a dataspace model.Data Science Journal,2016,15(7):1-17
    [15]de Montjoye Y A,Shmueli E,Wang S S,Pentland A.openPDS:Protecting the privacy of metadata through SafeAnswers.PLoS One,2014,9(7):1-9
    [16]Reid F,Harrigan M.An analysis of anonymity in the bitcoin system.Security and Privacy in Social Networks.New York,USA:Springer,2013:1318-1326
    [17]Dennis R,Owen G.Rep on the block:A next generation reputation system based on the blockchain//Proceedings of the Internet Technology and Secured Transactions.Barcelona,Spain,2016:131-138
    [18]Yuan Yong,Wang Fei-Yue.Blockchain:The state of the art and future trends.Acta Automatica Sinica,2016,42(4):481-494(in Chinese)(袁勇,王飞跃.区块链技术发展现状与展望.自动化学报,2016,42(4):481-494)
    [19]Hazay C,Lindell Y.Efficient Secure Two-Party Protocols:Techniques and Constructions.Berlin Heidelberg,Germany:Springer,2010
    [20]Dwork C,Roth A.The algorithmic foundations of differential privacy.Foundations&Trends?in Theoretical Computer Science,2014,9(3):211-407
    [21]Rajaraman A,Ullman J D.Mining of Massive Datasets.Cambridge,UK:Cambridge University Press,2016
    [22]Shrivastava A,Li P.Asymmetric minwise hashing for indexing binary inner products and set containment//Proceedings of the International Conference on World Wide Web.Florence,Italy,2015:981-991
    [23]Kosub S.A note on the triangle inequality for the Jaccard distance.eprint arXiv:1612.02696,2016
    [24]Fernandez R C,Abedjan Z,Madden S,et al.Towards large-scale data discovery:Position paper//Proceedings of the International Workshop on Exploratory Search in Databases and the Web.CA,USA,2016:3-5
    [25]Halevy A,Korn F,Noy N F,et al.Goods:Organizing Google’s datasets//Proceedings of the International Conference on Management of Data.San Francisco,USA,2016:795-806
    [26]Ferrucci D A,Brown E W,Chu-Carroll J,et al.Building Watson:An overview of the DeepQA project.AI Magazine,2010,31(3):59-79
    [27]Litvak M,Last M.Graph-based keyword extraction for single-document summarization//Proceedings of the Mmies 08Workshop on Multi-Source Multilingual Information Extraction&Summarization.Manchester,UK,2008:17-24
    [28]Mihalcea R,Tarau P.TextRank:Bringing Order into Texts.UNT Scholarly Works,2004:404-411
    [29]Stoica I,Morris R,Karger D,et al.Chord:A scalable peerto-peer lookup service for Internet applications//Proceedings of the Conference on Applications,Technologies,Architectures,and Protocols for Computer Communications.Toronto,Canada,2001:149-160
    [30]Zhu E,Pu K Q,Pu K Q.LSH ensemble:Internet-scale domain search.Proceedings of the VLDB Endowment,2016,9(12):1185-1196
    [31]Huh S,Cho S,Kim S.Managing IoT devices using blockchain platform//Proceedings of the International Conference on Advanced Communication Technology.GW,Korea,2017:464-467
    [32]Christidis K,Devetsikiotis M.Blockchains and smart contracts for the Internet of Things.IEEE Access,2016,4:2292-2303
    [33]Mattila J,SepplT.Industrial blockchain platforms:An exercise in use case development in the energy industry.ETLA Working Papers,2016
    [34]Flajolet P,Fusy E,Gandouet O,et al.HyperLogLog:The analysis of a near-optimal cardinality estimation algorithm//Proceedings of the 2007International Conference on Analysis of Algorithms.Antibes and Nice,France,2007:127-146
    [35]Sweeney L.k-anonymity:A model for protecting privacy.International Journal on Uncertainty,Fuzziness and KnowledgeBased Systems,2002,10(5):557-570
    [36]Machanavajjhala A,Gehrke J,Kifer D,et al.l-diversity:Privacy beyond k-anonymity//Proceedings of the International Conference on Data Engineering.Atlanta,USA,2006:24-24
    [37]Liu Y,Cui J,Huang Z,et al.SK-LSH:An efficient index structure for approximate nearest neighbor search.Proceedings of the VLDB Endowment,2014,7(9):745-756
    [38]Moniz H,Neves N F,Correia M.Byzantine fault-tolerant consensus in wireless Ad Hoc networks.IEEE Transactions on Mobile Computing,2013,12(12):2441-2454
    [39]Ioannidis Y E.Query optimization.ACM Computing Surveys,1996,28(1):121-123
    [40]Kamm P,Laud L.Applications of Secure Multiparty Computation.Amsterdam,Netherlands:IOS Press,2015
    [41]Akter M,Hashem T.Computing aggregates over numeric data with personalized local differential privacy//Proceedings of the Australasian Conference on Information Security and Privacy.Auckland,New Zealand,2017:249-260
    [42]Kifer D.Privacy and the price of data//Proceedings of the30th Annual ACM/IEEE Symposium on Logic in Computer Science(LICS2015).Kyoto,Japan,2015:16-16
    [43]Laur S,Talviste R,Willemson J.From Oblivious AES to Efficient and Secure Database Join in the Multiparty Setting Applied Cryptography and Network Security.Berlin Heidelberg,Germany:Springer,2013:84-101
    [44]Laur S,Willemson J,Zhang B.Round-Efficient Oblivious Database Manipulation Information Security.Berlin Heidelberg,Germany:Springer,2011:262-277
    [45]Hamada K,Kikuchi R,Dai I,et al.Practically efficient multi-party sorting protocols from comparison sort algorithms//Proceedings of the International Conference on Information Security and Cryptology(ICISC 2012).Berlin Heidelberg,Germany:Springer,2012:202-216
    [46]Dan B,Laur S,Talviste R.A practical analysis of oblivious sorting algorithms for secure multi-party computation//Proceedings of the 19th Nordic Conference on Secure ITSystems(NordSec).Troms,Norway,2014:59-74
    [47]Baum C,Damgrd I,Orlandi C.Publicly auditable secure multi-party computation.Security and Cryptography for Networks,2014:175-196
    [48]Doerner J,Shelat A.Scaling ORAM for secure computation//Proceedings of the 24th ACM Conference on Computer and Communications Security.Dallas,USA,2017:523-535
    [49]Zahur S,Wang X,Raykova M,et al.Revisiting square-root ORAM:Efficient random access in multi-party computation//Proceedings of the 37th IEEE Symposium on Security and Privacy.San Jose,USA,2016:218-234
    [50]Gascón A,Schoppmann P,Balle B,et al.Privacy-preserving distributed linear regression on high-dimensional data.Privacy Enhancing Technologies Symposium,2017,(4):345-364
    [51]Doerner J,Evans D,Shelat A.Secure stable matching at scale//Proceedings of the ACM SIGSAC Conference on Computer and Communications Security.Vienna,Austria,2016:1602-1613
    [52]Zyskind G,Nathan O,Pentland A.Enigma:Decentralized Computation Platform with Guaranteed Privacy.White Paper,2015
    [53]Xiong Ping,Zhu Tian-Qing,Wang Xiao-Feng.A survey on differential privacy and applications.Chinese Journal of Computers,2014,37(1):101-122(in Chinese)(熊平,朱天清,王晓峰.差分隐私保护及其应用.计算机学报,2014,37(1):101-122)
    [54]Pettai M,Laud P.Combining differential privacy and secure multiparty computation//Proceedings of the 31st Annual Computer Security Applications Conference.Los Angeles,USA,2015:421-430
    [55]Nget R,Cao Y,Yoshikawa M.How to balance privacy and money through pricing mechanism in personal data market//Proceedings of the 2017 SIGIR Workshop on eCommerce.Tokyo,Japan,2017:52-62
    [56]Shen Y,Guo B,Shen Y,et al.A pricing model for big personal data.Tsinghua Science&Technology,2016,21(5):482-490
    [57]Wang H,Chen K,Xu D.A maturity model for blockchain adoption.Financial Innovation,2016,2(1):12-17
    [58]Yuan Y,Wang F Y.Towards blockchain-based intelligent transportation systems//Proceedings of the International Conference on Intelligent Transportation Systems.Leblon Rio de Janeiro,Brazil,2016:2663-2668
    (1)http://opendatahandbook.org/
    (2)https://www.data.gov/
    (1)http://www.moojnn.com
    (2)https://www.quandl.com/
    (3)https://sharemind.cyber.ee/
    (1)资源目录的编排可参照《政务信息资源目录编制指南》.https://www.gzdata.com.cn/c69/20170714/i1963.html
    (2)数据集生成属于自然语言处理范畴,实验中假设这些域信息是已知的,并只处理英文字符集.
    (1)https://github.com/AntShares/AntShares
    (2)A Byzantine Fault Tolerance Algorithm for Block Chains.http://docs.neo.org/zh-cn/node/consensus/whitepaper.html
    (1)Student performance data set.https://archive.ics.uci.edu/ml/datasets/Student+Performance
    (2)Relative location of ct slices on axial axis data set.https://archive.ics.uci.edu/ml/datasets/Relative+location+of+CT+slices+on+axial+axis
    (1)https://ekzhu.github.io/datasketch/lshensemble.html
    (2)https://github.com/ethereum/go-ethereum
    (3)https://github.com/samee/obliv-c(4)http://shcci.eastday.com/c/20160202/u1a9207794.html
    (1)https://open.taobao.com