Review of Domestic and International Research on Big Data Quality (国内外大数据质量研究述评)
  • Authors: Liu Bing (刘冰); Pang Lin (庞琳)
  • Keywords: big data quality; big data quality evaluation; big data quality management; big data quality application
  • Journal code: QBXB
  • Journal: Journal of the China Society for Scientific and Technical Information (情报学报)
  • Affiliation: Management School, Tianjin Normal University
  • Publication date: 2019-02-24
  • Publisher: 情报学报
  • Year: 2019
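  • Volume: 38
  • Fund: National Social Science Fund of China Key Project "Research on the Evaluation of Online Academic Information Resources from the Perspectives of Users and Context" (14ATQ007)
  • Language: Chinese
  • Record ID: QBXB201902011
  • Page count: 10
  • Issue: 02
  • CN: 11-2257/G3
  • Pages: 111-120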
Abstract
Big data quality is a frontier research field, a core topic within big data research, and a focus of attention across sectors. This paper takes the domestic and international literature on big data quality as its object and reviews it from four angles: basic connotation, quality management, quality evaluation, and application practice, analyzing the progress of research at home and abroad. The review finds that every strand of big data quality research builds on the characteristics of big data, centers on the basic attributes of big data quality, and ties in with application goals and applicable contexts, forming a complex, multidimensional theoretical system distinct from conventional data quality theory. It also finds that future research trends and priorities will include the essence of big data quality, its integration with the technical and human environments, and macro-level research at the national and strategic levels.
References
[1]Lohr S.The age of big data[N].New York Times,2012-02-11.
    [2]Laney D.3D data management:Controlling data volume,velocity and variety[J].META Group Research Note,2001,6:70.
    [3]Gantz J,Reinsel D.Extracting value from chaos[J].IDC iView,2011,1142(2011):1-12.
    [4]Gudivada V N,Baeza-Yates R,Raghavan V V.Big data:Promises and problems[J].IEEE Computer,2015,48(3):20-23.
    [5]Franks B.Taming big data (驾驭大数据)[M].Beijing:Posts & Telecom Press,2013.
    [6]Kulkarni A.A study on metadata management and quality evaluation in big data management[J].Engineering Technology&Applied Science Research,2016,4(7):455-459.
    [7]Lee Y W,Pipino L L,Funk J D,et al.Journey to data quality (数据质量征途)[M].Huang Wei,Wang Jiayin,Su Qin,et al.,trans.Beijing:Higher Education Press,2015.
    [8]Wang Yingluo,Huang Wei,Zhu Zhixiang.Some preliminary thoughts on the big data industry and its management issues (大数据产业及管理问题的一些初步思考)[J].Science & Technology for Development (科技促进发展),2014(1):15-19.
    [9]Immonen A,Pääkkönen P,Ovaska E.Evaluating the quality of social media data in big data architecture[J].IEEE Access,2015,3:2028-2043.
    [10]Liu J,Li J,Li W,et al.Rethinking big data:A review on the data quality and usage issues[J].ISPRS Journal of Photogrammetry and Remote Sensing,2016,115:134-142.
    [11]Boyd D,Crawford K.Critical questions for big data:Provocations for a cultural,technological,and scholarly phenomenon[J].Information,Communication and Society,2012,15:662-679.
    [12]Sukumar R,Ramachandran N,Ferrell R K.‘Big Data’ in health care:How good is it?[J].International Journal of Health Care Quality Assurance,2015:2-9.
    [13]Caballero I,Serrano M,Piattini M.A data quality in use model for big data[C]//Proceedings of the International Conference on Conceptual Modeling.Heidelberg:Springer,2014:65-74.
    [14]Cai L,Zhu Y Y.The challenges of data quality and data quality assessment in the big data era[J].Data Science Journal,2015,14:Article No.2.
    [15]Wahyudi A,Kuk G,Janssen M.A process pattern model for tackling and improving big data quality[J].Information Systems Frontiers,2018,20:457-469.
    [16]Haryadi A F,Hulstijn J,Wahyudi A,et al.Antecedents of big data quality:An empirical examination in financial service organizations[C]//Proceedings of 2016 IEEE International Conference on Big Data.IEEE,2016:116-121.
    [17]Gao J,Xie C,Tao C.Big data validation and quality assurance-Issues,challenges,and needs[C]//Proceedings of 2016 IEEE Symposium on Service-Oriented System Engineering.IEEE,2016:433-441.
    [18]Batini C,Rula A,Scannapieco M,et al.From data quality to big data quality[J].Journal of Database Management,2015,26(1):60-82.
    [19]Rao D,Gudivada V N,Raghavan V V.Data quality issues in big data[C]//Proceedings of IEEE International Conference on Big Data.IEEE,2015:2654-2660.
    [20]Haryadi A F.Requirements on and antecedents of big data quality:An empirical examination to improve big data quality in financial service organizations[D].Delft:Delft University of Technology,2016:13.
    [21]Glowalla P,Balazy P,Basten D,et al.Process-driven data quality management-An application of the combined conceptual life cycle model[C]//Proceedings of the 2014 47th Hawaii International Conference on System Sciences.Washington DC:IEEE Computer Society,2014:4700-4709.
    [22]Clarke.The OECD guidelines[EB/OL].[2017-4-4].http://www.rogerclarke.com/DV/PaperOECD.html.
    [23]Soares S.Big data governance[M]//An Emerging Imperative.MC Press,2012.
    [24]Aggarwal A.Data quality evaluation framework to assess the dimensions of 3V’s of big data[J].International Journal of Emerging Technology and Advanced Engineering,2017,7(10):503-506.
    [25]Toivonen M.Big data quality challenges in the context of business analytics[D].Helsinki:University of Helsinki,2015:47-48.
    [26]Kläs M,Trendowicz A,Jedlitschka A.What makes big data different from a data quality assessment perspective?Practical challenges for data and information quality research[R].ODQ2015,30 March 2015,Garching,Germany.
    [27]Ardagna D,Cappiello C,Samá W,et al.Context-aware data quality assessment for big data[J].Future Generation Computer Systems,2018,89:548-562.
    [28]Zhang Shaohua,Pan Rong,Zong Yuwei.Big data governance and services (大数据治理与服务)[M].Shanghai:Shanghai Scientific & Technical Publishers,2016:120.
    [29]Juddoo S.Overview of data quality challenges in the context of Big Data[C]//Proceedings of the 2015 International Conference on Computing,Communication and Security.IEEE,2015:1-9.
    [30]Sneed H M,Erdoes K.Testing big data(assuring the quality of large databases)[C]//Proceedings of the 2015 IEEE Eighth International Conference on Software Testing,Verification and Validation Workshops.IEEE,2015:1-6.
    [31]Liedtke C A.Quality,analytics,and big data[R].Strategic Improvement Systems,2016.
    [32]Cai Li,Zhu Yangyong.Big data quality (大数据质量)[M].Shanghai:Shanghai Scientific & Technical Publishers,2017:5.
    [33]Federal D A S.Data quality framework,version 1.0[R].Justice Sector Information Strategy,Ministry of Justice,US,2008.
    [34]Parkinson J.Six big data challenges[EB/OL].[2017-02-01].http://www.cioinsight.com/c/a/Expert-Voices/Managing-Big-Data-SixOperational-Challenges-484979.
    [35]Loshin D.Big data analytics:From strategic planning to enterprise integration with tools,techniques,NoSQL,and graph[M].Morgan Kaufmann Publishers,2013:13.
    [36]Ge M,Dohnal V.Quality management in big data[J].Informatics,2018,5:19.
    [37]Calder A.ISO/IEC 38500:The IT governance standard[M].IT Governance Publishing,2008.
    [38]Data Governance Institute.The DGI data governance framework[R].2009.
    [39]IBM Corporation.IBM data governance council maturity model:Building a roadmap for effective data governance[R].2007.
    [40]ISACA.COBIT 5:Enabling information[M].ISA,2013.
    [41]Gartner Group.Big data[EB/OL].http://www.gartner.com/itglossary/big-data.
    [42]DAMA International.The DAMA guide to the data management body of knowledge (DAMA数据管理知识体系指南)[M].Ma Huan,Liu Chen,et al.,trans.Beijing:Tsinghua University Press,2012.
    [43]Taleb I,Dssouli R,Serhani M A.Big data pre-processing:A quality framework[C]//Proceedings of the IEEE International Congress on Big Data.IEEE,2015:191-198.
    [44]Taleb I,Serhani M A,Dssouli R.Big data quality:A survey[C]//Proceedings of the 2018 IEEE International Congress on Big Data.IEEE,2018:166-173.
    [45]Chen Y T,Sun E W,Lin Y B.Coherent quality management for big data systems:a dynamic approach for stochastic time consistency[J].Annals of Operations Research,2018:Article No.2795.
    [46]Cheah Y W,Canon R,Plale B,et al.Milieu:Lightweight and configurable big data provenance for science[C]//Proceedings of the 2013 IEEE International Congress on Big Data.IEEE,2013:46-53.
    [47]Becker D,King T D,McMullen B.Big data,big data quality problem[C]//Proceedings of the 2015 IEEE International Conference on Santa Clara.IEEE,2015:2644-2653.
    [48]Pawar S H,Thakore D.An assessment model to evaluate quality attributes in big data quality[J].International Journal of Computer Science Trends and Technology,2017,5(2):373-376.
    [49]Reddy G M,Deshmukh G,Kumar R A,et al.Enhanced big data quality frame work[J].International Journal of Computer Science and Information Technologies,2016,7(3):1408-1409.
    [50]Saha B,Srivastava D.Data quality:The other face of Big Data[C]//Proceedings of the International Conference on Data Engineering.IEEE,2014:1294-1297.
    [51]Jin Fan.Data quality management and security management (数据质量管理与安全管理)[M].Shanghai:Shanghai Scientific & Technical Publishers,2016:47.
    [52]Soares S.Big data governance (大数据治理)[M].Kuang Bin,trans.Beijing:Tsinghua University Press,2014.
    [53]Taleb I,El Kassabi H T,Serhani M A,et al.Big data quality:A quality dimensions evaluation[C]//Proceedings of the 2016 International IEEE Conferences on Ubiquitous Intelligence&Computing,Advanced and Trusted Computing,Scalable Computing and Communications,Cloud and Big Data Computing,Internet of People,and Smart World Congress.IEEE,2016:759-765.
    [54]Merino J,Caballero I,Rivas B,et al.A data quality in use model for big data[J].Future Generation Computer Systems,2016,63:123-130.
    [55]Krogstie J,Gao S.A semiotic approach to investigate quality issues of open big data ecosystems[M]//Information and Knowledge Management in Complex Systems.Springer International Publishing,2015:41-50.
    [56]Bizer C.Quality-driven information filtering-in the context of web-based information systems[M].Saarbrücken:VDM Verlag,2007:1-22.
    [57]Desai K Y.Big data quality modeling and validation[D].San Jose:San José State University,2018,5:18-58.
    [58]Fabijan A,Helena H O,Bosch J.Customer feedback and data collection techniques in software R&D:A literature review[C]//Proceedings of the International Conference of Software Business.Springer:2015,1:139-153.
    [59]Bertino E.Big data-Opportunities and challenges panel position paper[C]//Proceedings of the 2013 IEEE 37th Annual Computer Software and Applications Conference.Washington DC:IEEE Computer Society,2013:479-480.
    [60]Mo Zuying.Construction of a big data quality measurement model (大数据质量测度模型构建)[J].Information Studies:Theory & Application (情报理论与实践),2018,41(3):11-15.
    [61]Floridi L.Big data and information quality[M]//The Philosophy of Information Quality.Springer International Publishing,2014:303-315.
    [62]Abdullah N,Ismail S A,Sophiayati S,et al.Data quality in big data:A review[J].International Journal of Advances in Soft Computing and its Applications,2015:17-27.
    [63]Sukumar S R,Natarajan R,Ferrell R K.Quality of big data in health care[J].International Journal of Health Care Quality Assurance,2015,28(6):621-634.
    [64]Firmani D,Mecella M,Scannapieco M,et al.On the meaningfulness of “Big Data Quality”[J].Data Science and Engineering,2016,1(1):6-20.
    [65]Juddoo S.Overview of data quality challenges in the context of Big Data[C]//Proceedings of the 2015 International Conference on Computing,Communication and Security.IEEE,2016.
    [66]Dumbill E.Making sense of big data[J].Big Data,2013,1(1):1-2.
    [67]Becker D,King T D,McMullen B,et al.Big data quality case study preliminary findings[R].U.S.Army Medcom Mods,2013:1-54.
    [68]Kläs M,Putz W,Lutz T.Quality evaluation for big data:A scalable assessment approach and first evaluation results[C]//Proceedings of the Joint Conference of the International Workshop on Software Measurement&the International Conference on Software Process&Product Measurement.IEEE,2017.
    [69]Yao L,Ge Z.Big data quality prediction in the process industry:A distributed parallel modeling framework[J].Journal of Process Control,2018,68:1-13.
    [70]Farzi S,Dastjerdi A B.Data quality measurement using data mining[J].International Journal of Computer Theory and Engineering,2010,2(1):115-118.
    [71]Han R,Nie L,Ghanem M M,et al.Elastic algorithms for guaranteeing quality monotonicity in big data mining[C]//Proceedings of the 2013 IEEE International Conference on Big Data,2013:45-50.
    [72]Li L L,Li J Z,Gao H.Evaluating entity-description conflict on duplicated data[J].Journal of Combinatorial Optimization,2016,31(2):918-941.
    [73]Lai S T,Leu F Y.An iterative and incremental data preprocessing procedure for improving the risk of big data project[C]//Proceedings of the International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing.Heidelberg:Springer,2017,612:483-492.
    [74]Lin Y M,Wang H Z,Li J Z,et al.Data source selection for information integration in big data era[J].Information Sciences,2019,479:197-213.
    [75]Miao D,Li J,Liu X,et al.Vertex cover in conflict graphs:Complexity and a near optimal approximation[C]//Proceedings of the International Conference on Combinatorial Optimization and Applications.New York:Springer,2015:395-408.
    [76]Heinrich B,Hristova D.A fuzzy metric for currency in the context of Big Data[C]//Proceedings of the Twenty Second European Conference on Information Systems,2014:1-15.
    [77]Li M H,Li J Z,Cheng S Y.Uncertain rule based method for evaluating data currency[J].Journal of Software,2014,25(S2):147-156.
    [78]Endler G,Baumgärtel P,Wahl A M,et al.Is estimation of data completeness through time series forecasts feasible[C]//Proceedings of the Advances in Databases and Information Systems.Springer International Publishing,2015:261-274.
    [79]Razniewski S,Nutt W.Assessing the completeness of geographical data[C]//Proceedings of the Big Data.Berlin:Springer,2013:228-237.
    [80]Emran N A,Embury S,Missier P,et al.Measuring data completeness for microbial genomics database[C]//Proceedings of the Intelligent Information and Database Systems.Berlin:Springer,2013:186-195.
    [81]Zhou Aoying,Jin Cheqing,Wang Guoren,et al.A survey on the management of uncertain data (不确定性数据管理技术研究综述)[J].Chinese Journal of Computers (计算机学报),2009,32(1):1-16.
    [82]Zhang Y,Wang H Z,Yang Z S,et al.Relative accuracy evaluation[J].PLoS ONE,2014,9(8):e103853.
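    [83]Heinrich B,Klier M,Schiller A,et al.Assessing data quality-A probability-based metric for semantic consistency[J].Decision Support Systems,2018,110:95-106.
    [84]Bachmann R,Kemper G,Gerzer T.The second half of the big data era:Data governance,drivers,and monetization (大数据时代下半场:数据治理、驱动与变现)[M].Liu Zhize,Liu Yuan,trans.Beijing:Beijing United Publishing Co.,2017:101.
    [85]Sidi F,Panahy P H S,Affendey L S,et al.Data quality:A survey of data quality dimensions[C]//Proceedings of the 2012 International Conference on Information Retrieval&Knowledge Management.IEEE,2012:300-304.
    [86]Ganapathi A,Chen Y,Ganapathi A,et al.Data quality:Experiences and lessons from operationalizing big data[C]//Proceedings of the IEEE International Conference on Big Data.IEEE,2017.
    [87]Ye Huanzhuo,Wu Di.A review of methods for cleaning approximately duplicate records (相似重复记录清理方法研究综述)[J].New Technology of Library and Information Service (现代图书情报技术),2010,26(9):56-66.
    [88]Jiang Xun,Liu Xiwen.Research on data cleaning for knowledge services in the big data environment (大数据环境下面向知识服务的数据清洗研究)[J].Library & Information (图书与情报),2013(5):16-21.
    [89]Pang Xiongwen,Yao Zhanlin,Li Yongjun.An efficient method for detecting duplicate records in large data volumes (大数据量的高效重复记录检测方法)[J].Journal of Huazhong University of Science and Technology (Natural Science Edition) (华中科技大学学报(自然科学版)),2010(2):8-11.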
    [90]Williamson A.Big data and the implications for government[J].Legal Information Management,2014,14(4):253-257.
    [91]Ciancarini P,Poggi F,Russo D.Big data quality:a roadmap for open data[C]//Proceedings of the 2016 IEEE Second International Conference on Big Data Computing Service and Applications.IEEE,2016:210-215.
    [92]Hong Xuehai,Wang Zhiqiang,Yang Qinghai.Research on quality standardization of government big data for sharing (面向共享的政府大数据质量标准化问题研究)[J].Big Data Research (大数据),2017(3):44-52.
    [93]Ma Yiming.Research on constructing a quality evaluation system for government big data (政府大数据质量评价体系构建研究)[D].Changchun:Jilin University,2016.
    [94]Juddoo S,George C,Duquenoy P,et al.Data governance in the health industry:Investigating data quality dimensions within a big data context[J].Applied System Innovation,2018,1(4):43.
    [95]Juddoo S,George C.Discovering the most important data quality dimensions in health big data using latent semantic analysis[C]//Proceedings of the IEEE International Conference on Advances in Big Data,Computing and Data Communication Systems,Durban,South Africa,2018.
    [96]Hoffman S.Medical big data and big data quality problems[J].Social Science Electronic Publishing,2014:289-316.
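    [97]Ma Guoyao,Sun Yongtao,Ma Yuling.Application analysis of data verification technology in quality control of medical and health big data (数据校验技术在医疗健康大数据质量控制中的应用分析)[J].Chinese Journal of Health Informatics and Management (中国卫生信息管理杂志),2016,13(4):417-419.
    [98]Chen Chao.Research on a quality evaluation model and dynamic probing technology for electric power big data (电力大数据质量评价模型及动态探查技术研究)[J].Modern Electronics Technique (现代电子技术),2014(4):153-155.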
    [99]Hazen B,Boone C,Ezell J,et al.Data quality for data science,predictive analytics,and big data in supply chain management:An introduction to the problem and suggestions for research and applications[J].International Journal of Production Economics,2014,154:72-80.
