Research on Data Quality Control in Web Surveys
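Abstract
Web surveys combine modern network technology with traditional survey techniques. With the rapid growth of the Internet and the steady rise in Internet penetration, web surveys are being carried out ever more widely. Compared with traditional surveys, web surveys have clear advantages in organization and implementation, information collection, information processing and survey outcomes. Yet the characteristics of the network also give web surveys distinctive weaknesses, such as limited coverage, low controllability, openness and security problems. These weaknesses are obstacles to controlling the quality of web survey data.
     Starting from the mechanisms that generate data error, this study examines the data quality of web surveys in depth. Building on total quality management theory, optimal control theory and data error theory, it defines web survey data quality and related concepts; proposes a theory of web survey data quality control; constructs an indicator system of the factors influencing web survey data quality, whose second-level indicators are the factors influencing the endogenous quality, transmission quality and control quality of the data; and analyzes those factors. The analysis shows that data error is the main factor influencing web survey data quality. After a comprehensive analysis of the effects of web survey data error, the study introduces and improves several internationally advanced error correction techniques: weighting adjustment, two-stage sampling, hot-deck imputation and the randomized response model. At the micro level of quality control it studies the data errors caused by the characteristics of the network, and at the level of method it seeks effective ways to control web survey data quality. On the basis of the theoretical analysis and under strong assumptions, it constructs simulation sample sets, designs simulation procedures for correcting nonresponse error and measurement error in web surveys, and designs and runs the simulation programs in S-Plus and SPSS to verify the feasibility and effectiveness of each error correction technique.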
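The weighting adjustment named above can be illustrated with a minimal post-stratification sketch. The abstract does not describe the improved weighting procedure used in the thesis, so this is only the textbook form; the function name `poststratify`, the two age strata and all numbers are hypothetical.

```python
from collections import Counter

def poststratify(sample, population_shares):
    """Post-stratified mean of a survey variable.

    sample: list of (stratum, value) pairs from the web survey.
    population_shares: dict mapping stratum -> known population share
    (assumed to come from an external source such as a census).
    """
    counts = Counter(stratum for stratum, _ in sample)
    n = len(sample)
    # Weight for each stratum = population share / sample share.
    weights = {s: population_shares[s] / (counts[s] / n) for s in counts}
    weighted_sum = sum(weights[s] * y for s, y in sample)
    total_weight = sum(weights[s] for s, _ in sample)
    return weighted_sum / total_weight

# Hypothetical web sample that over-represents younger respondents.
sample = [("young", 1.0)] * 8 + [("old", 0.0)] * 2
shares = {"young": 0.5, "old": 0.5}  # assumed population shares

unweighted = sum(y for _, y in sample) / len(sample)
adjusted = poststratify(sample, shares)
print(unweighted, adjusted)
```

In this toy sample the unweighted mean reflects the over-represented stratum, while the adjusted mean rescales each stratum to its assumed population share. The correction is only as good as the auxiliary population shares supplied.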
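     The study concludes that effective ways to control web survey data quality include: promoting full participation by the survey organization, raising the overall competence of Internet users, designing scientifically sound survey plans, strengthening the credibility of web surveys, tightening process monitoring, applying data correction techniques where necessary, incorporating rich prior auxiliary information, defining the scope for which web surveys are suitable and supplementing them with mixed-mode data collection, and using interdisciplinary techniques and methods.
     The main contributions of this study are as follows:
     First, it rigorously defines the concepts of web survey data quality and web survey data quality control, proposes a theory of web survey data quality control, and constructs an indicator system of the factors influencing web survey data quality, thereby extending research on web survey data quality.
     Second, it systematically studies error correction techniques for controlling web survey data quality. After a comprehensive analysis of the effects of error on web survey data quality, it brings in results from the data error literature, adapts them to the characteristics of web surveys, and uses them to correct the coverage error, sampling error, nonresponse error and measurement error of web surveys. This addresses, at the technical level, the data quality problems that error sources introduce into web survey work, including problems with the "people" taking part in the survey, with the design of the survey itself, and with data collection. It also proposes practicable methods for assuring data quality.
     Third, it verifies the feasibility of error correction in web surveys. Working from suitable simulation sample sets of web survey data error, it designs the simulation steps for correcting nonresponse error and measurement error, and designs and runs the simulation programs in S-Plus and SPSS. This implements, from the perspective of error, the techniques for controlling web survey data quality, broadens the application of existing theory and methods, and improves the depth and precision of the research.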
Web surveys combine modern network technology with traditional survey techniques. With the rapid development of the Internet and the growth of Internet penetration, this kind of survey is being applied more and more widely. Compared with traditional surveys, web surveys have many advantages in survey organization, information collection, information processing and survey results. However, because of the characteristics of the network, web surveys also have disadvantages, such as low coverage, low controllability, high openness and weak security. These problems are obstacles to controlling data quality in web surveys.
     In this dissertation, we approach the data quality of web surveys through data error. Based on the theories of total quality management, optimal control and statistical error, we define several concepts related to data quality in web surveys. We also construct an index system of the factors that influence web survey data quality, organized around the endogenous quality, transmission quality and control quality of web survey data. After analyzing the effects of error on web survey data, we apply error control techniques to the error problems of web surveys and look for effective ways to control sampling error, coverage error, nonresponse error and measurement error. We find that the error adjustment methods mitigate problems arising from participants who refuse to respond and from the design of the survey itself. Likewise, these methods help reduce problems in data collection and administration, legal supervision and web ethics. On the basis of the theoretical analysis and under strong assumptions, we build the simulation sample sets from data of the Leibniz Institute for the Social Sciences, Germany; the underlying survey, titled Social Inequality, was completed in July 2011. We then design the simulation process for nonresponse and measurement error correction, and use the statistical software S-Plus and SPSS to design and implement the simulation programs and to test the effectiveness of the methods studied above.
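For the measurement error side, the thesis adopts a randomized response model. The abstract does not say which design is used or how it is modified, so the sketch below assumes Warner's original design, in which each respondent answers the sensitive statement with probability `p` and its complement otherwise. The function name and the counts are hypothetical.

```python
def warner_estimate(yes_count, n, p):
    """Estimate the share of a sensitive trait under Warner's design.

    yes_count: number of "yes" answers observed.
    n: number of respondents.
    p: probability that the randomizing device selects the sensitive
       statement (the complement is selected with probability 1 - p).
    Returns (pi_hat, var_hat), where var_hat is a plug-in variance estimate.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p = 0.5 makes the trait share unidentifiable")
    lam = yes_count / n  # observed proportion of "yes" answers
    # P(yes) = p*pi + (1 - p)*(1 - pi), solved for pi.
    pi_hat = (lam - (1 - p)) / (2 * p - 1)
    var_hat = lam * (1 - lam) / (n * (2 * p - 1) ** 2)
    return pi_hat, var_hat

# Hypothetical survey: 380 "yes" answers from 1000 respondents, p = 0.7.
pi_hat, var_hat = warner_estimate(yes_count=380, n=1000, p=0.7)
print(pi_hat, var_hat)
```

Because no individual answer reveals the respondent's true status, the design trades some statistical efficiency for more honest answers; the variance grows quickly as `p` approaches 0.5.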
     Finally, we offer recommendations for controlling data quality in web surveys: keeping the survey organization fully involved throughout the survey process, improving the competence of web users, designing survey schemes scientifically, enhancing the credibility of web surveys, strengthening process control, supplying sufficient prior auxiliary information, clarifying the appropriate survey coverage, supplementing web surveys with mixed-mode data collection, and applying interdisciplinary techniques and methods to web survey data quality.
     The innovations of this dissertation are as follows:
     First, it rigorously defines web survey data quality and web survey data quality control, and constructs an index system of the factors that affect web survey data quality. Using this index system, it examines which factors matter most for data quality in web surveys.
     Second, it studies methods for controlling data quality in web surveys. These methods draw on established results from statistical research and are adapted to the characteristics of the network, so that the data errors attributable to "people", questionnaire design and data collection can be reduced.
     Finally, it uses simulation to check the effectiveness of the data error corrections in web surveys. It selects suitable data sets, designs the correction processes for nonresponse error and measurement error, and designs and implements the simulation programs in S-Plus and SPSS, from which conclusions are drawn. This extends the application of existing theory and methods and advances the study of web survey data quality.