Statistical Methods for Two-Stage Cluster Sampling on Quantitative Sensitive Questions and Their Application
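Abstract
Objective: In a sample survey, when the variable or characteristic of interest is a sensitive question that touches on personal privacy or on behavior not accepted by public opinion, direct questioning leads some respondents, out of a wish to protect their privacy, to cooperate only partially, to refuse to answer, or to answer falsely, so the results fail to reflect the true characteristics of the population. In 1965, Warner introduced a randomizing device that made it possible to estimate the proportion of a population with a sensitive attribute without exposing any respondent's privacy, which founded the randomized response technique (RRT). Over the following decades, RRT has been developed and refined repeatedly, and several new survey methods have appeared. Before the work of our project group, however, research in China and abroad on sampling designs for sensitive questions was confined mainly to simple random sampling. Practical applications were likewise confined mainly to small-sample simple random surveys of special populations in limited areas, or data collected under complex sampling designs were wrongly analyzed with the formulas for simple random sampling, and the reliability and validity of sensitive-question surveys were rarely evaluated. This thesis selects three RRT models for quantitative sensitive questions: the additive model, the multiplicative model, and the unrelated question model. It aims to explore statistical methods for surveying quantitative sensitive questions with RRT under two-stage cluster sampling, to estimate scientifically the relevant population characteristics of men who have sex with men (MSM) in Beijing, a group at high risk of AIDS, and to assess the reliability of the survey methods and statistical formulas studied here through an application example and a computer-simulated survey. The goal is to provide sound and reliable survey methods and formulas for the statistics involved when quantitative sensitive questions are surveyed under large-scale complex sampling, and to supply sound survey data for planning AIDS and sexually transmitted disease prevention and control programs and measures.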
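     Methods: Drawing on the basic theory and methods of mathematical statistics, the total probability formula, and the theory of RRT, we derive formulas for the estimator of the population mean and its variance when the additive, multiplicative, and unrelated question models are used to survey quantitative sensitive questions under two-stage cluster sampling. From August to October 2010, two-stage cluster sampling was used to randomly select 30 MSM venues in 6 districts of Beijing, and the 1523 MSM at those venues were surveyed about male-male sexual behavior with the RRT additive model. The survey data were analyzed with the formulas derived in this thesis for two-stage cluster sampling on quantitative sensitive questions, and, for the first time, the reliability of the statistical methods was evaluated by a Monte Carlo computer simulation of the sampling process.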
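As a rough illustration of the three models named above, their textbook single-respondent forms can be written as follows. The symbols are assumptions made for this sketch and are not the thesis's own notation: X is the sensitive quantity with mean \mu_X, Y is a random number (or, in the unrelated question model, the answer to a non-sensitive question) with known mean \mu_Y and independent of X, Z is the answer the respondent actually reports, P is the probability that the randomizing device selects the sensitive question, and \hat{\mu}_Z is whatever estimator of the mean of Z the sampling design provides.

```latex
\begin{align*}
\text{Additive:} \quad & Z = X + Y, && E(Z) = \mu_X + \mu_Y, && \hat{\mu}_X = \hat{\mu}_Z - \mu_Y \\
\text{Multiplicative:} \quad & Z = XY, && E(Z) = \mu_X \mu_Y, && \hat{\mu}_X = \hat{\mu}_Z / \mu_Y \\
\text{Unrelated question:} \quad & Z = \begin{cases} X & \text{with probability } P \\ Y & \text{with probability } 1 - P \end{cases}, && E(Z) = P\mu_X + (1 - P)\mu_Y, && \hat{\mu}_X = \frac{\hat{\mu}_Z - (1 - P)\mu_Y}{P}
\end{align*}
```

The expectation in the unrelated question model follows from the total probability formula. Under two-stage cluster sampling, \hat{\mu}_Z would be the two-stage estimator of the mean of the reported answers rather than a simple sample average.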
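     Results: We derived formulas for the estimator of the population mean and its variance for the additive, multiplicative, and unrelated question models under two-stage cluster sampling. Applying the survey methods and formulas to MSM in Beijing gave the following estimates: the mean age at first male-male sexual intercourse was 20.24 years; the mean number of different male sexual partners per month was 2.09; and the mean number of male-male sexual acts per month was 4.72. The differences between the Monte Carlo simulated survey results and the actual survey results were not statistically significant (all P values from hypothesis tests were greater than 0.1).
     Conclusion: By combining sampling theory with RRT theory, this study derives, for the first time, formulas for the estimators of population parameters and their variances when RRT models are used to survey quantitative sensitive questions under two-stage cluster sampling, which is an original contribution. The methods were applied successfully to a survey of male-male sexual behavior in Beijing. The Monte Carlo simulated survey indicates that the survey methods and statistical formulas studied here are highly reliable, and that applying RRT to sensitive questions under complex sampling designs has broad prospects.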
Objective: If a question in a sample survey is sensitive or highly personal, the traditional method of direct interview is likely to lead to refusals or untruthful answers because respondents are concerned about revealing private information, which makes it difficult to learn the true characteristics of the population. By ingenious use of a randomizing device, Warner (1965) showed that it is possible to estimate a population proportion without respondents revealing their personal status with respect to the sensitive question, and thus introduced a new method for sensitive-question surveys: the randomized response technique (RRT). Over the past few decades, a number of modifications of Warner's method, as well as several new methods, have emerged in the randomized response literature. Before our research project, however, most of the RR procedures available in the literature were developed and studied under the restriction that the sample is selected by simple random sampling. In applications of RRT to sensitive questions, the formulas for simple random sampling are misused when the sample is actually selected by stratified sampling, cluster sampling, or other relatively complex sampling methods. Moreover, studies assessing the reliability and validity of sensitive-question surveys with RRT are seldom reported. We therefore select three RRT methods, the additive model, the multiplicative model, and the unrelated question model, and aim to explore the feasibility of using them to investigate quantitative sensitive questions when the sample is selected by two-stage cluster sampling, and to estimate the population characteristics of men who have sex with men (MSM) in Beijing. The reliability of the methods is assessed through an application example and a computer-simulated sampling survey.
     Method: The total probability formula and the theory of RRT were employed to derive formulas for the estimator of the population mean and its variance when the three RRT methods are applied to quantitative sensitive questions with a sample selected by two-stage cluster sampling. In the subsequent survey, from August to October 2010, 30 MSM venues in 6 districts of Beijing were randomly selected by two-stage cluster sampling, and all 1523 MSM at these venues were surveyed with the additive model of RRT. A Monte Carlo simulated survey was performed to evaluate the reliability of the methods.
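The abstract does not reproduce the derived formulas, so the following is only a minimal sketch of how a point estimate could be computed for the additive model under two-stage cluster sampling. It assumes simple random sampling at both stages, that everyone in a sampled cluster answers, and a standard ratio-type estimator of the mean; the class and function names (`PrimaryUnit`, `two_stage_mean`, `additive_rrt_mean`) are hypothetical, and variance estimation is left out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PrimaryUnit:
    """One sampled first-stage unit (for example a district); hypothetical name."""
    clusters_total: int                      # M_i: number of clusters in the unit
    sampled_clusters: List[Sequence[float]]  # reported answers Z, one list per sampled cluster


def two_stage_mean(units: List[PrimaryUnit]) -> float:
    """Ratio-type estimate of the per-person mean of the reported answers Z.

    Assumes simple random sampling at both stages, so the first-stage
    factor N/n is the same for every unit and cancels in the ratio.
    """
    z_total = 0.0
    person_total = 0.0
    for unit in units:
        weight = unit.clusters_total / len(unit.sampled_clusters)  # M_i / m_i
        z_total += weight * sum(sum(c) for c in unit.sampled_clusters)
        person_total += weight * sum(len(c) for c in unit.sampled_clusters)
    return z_total / person_total


def additive_rrt_mean(units: List[PrimaryUnit], mu_y: float) -> float:
    """Additive model: Z = X + Y, so the mean of X is estimated by mean(Z) - mu_Y."""
    return two_stage_mean(units) - mu_y


if __name__ == "__main__":
    # Toy data: two sampled units, with 2 of 4 and 1 of 2 clusters surveyed.
    sample = [
        PrimaryUnit(4, [[25.0, 27.0], [24.0]]),
        PrimaryUnit(2, [[26.0, 28.0]]),
    ]
    print(additive_rrt_mean(sample, mu_y=5.0))
```

Because respondents only ever report Z, the analyst needs just the known mean of the scrambling variable to recover the mean of X; individual answers stay protected.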
     Results: Formulas for the estimator of the population parameter and its variance were derived for the three RRT models under two-stage cluster sampling. The results of the three RRT models are consistent on the whole. The results of our application survey are as follows: in Beijing, the mean age at which MSM first had sex with a man is 20.24 years; the mean number of different male sexual partners per MSM per month is 2.09; and the mean number of sexual acts between men per MSM per month is 4.72. The differences between the Monte Carlo simulated survey and the application survey are not statistically significant (P > 0.1).
     Conclusion: With the RRT models and the formulas we derived, we provide, for the first time, a method for calculating the estimator of the population parameter and its variance in surveys of quantitative sensitive questions under a relatively complex sampling design such as two-stage cluster sampling. The survey of MSM in Beijing by two-stage cluster sampling and the RRT additive model was carried out successfully, and the results of the Monte Carlo simulated survey show that our survey methods and formulas are reliable. RRT has broad application prospects in large-scale surveys of sensitive questions.
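The idea of checking reliability by Monte Carlo simulation can be sketched as follows. This is not the thesis's simulation: the synthetic population, the scrambling variable (uniform integers 0 to 9, mean 4.5), the sample sizes, and all function names are assumptions chosen for illustration. The sketch repeatedly draws a two-stage cluster sample, applies the additive model, and compares the average estimate with the true population mean.

```python
import random
from statistics import mean

MU_Y = 4.5  # mean of the scrambling variable Y, uniform on the integers 0..9 (assumed design)


def build_population(rng, n_units=8, clusters_per_unit=10):
    """Synthetic population: units -> clusters -> true sensitive values X."""
    population = []
    for _ in range(n_units):
        unit = []
        for _ in range(clusters_per_unit):
            centre = rng.gauss(20.0, 1.0)   # cluster-level mean of X
            size = rng.randint(20, 60)      # number of people in the cluster
            unit.append([rng.gauss(centre, 3.0) for _ in range(size)])
        population.append(unit)
    return population


def one_survey(rng, population, n=4, m=5):
    """Draw one two-stage sample, scramble the answers, and estimate the mean of X."""
    z_total = 0.0
    person_total = 0.0
    for unit in rng.sample(population, n):      # stage 1: simple random sample of units
        weight = len(unit) / m                  # M_i / m_i
        for cluster in rng.sample(unit, m):     # stage 2: simple random sample of clusters
            # Additive model: each person reports Z = X + Y, never X itself.
            z_total += weight * sum(x + rng.randint(0, 9) for x in cluster)
            person_total += weight * len(cluster)
    return z_total / person_total - MU_Y


def simulate(replications=300, seed=2010):
    """Return the true population mean and the average of the simulated estimates."""
    rng = random.Random(seed)
    population = build_population(rng)
    true_mean = mean(x for unit in population for cluster in unit for x in cluster)
    estimates = [one_survey(rng, population) for _ in range(replications)]
    return true_mean, mean(estimates)


if __name__ == "__main__":
    true_mean, average_estimate = simulate()
    print(f"true mean: {true_mean:.3f}, average estimate: {average_estimate:.3f}")
```

A small gap between the two printed values suggests the estimator behaves well under this particular synthetic design; it says nothing about the formal hypothesis tests reported in the thesis.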
