Abstract
This paper proposes a joint model for identifying the mechanism that produces missing data. Using R 3.4.1 and repeated Bootstrap simulation, we evaluate how accurately the joint model recognizes the mechanism under different missing-data mechanisms and different missing proportions. The repeated simulations show that, across missing proportions, the model recognizes the missing completely at random (MCAR) mechanism well (correct recognition rate 94.79%–95.29%) and the missing at random (MAR) mechanism acceptably (correct recognition rate 77.64%–78.72%). Under both mechanisms, the correct recognition rate stays stable as the missing proportion changes.
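The abstract does not describe how the joint model works internally, so the sketch below does not reproduce it. It only illustrates the general shape of such an evaluation under stated assumptions. It uses a hypothetical simulated population in which x is always observed and y is subject to missingness. Each replicate draws a Bootstrap resample, imposes MCAR or MAR missingness at a fixed proportion, and classifies the mechanism. The "correct recognition rate" is then the share of replicates classified correctly. As the classifier it swaps in a deliberately simple stand-in: a two-sample z-test (normal approximation, alpha = 0.05) comparing x between rows with missing and observed y. All function names, sample sizes, the population model and the significance level are illustrative assumptions. The example uses Python rather than the R 3.4.1 environment named above.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for z under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def mean_var(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, var


def make_population(size, rng):
    """Hypothetical complete data: x always observed, y correlated with x."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(size)]
    return [(x, 0.5 * x + rng.gauss(0.0, 1.0)) for x in xs]


def missing_mask(sample, prop, mechanism, rng):
    """Flag exactly round(prop * n) values of y as missing."""
    n = len(sample)
    k = round(prop * n)
    if mechanism == "MCAR":
        # Every row has the same chance of losing y, independent of x and y.
        chosen = set(rng.sample(range(n), k))
    elif mechanism == "MAR":
        # Rows with large x (plus independent noise) lose y: depends only on observed x.
        scores = [(row[0] + rng.gauss(0.0, 1.0), i) for i, row in enumerate(sample)]
        chosen = {i for _, i in sorted(scores, reverse=True)[:k]}
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [i in chosen for i in range(n)]


def classify(sample, mask, alpha=0.05):
    """Stand-in classifier: call it MAR if x differs between missing and observed rows."""
    x_miss = [row[0] for row, m in zip(sample, mask) if m]
    x_obs = [row[0] for row, m in zip(sample, mask) if not m]
    m1, v1 = mean_var(x_miss)
    m0, v0 = mean_var(x_obs)
    z = (m1 - m0) / math.sqrt(v1 / len(x_miss) + v0 / len(x_obs))
    return "MAR" if two_sided_p(z) < alpha else "MCAR"


def recognition_rate(mechanism, prop, reps=200, n=200, seed=1):
    """Share of Bootstrap replicates whose mechanism is identified correctly."""
    rng = random.Random(seed)
    population = make_population(1000, rng)
    correct = 0
    for _ in range(reps):
        sample = rng.choices(population, k=n)  # Bootstrap resample with replacement
        mask = missing_mask(sample, prop, mechanism, rng)
        correct += classify(sample, mask) == mechanism
    return correct / reps


if __name__ == "__main__":
    for mech in ("MCAR", "MAR"):
        for prop in (0.1, 0.2, 0.3):
            print(mech, prop, recognition_rate(mech, prop))
```

With this toy setup the MCAR rate should sit near 1 − alpha, because a correct MCAR call is simply a non-rejection of the test. The MAR rate depends on how strongly missingness tracks x, which is an arbitrary choice here. The numbers this sketch prints are therefore not comparable to the rates reported for the joint model. A single-covariate test like this also cannot separate MAR from missing not at random (MNAR).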