Improved KNN Nearest-Neighbor Imputation Algorithm Based on Attribute Correlation
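  • English title: KNN nearest neighbor filling algorithm based on attribute correlation
  • Authors: 谢霖铨; 赵楠; 徐浩; 毕永朋
  • Authors (English): XIE Linquan; ZHAO Nan; XU Hao; BI Yongpeng; Faculty of Science, Jiangxi University of Science and Technology
  • Keywords: KNN imputation; principal component analysis; covariance; deviation; attribute impact
  • Keywords (English): KNN filling; PCA; covariance; deviation; attributes impact
  • Chinese journal title: NFYX
  • English journal title: Journal of Jiangxi University of Science and Technology
  • Institution: Faculty of Science, Jiangxi University of Science and Technology
  • Publication date: 2019-02-15
  • Publisher: Journal of Jiangxi University of Science and Technology
  • Year: 2019
  • Issue: v.40; No.197
  • Funding: National Key R&D Program of China, Key Special Project (2016YFB0800700); National Natural Science Foundation of China (61762047)
  • Language: Chinese
  • Pages: NFYX201901016
  • Page count: 7
  • CN: 01
  • ISSN: 36-1289/TF
  • Classification number: 98-104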
Abstract
To further improve the imputation of missing data and to reduce the impact of the missing-data ratio, this paper proposes a KNN nearest-neighbor imputation algorithm based on attribute correlation. Principal component analysis (PCA) is incorporated into KNN imputation: the value computed by the KNN algorithm serves as the main imputed value, and the covariance matrix produced during PCA is used as the correlation among all attributes. The attribute impact is computed from the deviations between the incomplete record and its K nearest neighbors together with the corresponding correlations, and is then added to the KNN value to give the final estimate of the improved algorithm. Simulation experiments on data sets show that the improved algorithm fills missing values better and with higher accuracy.
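The abstract names the steps of the method but does not give its formulas, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes missing entries are encoded as NaN, that neighbors are drawn from complete records using Euclidean distance on the observed attributes, and that the "attribute impact" is the average of the deviations scaled by covariance-derived slopes. The function name `knn_impute_with_impact` and that combination rule are hypothetical.

```python
import numpy as np


def knn_impute_with_impact(X, k=3):
    """Fill NaNs with the KNN mean plus a covariance-weighted deviation term.

    Assumes at least two complete rows, at least one observed attribute per
    row, and non-zero variance in every attribute.
    """
    X = np.asarray(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]   # rows with no missing entries
    cov = np.cov(complete, rowvar=False)     # covariance matrix that PCA would decompose
    variances = np.diag(cov)
    filled = X.copy()

    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        # K nearest complete rows, measured on the observed attributes only
        dist = np.sqrt(((complete[:, obs] - row[obs]) ** 2).sum(axis=1))
        nbrs = complete[np.argsort(dist)[:k]]
        nbr_mean = nbrs.mean(axis=0)

        # Deviation of the incomplete record from its neighbors
        dev = row[obs] - nbr_mean[obs]
        for m in np.flatnonzero(miss):
            base = nbr_mean[m]               # plain KNN estimate
            # ASSUMPTION: impact = mean of slope-scaled deviations,
            # with slope cov(m, j) / var(j) for each observed attribute j
            slopes = cov[m, obs] / variances[obs]
            filled[i, m] = base + np.mean(slopes * dev)
    return filled
```

On a toy data set where the second attribute is exactly twice the first, say complete rows (1, 2) through (5, 10) plus an incomplete row (2.2, NaN) with `k=3`, the plain neighbor mean is 4.0, and the correction term moves the estimate toward the value implied by the linear relationship.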