Improved ROUSTIDA Algorithm for Missing Data Imputation with Key Attribute in Repetitive Data
  • Original title: 重复数据中关键属性值缺失填补的改进ROUSTIDA算法
  • Authors: FAN Zhe-ning (樊哲宁); YANG Qiu-hui (杨秋辉); ZHAI Yu-peng (翟宇鹏); WAN Ying (万莹); WANG Shuai (王帅)
  • Affiliation: College of Computer Science (Software Engineering), Sichuan University
  • Keywords: data pre-processing; repeated data; missing data imputation; ROUSTIDA algorithm
  • Journal: Computer Science (计算机科学; journal code JSJA)
  • Publication date: 2019-02-15
  • Publisher: Computer Science (计算机科学)
  • Year: 2019
  • Issue: v.46
  • Language: Chinese
  • Pages: JSJA201902009
  • Page count: 5
  • CN: 02
  • ISSN: 50-1075/TP
  • Classification number: 39-43
Abstract
With the rise of data analysis research, data pre-processing has received growing attention from researchers, and the importance of missing data imputation has become increasingly apparent. Building on the ROUSTIDA data completion algorithm and targeting the characteristics of repetitive data with key attributes, this paper proposes an improved ROUSTIDA algorithm, Key&Rpt_RS. Key&Rpt_RS inherits the advantages of the ROUSTIDA algorithm, takes the repetitive nature of the target data into account, and analyzes the influence of key attributes on the imputation effect, yielding more accurate and effective imputation results. Experiments on alarm data from a communication network show that Key&Rpt_RS outperforms the traditional ROUSTIDA algorithm in imputing missing data.
References
[1] RUBIN D B. Multiple imputation for nonresponse in surveys[J]. Journal of Marketing Research, 1987, 137(1): 180.
[2] SHUAI P, LI X S, ZHOU X H, et al. The research process on statistical processing of missing data[J]. Chinese Journal of Health Statistics, 2013, 30(1): 135-139. (in Chinese) 帅平,李晓松,周晓华,等.缺失数据统计处理方法的研究进展[J].中国卫生统计,2013,30(1):135-139.
[3] YUE Y, TIAN K C. Review of data missing and its imputation method[J]. Journal of Preventive Medicine Information, 2005, 21(6): 683-685. (in Chinese) 岳勇,田考聪.数据缺失及其填补方法综述[J].预防医学情报杂志,2005,21(6):683-685.
[4] JIN Y J. Imputation adjustment method for missing data[J]. Journal of Applied Statistics and Management, 2001, 20(6): 47-53. (in Chinese) 金勇进.缺失数据的插补调整[J].数理统计与管理,2001,20(6):47-53.
[5] DEMPSTER A P. Maximum likelihood estimation from incomplete data via the EM algorithm[J]. Journal of the Royal Statistical Society, 1977, 39(1): 1-38.
[6] JIN Y J. Adjusting for missing data by weighting in survey analysis[J]. Journal of Applied Statistics and Management, 2001(5): 61-64. (in Chinese) 金勇进.缺失数据的加权调整(系列之IV)[J].数理统计与管理,2001(5):61-64.
[7] ROBINS J M, ROTNITZKY A, ZHAO L P. Estimation of regression coefficients when some regressors are not always observed[J]. Journal of the American Statistical Association, 1994, 89(427): 846-866.
[8] ZHANG Z H, LIU W Q. An improved algorithm based on the incomplete data of the rough set theory[J]. Computer Engineering & Science, 2002, 24(4): 41-42. (in Chinese) 张振华,刘文奇.一种基于粗集理论不完备数据的改进算法[J].计算机工程与科学,2002,24(4):41-42.
[9] DUAN P, ZHUANG H L, HE L, et al. Improved algorithm based on incomplete data analysis method[J]. Computer Engineering and Design, 2009, 30(7): 1681-1684. (in Chinese) 段鹏,庄红林,何磊,等.不完备数据分析方法(ROUSTIDA)的改进算法[J].计算机工程与设计,2009,30(7):1681-1684.
[10] TIAN S X, WU X P, WANG H X. Improved method for data reinforcement based on ROUSTIDA[J]. Journal of Naval University of Engineering, 2011, 23(5): 11-15. (in Chinese) 田树新,吴晓平,王红霞.一种基于改进的ROUSTIDA算法的数据补齐方法[J].海军工程大学学报,2011,23(5):11-15.
[11] DING C R, LI L S. Improved ROUSTIDA algorithm based on similarity relation vector[J]. Computer Engineering and Applications, 2014, 50(13): 133-136. (in Chinese) 丁春荣,李龙澍.基于相似关系向量的改进ROUSTIDA算法[J].计算机工程与应用,2014,50(13):133-136.
[12] PAWLAK Z. Rough sets[J]. International Journal of Computer & Information Sciences, 1982, 11(5): 341-356.
[13] ZHANG W X. Rough Set Theory and Methods[M]. Beijing: Science Press, 2001. (in Chinese) 张文修.粗糙集理论与方法[M].北京:科学出版社,2001.
[14] SKOWRON A, RAUSZER C. The discernibility matrices and functions in information systems[M]∥Intelligent Decision Support. Springer, Dordrecht, 1992: 331-362.
[15] WANG G Y. Rough Set Theory and Knowledge Acquisition[M]. Xi'an: Xi'an Jiaotong University Press, 2001. (in Chinese) 王国胤.Rough集理论与知识获取[M].西安:西安交通大学出版社,2001.
[16] ZHANG W, LIAO X F, WU Z F. An incomplete data analysis approach based on rough set theory[J]. Pattern Recognition and Artificial Intelligence, 2003, 16(2): 158-163. (in Chinese) 张伟,廖晓峰,吴中福.一种基于Rough集理论的不完备数据分析方法[J].模式识别与人工智能,2003,16(2):158-163.
[17] MENG J, LIU Y C, MO H B. New method of packing missing data based on rough set theory[J]. Computer Engineering and Applications, 2008, 44(6): 175-177. (in Chinese) 孟军,刘永超,莫海波.基于粗糙集理论的不完备数据填补方法[J].计算机工程与应用,2008,44(6):175-177.
