Abstract
Data loss occurs frequently during time series acquisition and seriously hinders accurate data analysis. Most existing missing data prediction algorithms infer a pattern from the collected data and use it to predict the missing values, so they are suitable only when a small fraction of the data is missing. To address this problem, this paper proposes a missing data prediction algorithm based on compressive sensing, in which missing data prediction is formulated as a multiple sparse vector recovery problem. First, a sparse representation basis is designed from the temporal smoothness of time series, which transforms missing data prediction into sparse vector recovery. Second, an observation matrix is designed from the locations of the data that are not missing; it has low coherence with the sparse representation basis, which ensures the reconstruction performance of the algorithm. Simulation results show that the proposed algorithm predicts the missing data effectively even when the data loss ratio is as high as 90%.
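As a rough illustration of the general idea summarized above, the following Python sketch recovers a smooth series from a subset of its samples. It is not the paper's method: the DCT basis stands in for the sparse representation basis, a plain row selection at the observed positions stands in for the designed observation matrix, and orthogonal matching pursuit stands in for the recovery algorithm. All function names, the test signal, and the 70% missing rate are hypothetical choices made only for this example.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT basis; column k is a cosine atom of frequency k.

    Assumption: smooth series are roughly sparse in this basis.
    """
    t = np.arange(n)[:, None]      # time index (rows)
    k = np.arange(n)[None, :]      # frequency index (columns)
    psi = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    psi[:, 0] *= np.sqrt(1.0 / n)
    psi[:, 1:] *= np.sqrt(2.0 / n)
    return psi


def omp(A, y, max_atoms, tol=1e-10):
    """Orthogonal matching pursuit: greedy sparse solution of y ~ A @ s."""
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0                  # guard against all-zero columns
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break                            # observed samples already explained
        corr = np.abs(A.T @ residual) / norms
        if support:
            corr[support] = 0.0              # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Re-fit all selected atoms to the observed samples by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    s = np.zeros(A.shape[1])
    if support:
        s[support] = coef
    return s


def predict_missing(samples, observed_idx, n, max_atoms):
    """Estimate the full length-n series from samples at observed_idx."""
    psi = dct_basis(n)
    A = psi[observed_idx, :]   # row selection applied to the basis (Phi @ Psi)
    s = omp(A, samples, max_atoms)
    return psi @ s


# Synthetic example: a series that is exactly 3-sparse in the DCT basis.
rng = np.random.default_rng(0)
n = 256
s_true = np.zeros(n)
s_true[[1, 4, 9]] = [5.0, -3.0, 2.0]
x_true = dct_basis(n) @ s_true

observed = np.sort(rng.choice(n, size=77, replace=False))  # about 70% missing
x_hat = predict_missing(x_true[observed], observed, n, max_atoms=8)

rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The sketch only shows how missing-value prediction can be posed as sparse recovery from the observed positions; real series are only approximately sparse, so the atom budget `max_atoms` and the choice of basis would need tuning.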