Abstract
Attribute selection is an effective data preprocessing method. To remove redundant and noisy attributes from the attribute set of a multivariate time series, and to select an attribute subset that retains sufficient original information while improving accuracy, an attribute selection algorithm based on correlation density is proposed. The algorithm represents the original multivariate time series by a correlation matrix, defines the local density of each attribute to express its representativeness, and defines the discriminant distance of an attribute as its degree of distinction from the other attributes. Finally, attributes with both high representativeness and a high discriminant degree are selected according to their distribution in the decision graph. Experiments with an SVM classifier on four different datasets from the UCI repository show that the proposed algorithm outperforms the existing algorithms it was compared against in both classification accuracy and time efficiency.
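The abstract does not give the exact formulas, so the following is only a minimal sketch of how a correlation-density ranking of attributes could work, modeled on the density-peaks clustering of Rodriguez and Laio (Science, 2014). Several choices here are assumptions, not the authors' definitions: the distance between two attributes is taken as 1 - |Pearson r| over the whole series, the local density uses a Gaussian kernel with a hypothetical cutoff parameter `dc`, and the decision graph is read automatically by keeping the `k` attributes with the largest product of density and discriminant distance. The function names `pearson` and `select_attributes` are illustrative.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_attributes(series: List[List[float]], k: int, dc: float = 0.5) -> List[int]:
    """Rank attributes in a density-peaks style and return the indices of the top k.

    series[j] is the time series of attribute j; all series have the same length.
    Distance, kernel, and dc are assumptions, not the paper's definitions.
    """
    m = len(series)
    # Assumed distance: strongly correlated (or anti-correlated) attributes are "close".
    dist = [[1.0 - abs(pearson(series[i], series[j])) for j in range(m)]
            for i in range(m)]
    # Local density rho_i (representativeness): Gaussian kernel over the other attributes.
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(m) if j != i)
           for i in range(m)]
    # Discriminant distance delta_i: distance to the nearest attribute ranked above i.
    delta = []
    for i in range(m):
        # "Ranked above": higher density, with ties broken by the lower index.
        higher = [dist[i][j] for j in range(m)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        if higher:
            delta.append(min(higher))
        else:
            # The top-ranked attribute gets its largest distance to any other attribute.
            delta.append(max((dist[i][j] for j in range(m) if j != i), default=0.0))
    # Decision value gamma = rho * delta: large values lie in the upper right of the decision graph.
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(m), key=lambda i: gamma[i], reverse=True)[:k]
```

For example, with four attributes where three are linear transforms of the same trend and the fourth is an unrelated oscillation, `select_attributes(series, k=2)` keeps one representative of the redundant trio plus the oscillating attribute. Replacing visual inspection of the decision graph with a top-k cut on gamma is a simplification; thresholding rho and delta separately would be closer to how a decision graph is usually read.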