Abstract
High-dimensional feature data contain a large amount of irrelevant and redundant information, which can greatly reduce the efficiency of learning algorithms. Feature selection plays an important role in many applications because it speeds up machine learning algorithms, improves the generalization ability of learned models, and mitigates the curse of dimensionality. When the feature space is unknown in advance and changes dynamically, traditional feature selection algorithms designed for a static feature space are too inefficient to be applicable. To address feature selection in this streaming-feature setting, where the feature space is dynamic and unknown, this paper proposes an online streaming feature selection algorithm regularized by the ℓ2,1-norm. The feature selection model is built on two properties of the ℓ2,1-norm: it induces row sparsity, and it is insensitive to noise. Experiments on multiple high-dimensional datasets show that, compared with other streaming feature selection algorithms, the proposed algorithm achieves higher classification accuracy and better stability.
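The abstract relies on the row sparsity of the ℓ2,1-norm, defined for a weight matrix W ∈ ℝ^(d×c) with rows w^i as ‖W‖_{2,1} = Σ_i ‖w^i‖_2. Penalizing this norm pushes whole rows of W to zero, and a zero row means the corresponding feature is discarded for every output at once. The sketch below illustrates that idea in Python with NumPy; it is not the paper's algorithm. It assumes a least-squares loss ‖XW − Y‖_F² + λ‖W‖_{2,1}, solves it with the iterative reweighting scheme commonly used for ℓ2,1 problems, and wraps it in a hypothetical streaming loop that refits on the currently kept features plus each newly arriving one. The function names, `lam`, and `rel_tol` are illustrative assumptions.

```python
import numpy as np


def l21_norm(W):
    """ℓ2,1-norm: the sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def l21_regression(X, Y, lam, n_iter=100, eps=1e-8):
    """Approximately minimise ||XW - Y||_F^2 + lam * ||W||_{2,1}.

    Iterative reweighting: each step solves (X^T X + lam * D) W = X^T Y with
    D = diag(1 / (2 * ||w^i||_2 + eps)); eps avoids division by zero.
    """
    XtX, XtY = X.T @ X, X.T @ Y
    D = np.eye(X.shape[1])  # first step is a plain ridge solution
    for _ in range(n_iter):
        W = np.linalg.solve(XtX + lam * D, XtY)
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
    return W


def streaming_l21_selection(feature_stream, Y, lam=40.0, rel_tol=1e-3):
    """Hypothetical online loop: features arrive one (index, column) at a time."""
    selected, columns = [], []
    for idx, x in feature_stream:
        cand_idx, cand_cols = selected + [idx], columns + [x]
        W = l21_regression(np.column_stack(cand_cols), Y, lam)
        row_norms = np.linalg.norm(W, axis=1)
        # Keep features whose row was not driven (numerically) to zero.
        keep = row_norms > rel_tol * row_norms.max()
        selected = [i for i, k in zip(cand_idx, keep) if k]
        columns = [c for c, k in zip(cand_cols, keep) if k]
    return selected


# Synthetic demo: both targets depend only on features 0 and 3.
rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.standard_normal((n, d))
Y = np.column_stack([2 * X[:, 0] - 3 * X[:, 3], X[:, 0] + X[:, 3]])
Y += 0.1 * rng.standard_normal((n, 2))
selected = streaming_l21_selection(((j, X[:, j]) for j in range(d)), Y)
print("selected features:", selected)
```

Because the kept set is refit every time a feature arrives, a noise feature that looks useful early (before the relevant feature 3 has arrived) can still be dropped later. The relevance and redundancy criteria used in the paper itself may differ from this simple row-norm threshold.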