Unsupervised feature selection with ensemble learning
  • Authors: Haytham Elghazel; Alex Aussem
  • Keywords: Unsupervised learning; Feature selection; Ensemble methods; Random forest
  • Journal: Machine Learning
  • Publication date: January 2015
  • Year: 2015
  • Volume: 98
  • Issue: 1-2
  • Pages: 157-180
  • Full-text size: 942 KB
  • References: 1. Barkia, H., Elghazel, H., & Aussem, A. (2011). Semi-supervised feature importance evaluation with ensemble learning. In 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver (pp. 31-40).
    2. Bellal, F., Elghazel, H., & Aussem, A. (2012). A semi-supervised feature ranking method with ensemble learning. Pattern Recognition Letters, 33(10), 1426-1432.
    3. Blake, C., & Merz, C. (1998). UCI repository of machine learning databases.
    4. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
    5. Breiman, L., & Cutler, A. (2003). Random forests manual v4.0. Technical report, UC Berkeley. http://oz.berkeley.edu/users/breiman/Using_random_forests_v4.0.pdf.
    6. Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 2, 245-276.
    7. Chen, X., Ye, Y., Xu, X., & Huang, J. Z. (2012). A feature group weighting method for subspace clustering of high-dimensional data. Pattern Recognition, 45(1), 434-446.
    8. Dash, M., Choi, K., Scheuermann, P., & Liu, H. (2002). Feature selection for clustering: a filter solution. In IEEE International Conference on Data Mining (pp. 115-122).
    9. Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1-30.
    10. Dudoit, S., & Fridlyand, J. (2003). Bagging to improve the accuracy of a clustering procedure. Bioinformatics, 19(9), 1090-1099.
    11. Dy, J., & Brodley, C. (2004). Feature selection for unsupervised learning. Journal of Machine Learning Research, 5, 845-889.
    12. Elghazel, H., & Aussem, A. (2010). Feature selection for unsupervised learning using random cluster ensembles. In IEEE International Conference on Data Mining (pp. 168-175).
    13. Fred, A., & Jain, A. (2002). Data clustering using evidence accumulation. In 16th International Conference on Pattern Recognition (pp. 276-280).
    14. Fred, A., & Jain, A. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 835-850.
    15. Frigui, H., & Nasraoui, O. (2004). Unsupervised learning of prototypes and attribute weights. Pattern Recognition, 37(3), 567-581.
    16. Ghaemi, R., Sulaiman, N., Ibrahim, H., & Mustapha, N. (2009). A survey: clustering ensembles techniques. In Engineering and Technology (Vol. 38). Singapore: World Scientific.
    17. Golub, T., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., & Coller, H. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
    18. Grozavu, N., Bennani, Y., & Lebbah, M. (2009). From variable weighting to cluster characterization in topographic unsupervised learning. In IEEE International Joint Conference on Neural Networks (pp. 1005-1010).
    19. Gullo, F., Talukder, A. K. M. K., Luke, S., Domeniconi, C., & Tagarelli, A. (2012). Multiobjective optimization of co-clustering ensembles. In Genetic and Evolutionary Computation Conference, GECCO 2012 (pp. 1495-1496).
    20. Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.
    21. Hong, Y., Kwong, S., Chang, Y., & Qingsheng, R. (2008a). Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognition, 41(9), 2742-2756.
    22. Hong, Y.,
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics;
    Automation and Robotics;
    Computing Methodologies;
    Simulation and Modeling;
    Language Translation and Linguistics
  • Publisher: Springer Netherlands
  • ISSN: 1573-0565
Abstract
In this paper, we show that the internal estimates used to measure variable importance in Random Forests are also applicable to feature selection in unsupervised learning. We propose a new method called Random Cluster Ensemble (RCE for short) that estimates the out-of-bag feature importance from an ensemble of partitions. Each partition is constructed using a different bootstrap sample and a random subset of the features. We provide empirical results on nineteen benchmark data sets indicating that RCE, boosted with a recursive feature elimination (RFE) scheme (Guyon and Elisseeff, Journal of Machine Learning Research, 3:1157-1182, 2003), can lead to significant improvements in clustering accuracy over several state-of-the-art supervised and unsupervised algorithms, with a very limited subset of features. The method shows promise for dealing with very large domains. All results, datasets and algorithms are available online (http://perso.univ-lyon1.fr/haytham.elghazel/RCE.zip).
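For orientation, the following is a minimal Python sketch of the out-of-bag permutation idea the abstract describes. It is an illustration under assumptions, not the paper's algorithm: the helper name oob_cluster_importance is hypothetical, k-means stands in for RCE's base clusterer, and the disagreement of out-of-bag cluster assignments after permuting a feature serves as a rough importance proxy.

    import numpy as np
    from sklearn.cluster import KMeans

    def oob_cluster_importance(X, n_clusters=3, n_estimators=50,
                               max_features=0.5, seed=0):
        # Each ensemble member sees a bootstrap sample and a random
        # subset of the features, loosely mirroring the RCE setup.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        gain = np.zeros(d)   # accumulated assignment disagreement per feature
        used = np.zeros(d)   # how often each feature was drawn
        for _ in range(n_estimators):
            k = min(d, max(2, int(max_features * d)))
            feats = rng.choice(d, size=k, replace=False)
            boot = rng.integers(0, n, size=n)        # bootstrap indices
            oob = np.setdiff1d(np.arange(n), boot)   # out-of-bag points
            if oob.size == 0:
                continue
            km = KMeans(n_clusters=n_clusters, n_init=5,
                        random_state=int(rng.integers(1 << 31)))
            km.fit(X[boot][:, feats])
            base = km.predict(X[oob][:, feats])      # reference assignments
            for j_local, j in enumerate(feats):
                Xp = X[oob][:, feats].copy()
                Xp[:, j_local] = rng.permutation(Xp[:, j_local])  # scramble feature j
                # The more out-of-bag assignments change, the more the
                # partition relies on feature j.
                gain[j] += np.mean(km.predict(Xp) != base)
                used[j] += 1
        return gain / np.maximum(used, 1)

    # Toy usage: three well-separated clusters in the first two coordinates,
    # padded with eight pure-noise dimensions.
    X = np.hstack([np.repeat([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]], 50, axis=0),
                   np.random.default_rng(1).normal(size=(150, 8))])
    print(oob_cluster_importance(X, n_clusters=3).round(3))

In this toy run the two informative coordinates should receive clearly higher scores than the noise dimensions; the paper's RFE variant would then iteratively drop the lowest-scoring features and re-estimate.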
