Unsupervised feature selection with ensemble learning
  • Authors: Haytham Elghazel; Alex Aussem
  • Keywords: Unsupervised learning; Feature selection; Ensemble methods; Random forest
  • Journal: Machine Learning
  • Publication date: January 2015
  • Year: 2015
  • Volume: 98
  • Issue: 1-2
  • Pages: 157-180
  • Full-text size: 942 KB
  • References: 1. Barkia, H., Elghazel, H., & Aussem, A. (2011). Semi-supervised feature importance evaluation with ensemble learning. In 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver (pp. 31-40).
    2. Bellal, F., Elghazel, H., & Aussem, A. (2012). A semi-supervised feature ranking method with ensemble learning. Pattern Recognition Letters, 33(10), 1426-1432.
    3. Blake, C., & Merz, C. (1998). UCI repository of machine learning databases.
    4. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
    5. Breiman, L., & Cutler, A. (2003). Random forests manual v4.0. Technical report, UC Berkeley. http://oz.berkeley.edu/users/breiman/Using_random_forests_v4.0.pdf.
    6. Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 2, 245-276.
    7. Chen, X., Ye, Y., Xu, X., & Huang, J. Z. (2012). A feature group weighting method for subspace clustering of high-dimensional data. Pattern Recognition, 45(1), 434-446.
    8. Dash, M., Choi, K., Scheuermann, P., & Liu, H. (2002). Feature selection for clustering: a filter solution. In IEEE International Conference on Data Mining (pp. 115-122).
    9. Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1-30.
    10. Dudoit, S., & Fridlyand, J. (2003). Bagging to improve the accuracy of a clustering procedure. Bioinformatics, 19(9), 1090-1099.
    11. Dy, J., & Brodley, C. (2004). Feature selection for unsupervised learning. Journal of Machine Learning Research, 5, 845-889.
    12. Elghazel, H., & Aussem, A. (2010). Feature selection for unsupervised learning using random cluster ensembles. In IEEE International Conference on Data Mining (pp. 168-175).
    13. Fred, A., & Jain, A. (2002). Data clustering using evidence accumulation. In 16th International Conference on Pattern Recognition (pp. 276-280).
    14. Fred, A., & Jain, A. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 835-850.
    15. Frigui, H., & Nasraoui, O. (2004). Unsupervised learning of prototypes and attribute weights. Pattern Recognition, 37(3), 567-581.
    16. Ghaemi, R., Sulaiman, N., Ibrahim, H., & Mustapha, N. (2009). A survey: clustering ensembles techniques. In Engineering and Technology (Vol. 38). Singapore: World Scientific.
    17. Golub, T., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., & Coller, H. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
    18. Grozavu, N., Bennani, Y., & Lebbah, M. (2009). From variable weighting to cluster characterization in topographic unsupervised learning. In IEEE International Joint Conference on Neural Networks (pp. 1005-1010).
    19. Gullo, F., Talukder, A. K. M. K., Luke, S., Domeniconi, C., & Tagarelli, A. (2012). Multiobjective optimization of co-clustering ensembles. In Genetic and Evolutionary Computation Conference, GECCO 2012 (pp. 1495-1496).
    20. Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.
    21. Hong, Y., Kwong, S., Chang, Y., & Qingsheng, R. (2008a). Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognition, 41(9), 2742-2756.
    22. Hong, Y.,
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics;
    Automation and Robotics;
    Computing Methodologies;
    Simulation and Modeling;
    Language Translation and Linguistics
  • Publisher: Springer Netherlands
  • ISSN: 1573-0565
Abstract
In this paper, we show that the internal estimates used to measure variable importance in Random Forests are also applicable to feature selection in unsupervised learning. We propose a new method called Random Cluster Ensemble (RCE for short) that estimates the out-of-bag feature importance from an ensemble of partitions. Each partition is constructed using a different bootstrap sample and a random subset of the features. We provide empirical results on nineteen benchmark data sets indicating that RCE, boosted with a recursive feature elimination (RFE) scheme (Guyon and Elisseeff, Journal of Machine Learning Research, 3:1157-1182, 2003), can lead to significant improvements in clustering accuracy over several state-of-the-art supervised and unsupervised algorithms, with a very limited subset of features. The method shows promise for dealing with very large domains. All results, datasets and algorithms are available online (http://perso.univ-lyon1.fr/haytham.elghazel/RCE.zip).
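For orientation, the following is a minimal Python sketch of the out-of-bag permutation idea the abstract describes. It is an illustration under assumptions, not the paper's algorithm: the helper name oob_cluster_importance is hypothetical, k-means stands in for RCE's base clusterer, and the disagreement of out-of-bag cluster assignments after permuting a feature serves as a rough importance proxy.

    import numpy as np
    from sklearn.cluster import KMeans

    def oob_cluster_importance(X, n_clusters=3, n_estimators=50,
                               max_features=0.5, seed=0):
        # Each ensemble member sees a bootstrap sample and a random
        # subset of the features, loosely mirroring the RCE setup.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        gain = np.zeros(d)   # accumulated assignment disagreement per feature
        used = np.zeros(d)   # how often each feature was drawn
        for _ in range(n_estimators):
            k = min(d, max(2, int(max_features * d)))
            feats = rng.choice(d, size=k, replace=False)
            boot = rng.integers(0, n, size=n)        # bootstrap indices
            oob = np.setdiff1d(np.arange(n), boot)   # out-of-bag points
            if oob.size == 0:
                continue
            km = KMeans(n_clusters=n_clusters, n_init=5,
                        random_state=int(rng.integers(1 << 31)))
            km.fit(X[boot][:, feats])
            base = km.predict(X[oob][:, feats])      # reference assignments
            for j_local, j in enumerate(feats):
                Xp = X[oob][:, feats].copy()
                Xp[:, j_local] = rng.permutation(Xp[:, j_local])  # scramble feature j
                # The more out-of-bag assignments change, the more the
                # partition relies on feature j.
                gain[j] += np.mean(km.predict(Xp) != base)
                used[j] += 1
        return gain / np.maximum(used, 1)

    # Toy usage: three well-separated clusters in the first two coordinates,
    # padded with eight pure-noise dimensions.
    X = np.hstack([np.repeat([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]], 50, axis=0),
                   np.random.default_rng(1).normal(size=(150, 8))])
    print(oob_cluster_importance(X, n_clusters=3).round(3))

In this toy run the two informative coordinates should receive clearly higher scores than the noise dimensions; the paper's RFE variant would then iteratively drop the lowest-scoring features and re-estimate.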
