文摘
Feature selection is an important step for large-scale image data analysis, which has been proved to be difficult due to large size in both dimensions and samples. Feature selection firstly eliminates redundant and irrelevant features and then chooses a subset of features that performs as efficient as the complete set. Generally, supervised feature selection yields better performance than unsupervised feature selection because of the utilization of labeled information. However, labeled data samples are always expensive to obtain, which constraints the performance of supervised feature selection, especially for the large web image datasets. In this paper, we propose a semi-supervised feature selection algorithm that is based on a hierarchical regression model. Our contribution can be highlighted as: (1) Our algorithm utilizes a statistical approach to exploit both labeled and unlabeled data, which preserves the manifold structure of each feature type. (2) The predicted label matrix of the training data and the feature selection matrix are learned simultaneously, making the two aspects mutually benefited. Extensive experiments are performed on three large-scale image datasets. Experimental results demonstrate the better performance of our algorithm, compared with the state-of-the-art algorithms.