Abstract
Penalized logistic regression is a useful classification tool: it provides class probability estimates and also mitigates overfitting. Traditionally, supervised classifier learning has required a large amount of labeled data. Owing to technical innovation, large amounts of unlabeled data are now easy to collect, whereas labeling is usually expensive and difficult. Active learning aims to select the most informative subjects for labeling in order to reduce the number of labeling requests. Recently, active learning using experimental design techniques has attracted considerable attention. The typical criteria attempt to reduce the generalization error of a model by minimizing either its estimation variance or its estimation bias; however, they fail to take both components into account simultaneously. In this article, we introduce a new active learning algorithm based on penalized logistic regression. The most informative subjects are selected as those with the smallest mean squared estimation error, which accounts for both variance and bias. This criterion, combined with the idea of sequential design, guides the selection of each new subject in our algorithm. Extensive experiments on real-world data sets demonstrate the effectiveness and efficiency of the proposed method compared with several state-of-the-art active learning alternatives.