Nonparametric variable selection and classification: The CATCH algorithm
Abstract
The problem of classifying a categorical response Y is considered in a nonparametric framework. The distribution of Y depends on a vector of predictors X, where the coordinates X_j of X may be continuous, discrete, or categorical. An algorithm is constructed to select the variables to be used for classification. For each variable X_j, an importance score s_j is computed to measure the strength of association of X_j with Y. The algorithm deletes X_j if s_j falls below a certain threshold. Monte Carlo simulations show that the algorithm has a high probability of selecting only variables associated with Y. Moreover, when this variable selection rule is used for dimension reduction before classification procedures are applied, it improves the performance of these procedures. The importance scores are based on root chi-square-type statistics computed for randomly selected regions (tubes) of the sample space. The size and shape of the regions are adjusted iteratively and adaptively using the data, to enhance the ability of the importance score to detect local relationships between the response and the predictors. These local scores are then averaged over the tubes to form a global importance score s_j for variable X_j. When confounding and spurious associations are issues, the nonparametric importance score for variable X_j is computed conditionally by using tubes to restrict the other variables. This variable selection procedure is called CATCH (Categorical Adaptive Tube Covariate Hunting). Asymptotic properties, including consistency, are established.
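
The scoring-and-thresholding idea described in the abstract can be illustrated with a much simplified sketch. It is not the CATCH algorithm itself: the tubes here have a fixed radius instead of being adapted iteratively, all predictors are assumed to be numerical, the local statistic compares the response across the two halves of the tube split at the median of the scored variable, and the statistic is divided by the square root of the tube size to make tubes comparable. The function names (`root_chi2`, `importance_score`, `select_variables`) and the parameters `n_tubes`, `radius`, `min_size`, and `threshold` are all hypothetical choices made for this illustration.

```python
import numpy as np

def root_chi2(y, g):
    """Square root of the Pearson chi-square statistic for the table of y by g."""
    _, yi = np.unique(np.ravel(y), return_inverse=True)
    _, gi = np.unique(np.ravel(g), return_inverse=True)
    table = np.zeros((yi.max() + 1, gi.max() + 1))
    np.add.at(table, (yi, gi), 1.0)  # count observations per (class, group) cell
    # Expected counts under independence; all margins are positive by construction.
    expected = table.sum(1, keepdims=True) * table.sum(0, keepdims=True) / table.sum()
    return np.sqrt(((table - expected) ** 2 / expected).sum())

def importance_score(X, y, j, n_tubes=50, radius=1.0, min_size=20, seed=0):
    """Average local root chi-square score for column j over random tubes.

    Assumption: a tube is the set of points whose OTHER coordinates lie
    within `radius` of a randomly chosen data point (fixed, not adaptive).
    """
    rng = np.random.default_rng(seed)
    others = np.delete(np.arange(X.shape[1]), j)
    scores = []
    for _ in range(n_tubes):
        center = X[rng.integers(len(X))]
        in_tube = np.all(np.abs(X[:, others] - center[others]) <= radius, axis=1)
        if in_tube.sum() < min_size:
            continue  # skip tubes with too few points for a stable statistic
        xj, yt = X[in_tube, j], y[in_tube]
        upper_half = xj > np.median(xj)  # split the tube along variable j
        # Assumption: normalise by sqrt(tube size) so scores are comparable.
        scores.append(root_chi2(yt, upper_half) / np.sqrt(len(yt)))
    return float(np.mean(scores)) if scores else 0.0

def select_variables(X, y, threshold=0.3, **kwargs):
    """Keep the indices of variables whose score reaches the threshold."""
    return [j for j in range(X.shape[1])
            if importance_score(X, y, j, **kwargs) >= threshold]
```

Calling `select_variables(X, y)` on an `(n, p)` array of numerical predictors and a length-`n` array of class labels returns the indices of the retained columns. A sensible threshold and tube radius depend on the data, so the defaults above should be read as placeholders only.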
