Identifying a small set of marker genes using minimum expected cost of misclassification
Abstract

Objectives

This paper presents a model-independent feature selection approach to identify a small subset of marker genes.

Methods and material

An evaluation measure, minimum expected cost of misclassification (MECM), is used to estimate the discriminative power of a feature subset without building a model. The MECM measure is combined with sequential forward search for feature selection. This approach was applied to a breast cancer profiling problem, with the goal of identifying a small number of marker genes whose expression can be used to predict cancer molecular subtype (p53 gene status). The method was also applied to find a small set of single-nucleotide polymorphisms (SNPs) that can be used to predict a different kind of molecular phenotype, namely alleles (genetic variants) of human leukocyte antigen genes, which play important roles in autoimmunity.
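The abstract does not give the MECM formula. As a rough illustration only, the sketch below assumes discrete feature values (SNP genotypes coded 0/1/2, or expression levels already discretized), a user-supplied cost matrix `cost[true][pred]`, and an empirical reading of MECM: samples are grouped by their joint values on the candidate features, each group is charged the lowest total cost among the possible class assignments, and the group costs are summed and divided by the sample count. The function names `mecm` and `sequential_forward_search`, the toy data, and the stopping rule (stop when no remaining feature lowers the score, or at `max_features`) are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def mecm(X, y, features, cost):
    """Empirical minimum expected cost of misclassification (one possible reading).

    X        -- list of samples, each a sequence of discrete feature values
    y        -- class labels as integers 0..K-1
    features -- indices of the candidate feature subset
    cost     -- K x K matrix, cost[true][pred]
    """
    k = len(cost)
    # Count class labels within each cell, i.e. each distinct value pattern
    # of the selected features.
    cells = defaultdict(lambda: [0] * k)
    for row, label in zip(X, y):
        cells[tuple(row[f] for f in features)][label] += 1
    total = 0.0
    for counts in cells.values():
        # Total cost of labelling every sample in this cell as `pred`;
        # the cell is charged the cheapest choice.
        total += min(
            sum(counts[c] * cost[c][pred] for c in range(k)) for pred in range(k)
        )
    return total / len(y)


def sequential_forward_search(X, y, cost, max_features):
    """Greedily add the feature that lowers MECM the most (hypothetical stopping rule)."""
    selected = []
    remaining = list(range(len(X[0])))
    best = mecm(X, y, selected, cost)
    while remaining and len(selected) < max_features:
        score, feature = min((mecm(X, y, selected + [f], cost), f) for f in remaining)
        if score >= best:  # no remaining feature improves the score
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


# Toy data (made up for illustration): feature 0 matches the class exactly,
# features 1 and 2 are only weakly related to it.
X = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 0]]
y = [0, 0, 1, 1, 0, 1]
cost = [[0, 1], [1, 0]]  # equal costs, so MECM reduces to the resubstitution error rate
print(mecm(X, y, [], cost))                      # → 0.5
print(sequential_forward_search(X, y, cost, 3))  # → ([0], 0.0)
```

Because splitting samples into finer groups can never raise this empirical score, it falls toward zero as features are added; capping `max_features` (or validating on held-out data) is one way to keep the search from overfitting, in line with the goal of a small marker set.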

Results

Two marker genes were identified based on p53 status; in survival analysis they achieved a p-value of 7.53 × 10^−? (vs. 6 × 10^−? with the 32 genes identified by previous research). Six SNP loci were identified that achieved a leave-one-out cross-validation accuracy of 92.8% (vs. 90.6% and 89.5% with 18 SNPs selected using χ² statistics and information gain, respectively).
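The abstract reports leave-one-out cross-validation (LOOCV) accuracy but does not say which classifier was trained on the selected SNPs. The sketch below shows only the LOOCV procedure itself: each sample is held out once, a classifier is fit on the remaining samples, and the fraction of correct held-out predictions is reported. The classifier here, a lookup that predicts the majority class among training samples with the same value pattern on the chosen features (falling back to the overall majority), is a hypothetical stand-in, as are the function names and the toy data.

```python
from collections import Counter


def lookup_predict(train_X, train_y, x, features):
    """Hypothetical classifier: majority class among training samples whose
    values on `features` match those of x; falls back to the overall majority."""
    key = tuple(x[f] for f in features)
    matches = [label for row, label in zip(train_X, train_y)
               if tuple(row[f] for f in features) == key]
    pool = matches if matches else train_y
    return Counter(pool).most_common(1)[0][0]


def loocv_accuracy(X, y, features):
    """Hold out each sample once, predict it from the rest, return the hit rate."""
    correct = 0
    for i in range(len(y)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if lookup_predict(train_X, train_y, X[i], features) == y[i]:
            correct += 1
    return correct / len(y)


# Made-up toy data: feature 0 matches the class exactly.
X = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 0]]
y = [0, 0, 1, 1, 0, 1]
print(loocv_accuracy(X, y, [0]))  # → 1.0
print(loocv_accuracy(X, y, []))   # → 0.0
```

The empty-subset result shows a known LOOCV quirk with perfectly balanced classes: holding out one sample leaves the other class in the majority, so a majority-vote predictor is wrong every time.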

Conclusion

The MECM-based feature selection approach is capable of identifying a smaller subset of marker genes with comparable or even better performance than that obtained using conventional filter methods.
