A comparative study of improvements Pre-filter methods bring on feature selection using microarray data
  • Authors: Yingying Wang (1)
    Xiaomao Fan (1)
    Yunpeng Cai (1)

    1. Research Center for Biomedical Information; Shenzhen Institutes of Advanced Technologies; Chinese Academy of Sciences; Shenzhen; China
  • Keywords: Comparative study; Feature selection; Microarray
  • Journal: Health Information Science and Systems
  • Publication year: 2014
  • Publication date: December 2014
  • Volume: 2
  • Issue: 1
  • Full-text size: 927 KB
  • Journal subjects: Health Informatics; Computational Biology/Bioinformatics; Information Systems and Communication Service; Bioinformatics
  • Publisher: BioMed Central
  • ISSN: 2047-2501
Abstract
Background: Feature selection techniques have become essential for biomarker discovery as microarray technology has developed. However, the high dimensionality of microarray data makes feature selection time-consuming. To overcome this difficulty, filtering the data according to background knowledge before applying feature selection techniques has become a hot topic in microarray analysis. Different pre-filter methods may affect the final results greatly, so it is important to evaluate them systematically.

Methods: In this paper, we compared the performance of statistical-based pre-filter methods, biological-based pre-filter methods, and their combination on microRNA-mRNA parallel expression profiles, using L1 logistic regression as the feature selection technique. Four types of data were built for both the microRNA and the mRNA expression profiles.

Results: The pre-filter methods greatly reduced the number of features for both the mRNA and the microRNA expression datasets. The features selected after the pre-filter procedures were shown to be biologically significant at levels such as biological process and microRNA function. Analyses of classification performance based on precision showed that pre-filter methods were necessary when the number of raw features was much larger than the number of samples. Computing time was greatly shortened in all cases after the pre-filter procedures.

Conclusions: With similar or better classification performance and fewer but biologically significant features, pre-filter-based feature selection should be considered when researchers need fast results on complex computing problems in bioinformatics.
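
The two-stage idea in the abstract (pre-filter first, then L1 logistic regression) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the statistical pre-filter is taken to be a Welch t-statistic ranking, the L1 model is fitted with a plain proximal gradient loop rather than whatever solver the authors used, the data are synthetic, and all function names (`prefilter`, `l1_logistic`, `select_features`, `make_toy_data`) are hypothetical. The biological-based pre-filter is not modelled.

```python
import math
import random


def welch_t(a, b):
    """Welch t-statistic for two lists of expression values."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b) + 1e-12)


def prefilter(X, y, keep):
    """Assumed statistical pre-filter: indices of the `keep` features with largest |t|."""
    scored = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scored.append((abs(welch_t(pos, neg)), j))
    scored.sort(reverse=True)
    return sorted(j for _, j in scored[:keep])


def soft_threshold(value, amount):
    """Shrink `value` toward zero by `amount` (proximal step of the L1 penalty)."""
    return math.copysign(max(abs(value) - amount, 0.0), value)


def l1_logistic(X, y, lam=0.05, lr=0.1, iters=300):
    """L1-penalised logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # keep exp() away from overflow
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n  # the intercept is not penalised
        w = [soft_threshold(wj - lr * g / n, lr * lam) for wj, g in zip(w, grad_w)]
    return w, b


def select_features(X, y, keep=20):
    """Pre-filter, then keep the features whose L1 weights are non-zero."""
    kept = prefilter(X, y, keep)
    reduced = [[row[j] for j in kept] for row in X]
    w, _ = l1_logistic(reduced, y)
    return kept, [j for j, wj in zip(kept, w) if wj != 0.0]


def make_toy_data(seed=0, n=40, p=200, shift=3.0):
    """Synthetic stand-in for expression data; only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + (shift if label and j < 2 else 0.0)
          for j in range(p)] for label in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    kept, selected = select_features(X, y)
    print(len(kept), "features after pre-filter;", len(selected), "after L1 selection")
```

In this toy setting there are far more features (200) than samples (40), which is the regime where the abstract reports pre-filtering to be necessary. The L1 fit only ever touches the 20 pre-filtered columns, which is where the saving in computing time would come from.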
