Identification of discriminatory variables in proteomics data analysis by clustering of variables

Authors: Sadegh Karimi (a, b); Bahram Hemmateenejad (a, b; hemmatb@sums.ac.ir)
Keywords: Classification; Proteomics; Clustering of variables; Cancer; Discriminant analysis; Self-organization map
Journal: Analytica Chimica Acta
Year: 2013
Publication date: 12 March 2013
Volume: 767
Issue: Complete
Pages: 35-43

Abstract

This article presents a data analysis method for biomarker discovery in proteomics. In factor analysis-based discriminant models, the latent variables (LVs) are calculated from the response data measured at all employed instrument channels. Because some channels are irrelevant and their responses carry no useful information, the extracted LVs mix information from both useful and irrelevant channels. In this work, clustering of variables (CLoVA), based on unsupervised pattern recognition, is suggested as an efficient method to identify the most informative spectral region, which is then used to construct a more predictive multivariate classification model. In the suggested method, the instrument channels (m/z values) are grouped into clusters by a self-organizing map (SOM). Subsequently, the spectral data of each cluster are used separately as the input variables of classification methods such as partial least squares-discriminant analysis (PLS-DA) and extended canonical variate analysis (ECVA). The proposed method is evaluated on two experimental data sets (ovarian and prostate cancer). It is found to discriminate cancerous from healthy samples with much higher sensitivity and selectivity than conventional PLS-DA and ECVA.
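
The workflow described in the abstract (cluster the channels, then classify on each cluster separately) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `som_cluster_variables`, `loo_accuracy`, and `clova_scores` are hypothetical, the SOM is a small one-dimensional version written directly in NumPy, a nearest-class-centroid classifier stands in for PLS-DA and ECVA, and the data are synthetic rather than the ovarian or prostate cancer sets.

```python
import numpy as np

def som_cluster_variables(X, n_nodes=3, n_iter=500, seed=0):
    """Cluster the columns (channels) of X with a small 1-D self-organizing map."""
    rng = np.random.default_rng(seed)
    # Describe each channel by its standardized profile across samples.
    V = X.T.astype(float)
    V = (V - V.mean(axis=1, keepdims=True)) / (V.std(axis=1, keepdims=True) + 1e-12)
    # Initialize node weights from randomly chosen channels.
    W = V[rng.choice(len(V), size=n_nodes, replace=False)].copy()
    nodes = np.arange(n_nodes)
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac)                           # decaying learning rate
        sigma = max(n_nodes / 2.0 * (1.0 - frac), 0.5)    # shrinking neighborhood
        v = V[rng.integers(len(V))]                       # random training channel
        bmu = np.argmin(((W - v) ** 2).sum(axis=1))       # best-matching unit
        h = np.exp(-((nodes - bmu) ** 2) / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (v - W)                    # pull neighbors toward v
    # Assign every channel to its closest node; the node index is the cluster label.
    dist = ((V[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-class-centroid classifier
    (a simple stand-in for PLS-DA or ECVA)."""
    hits = 0
    idx = np.arange(len(y))
    for i in idx:
        train = idx != i
        classes = np.unique(y[train])
        centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in classes])
        pred = classes[np.argmin(((centroids - X[i]) ** 2).sum(axis=1))]
        hits += int(pred == y[i])
    return hits / len(y)

def clova_scores(X, y, labels):
    """Score each channel cluster separately, as in the cluster-then-classify idea."""
    return {int(k): loo_accuracy(X[:, labels == k], y) for k in np.unique(labels)}

# Synthetic example: 40 samples, 30 channels, only the first 10 channels
# carry class information (an assumption made purely for illustration).
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 30))
X[y == 1, :10] += 4.0

labels = som_cluster_variables(X, n_nodes=3)
scores = clova_scores(X, y, labels)
best = max(scores, key=scores.get)
print("cluster sizes:", np.bincount(labels, minlength=3))
print("leave-one-out accuracy per cluster:", scores)
print("most informative cluster:", best)
```

In practice, the per-cluster classifier would be replaced by PLS-DA or ECVA, and the clusters would be compared on validated sensitivity and selectivity rather than on plain accuracy.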
