Application of Feature Extraction and Classification Algorithms to Membrane Protein Classification and Prediction
Abstract
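Genes are self-replicating units that are preserved across generations, and their physiological functions are expressed in the form of proteins. About 30% of the proteins in a cell are membrane proteins. As one of the main components of biological membranes, membrane proteins carry out most membrane functions and play an extremely important role in living organisms. Given the enormous volume of membrane protein sequence data, determining membrane protein structural types with traditional molecular biology experiments is not only time-consuming and laborious but also runs into difficulties that cannot currently be resolved, so it can no longer meet practical needs.
     Feature extraction and classification of membrane protein sequences is one of the most fundamental problems in membrane protein classification and prediction research, and it is also the key factor determining classification quality. Taking the classification and prediction of membrane protein sequences as its theme, this thesis studies feature selection algorithms and classification algorithms for membrane protein sequences. The main work and contributions are summarized as follows: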
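     (1) This thesis applies linear dimensionality reduction to membrane protein classification and prediction. Among feature extraction algorithms for membrane proteins, dipeptide composition (DC) has gradually been shown to be more effective than the traditional amino acid composition (AAC). Although DC achieves relatively high prediction accuracy, the feature vectors it extracts from membrane protein sequences are usually very high-dimensional. While they describe the sequence information comprehensively, they also bring the "curse of dimensionality" and make the computational cost of a membrane protein prediction system very high. To address this problem, linear dimensionality reduction is applied as follows. First, DC is used to extract high-dimensional feature vectors from the membrane protein sequences. Next, a linear dimensionality reduction method performs a second extraction on the high-dimensional DC data to obtain the important low-dimensional feature vectors. Classification and prediction are then carried out on the reduced vectors. The results show that prediction accuracy with this approach is higher than without linear dimensionality reduction, which demonstrates that applying linear dimensionality reduction to membrane protein type prediction is feasible and effective. The approach also simplifies the prediction system and improves prediction efficiency.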
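     As an illustration of the DC step, the following is a minimal sketch of dipeptide composition, assuming the common definition in which each of the 400 ordered pairs of the 20 standard amino acids is counted over adjacent residues and normalized by the number of counted pairs. The function name, the handling of non-standard residues, and the toy sequence are hypothetical and are not taken from the thesis.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order that defines the feature axes.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dipeptide: i for i, dipeptide in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Assumption: pairs that contain a non-standard residue are skipped,
    and frequencies are normalized by the number of counted pairs.
    """
    counts = [0] * len(DIPEPTIDES)
    total = 0
    seq = sequence.upper()
    for first, second in zip(seq, seq[1:]):
        idx = INDEX.get(first + second)
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [count / total for count in counts]


# Hypothetical toy sequence, not a real membrane protein.
vec = dipeptide_composition("MKTAYIAKQR")
```

     The vector has 400 entries regardless of sequence length, which is the high-dimensional representation that the dimensionality reduction step is meant to compress.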
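     (2) This thesis proposes five new combined feature extraction algorithms based on dimensionality reduction. It first introduces linear dimensionality reduction and constructs two combined algorithms on that basis: DC_PCA, which combines dipeptide composition with principal component analysis, and DC_LDA, which combines dipeptide composition with linear discriminant analysis. Experiments show that classification models built on these linear combined algorithms achieve higher prediction accuracy than the traditional models based on dipeptide composition (DC) or amino acid composition (AAC). To obtain models with better classification performance and to better predict the structural and functional information contained in membrane protein sequences, the thesis then constructs three combined algorithms based on nonlinear dimensionality reduction: DC_KPCA, which combines dipeptide composition with kernel principal component analysis; DC_KLDA, which combines dipeptide composition with kernel linear discriminant analysis; and DC_NPE, which combines dipeptide composition with neighborhood preserving embedding. Experiments show that models built on these nonlinear combined algorithms also achieve higher prediction accuracy than the traditional DC-based and AAC-based models. To identify the most accurate model, the five combined algorithms are compared, and the DC_KLDA-based model proves the most accurate. On the benchmark dataset CE2059, its overall accuracy under the Jackknife test reaches 92.71%, which is 15.1 to 30.59 percentage points higher than the commonly used models based on amino acid composition. On the benchmark dataset CE2625, its overall accuracy on the independent test set reaches 94.12%, which is 14.69 to 31.42 percentage points higher than the commonly used models based on amino acid composition.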
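     The Jackknife test mentioned above leaves each sample out in turn, predicts it from the remaining samples, and reports the fraction predicted correctly. The sketch below pairs it with a 1-nearest-neighbor classifier purely for illustration. The abstract does not state which classifier the thesis uses, so the classifier, the function names, and the toy data are all assumptions.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def jackknife_accuracy(features, labels):
    """Leave-one-out accuracy with a 1-nearest-neighbor classifier.

    Assumption: 1-NN stands in for whatever classifier is actually used.
    """
    correct = 0
    for i, query in enumerate(features):
        best_label, best_dist = None, float("inf")
        for j, other in enumerate(features):
            if j == i:  # the held-out sample may not vote for itself
                continue
            dist = squared_distance(query, other)
            if dist < best_dist:
                best_dist, best_label = dist, labels[j]
        if best_label == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical low-dimensional features, e.g. after dimensionality reduction.
toy_features = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9], [0.45, 0.6]]
toy_labels = ["type I", "type I", "type II", "type II", "type I"]
accuracy = jackknife_accuracy(toy_features, toy_labels)  # 4 of 5 correct
```

     When the reduction step is supervised (as LDA and kernel LDA are), it is usually refitted inside each leave-one-out round so that the held-out label does not influence the projection.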
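     (3) Gene chip (DNA microarray) technology has fundamentally improved the methods and efficiency of biotechnology research and has had a major impact on genomics and post-genomic research, but the flood of data it produces also poses new challenges for data analysis and feature extraction. To address the difficulty of maintaining high classification accuracy and efficiency when the dimensionality of gene data rises sharply, this thesis proposes a gene chip data classifier, the dimensionality-reduced proximal support vector machine (DRPSVM), built on the traditional proximal support vector machine (PSVM). DRPSVM uses a dimensionality-reducing quadratic programming algorithm: it reduces the classification of gene data to a quadratic program with only linear equality constraints, maintains good classification accuracy relative to the traditional Proximal Support Vector Machine (PSVM), and lowers the time and space complexity of classification.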
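     To show why a quadratic program with only linear equality constraints is cheap to solve, here is a minimal sketch of the standard linear PSVM that DRPSVM builds on, not of DRPSVM itself, whose reduction step the abstract does not specify. With data matrix A, labels of +1 or -1 on the diagonal of D, and E = [A, -e], the PSVM solution is the single linear system (I/nu + E^T E) [w; gamma] = E^T D e. The function names, the parameter nu = 1.0, and the toy data are assumptions.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def train_psvm(samples, labels, nu=1.0):
    """Fit a linear PSVM; labels must be +1 or -1. Returns (w, gamma)."""
    extended = [list(x) + [-1.0] for x in samples]  # rows of E = [A, -e]
    dim = len(extended[0])
    # Left-hand side I/nu + E^T E (positive definite, so always solvable).
    left = [[sum(r[a] * r[b] for r in extended) + (1.0 / nu if a == b else 0.0)
             for b in range(dim)] for a in range(dim)]
    # Right-hand side E^T D e; D e is simply the label vector.
    right = [sum(r[a] * y for r, y in zip(extended, labels)) for a in range(dim)]
    z = solve_linear_system(left, right)
    return z[:-1], z[-1]


def predict_psvm(w, gamma, x):
    """Classify x by the sign of w . x - gamma."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - gamma >= 0 else -1


# Hypothetical two-feature toy data standing in for expression profiles.
toy_samples = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
toy_targets = [1, 1, -1, -1]
w, gamma = train_psvm(toy_samples, toy_targets)
predictions = [predict_psvm(w, gamma, x) for x in toy_samples]
```

     The system has one unknown per feature plus one for the offset, so its size grows with the feature dimension rather than the number of samples. That is one plausible reason to reduce dimensionality before training.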
Genes are self-replicating units of preservation, and their physiological functions are expressed in the form of proteins. About 30% of the proteins in cells are membrane proteins. As one of the main components of biomembranes, membrane proteins play a vital role in organisms. With the explosion of protein sequence data, determining membrane protein types by molecular biology experiments is time-consuming; moreover, such experiments may encounter difficulties that cannot be solved at present.
     Feature extraction from membrane protein sequences is a basic problem in computational protein classification research and a key factor determining classification performance. This thesis studies feature selection algorithms and classification algorithms for membrane protein sequences and uses them to predict membrane protein types. The main work and innovations of this thesis are summarized as follows:
     (1) Linear dimensionality reduction algorithms are introduced to predict membrane protein types. In membrane protein feature extraction, dipeptide composition (DC) has gradually been proven more effective than the conventional amino acid composition (AAC). Although the DC representation helps to increase prediction accuracy, its high dimensionality can cause the curse of dimensionality. A linear dimensionality reduction algorithm is therefore introduced to extract the indispensable features from the high-dimensional DC space, and membrane protein types are identified from the reduced low-dimensional features. Experimental results show that the proposed method is effective for predicting membrane protein types.
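     As a rough sketch of linear dimensionality reduction, the code below centers a feature matrix, builds its covariance matrix, and finds the leading principal component by power iteration. It illustrates principal component analysis in general and is not the thesis implementation. The function names and the two-dimensional toy data are assumptions, and a real pipeline would keep several components, typically through a library routine.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[r[k] - means[k] for k in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_component(cov, iterations=200):
    """Power iteration toward the eigenvector with the largest eigenvalue.

    Assumption: the uniform start vector is not orthogonal to that eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Coordinate of each centered row along the given unit vector."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]


# Hypothetical toy data lying on the line y = 2x; real input would be DC vectors.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(toy)
axis = leading_component(covariance(centered))
scores = project(centered, axis)  # one number per sample instead of two
```

     Because the toy points lie on a single line, one component captures all of their variance. For DC vectors the number of retained components would be chosen empirically.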
     (2) This thesis proposes five new combined feature extraction algorithms. It first introduces linear dimensionality reduction and constructs two combined feature extraction algorithms on that basis: DC_PCA, which combines dipeptide composition with principal component analysis, and DC_LDA, which combines dipeptide composition with linear discriminant analysis. Experimental results show that feature extraction based on linear dimensionality reduction predicts membrane protein types more accurately than the traditional dipeptide composition (DC) and amino acid composition (AAC) methods. To obtain a membrane protein classification model with better performance and to better predict the structural and functional information in membrane protein sequences, the thesis then constructs three combined feature extraction algorithms based on nonlinear dimensionality reduction: DC_KPCA, which combines dipeptide composition with kernel principal component analysis; DC_KLDA, which combines dipeptide composition with kernel linear discriminant analysis; and DC_NPE, which combines dipeptide composition with neighborhood preserving embedding. Experimental results show that feature extraction based on nonlinear dimensionality reduction also predicts membrane protein types more accurately than the traditional DC and AAC methods. To find the model with the best classification accuracy, the five combined algorithms are compared, and the model based on DC_KLDA achieves the highest accuracy.
     (3) DNA microarray technologies have changed the methods and efficiency of biotechnology research and have had a significant impact on genomics and post-genomic research, but the large volume of information they yield presents new challenges for data analysis and feature extraction. To address the difficulty of sustaining high classification accuracy and efficiency when the dimensionality of gene data increases sharply, this thesis proposes a dimensionality-reduced proximal support vector machine (DRPSVM) classifier for microarray data, built on the traditional proximal support vector machine (PSVM). DRPSVM uses a quadratic programming algorithm with dimensionality reduction: it reduces the classification of gene data to a quadratic programming problem with only linear equality constraints, while maintaining good classification accuracy relative to the traditional Proximal Support Vector Machine (PSVM) and reducing the time and space complexity of classification.
