Research on Protein Sequences, Structures, and Mass Spectrometry Data Based on Information Measures
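Abstract
Proteins are basic constituents of living organisms and the main link between molecular operation and biological function, so studying them helps explain molecular mechanisms and clarifies the rules that govern life activities. Proteomics, the branch of bioinformatics that studies proteins with tools from mathematics, informatics, and computer science, has become one of the most active research fields.
     This thesis uses information-theoretic and optimization methods to study protein sequences, protein structures, and the proteome of human tissue, with the aim of extracting discriminative features from each. It addresses three topics: protein sequence comparison and its applications, protein structure comparison, and mass spectrometry data classification. The main results are as follows:
     In Chapter 2, an integer programming model is first established for protein multiple sequence alignment; the existence of an optimal solution is proved, and an optimization algorithm is constructed to solve the model. Next, based on the hydrophilic and hydrophobic properties of amino acids, the spacing distribution of hydrophilic residues around a protein phosphorylation site is constructed to model the physicochemical environment around the site, and an algorithm for predicting phosphorylation sites is designed. Finally, subsequence distributions and the FDOD function are used to discriminate outer membrane proteins from other membrane proteins and globular proteins; on several public data sets the classification accuracy of this method is higher than that of some existing algorithms.
     Chapter 3 studies protein structure comparison. First, based on the concept of a complete information set, a description of protein structure is proposed: the subsequence distribution of the Cα distance sequence. Using this representation and the FDOD function, a discrepancy measure for protein structures is defined and a structure comparison method is designed. Cluster analysis of several public data sets with this method gave good clustering results, indicating that the method is effective. Second, the distribution of distances between Cα atoms with sequence separation 3 is used to approximate the local geometry of a protein structure, and the linear sequence distribution of medium- and long-range interactions is used to describe its overall topology, giving a hybrid geometric-topological representation of the protein fold. Based on this representation and the FDOD function, a discrepancy measure for protein structures is defined, and a new structure comparison method and classification method are designed. Cluster analysis and classification experiments on several public data sets gave good results, indicating that the method is effective. Finally, on a function prediction experimental platform and using the contact vector representation of protein structure, three measures are compared systematically: the FDOD function, cross entropy, and Euclidean distance. The results show that the FDOD function is better suited to measuring the discrepancy between contact vector representations.
     Chapter 4 takes the proteome of human tissue as its subject. A classifier based on the FDOD method is applied to protein mass spectrometry data from cancer patients and benign carriers, with satisfactory classification accuracy. With the goals of high classification accuracy and few features, a multi-objective programming model is established for feature selection on mass spectrometry data; the model is converted into a single-objective programming model, and the existence of its optimal solution is analyzed briefly.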
Proteins are a main component of living organisms and the main link between molecular operation and biological function. Studying proteins therefore furthers the understanding of molecular mechanisms and of the rules of life activity. At present, proteomics, which studies proteins using mathematics, informatics, and computer technology, has become one of the most active research fields.
     In this thesis, several problems related to protein sequences, protein structures, and the proteome of cells or tissues are investigated using methods from informatics and mathematics. The main work comprises research on protein sequence comparison and its applications, protein structure comparison, and mass spectrometry data classification. Our achievements are summarized as follows:
     In Chapter 2, we first formulate the protein multiple sequence alignment problem as an integer programming model and briefly prove the existence of an optimal solution. We also construct an optimization algorithm to solve the model. Second, we present a novel computational method for predicting phosphorylation sites based on the topological distribution of hydrophilic amino acids surrounding potential phosphorylation sites, in which the topological distribution characterizes the physicochemical environment of experimentally verified phosphorylation sites. Finally, a measure based on information discrepancy is applied to the discrimination of outer membrane proteins. Unlike previous methods based on amino acid composition, the approach compares subsequence distributions, which takes into account the effect of residue order in the protein primary structure. The approach outperforms all previous methods on the same benchmark data set.
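The comparison of subsequence distributions described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not state the FDOD formula, so the sketch assumes the form commonly given in the information-discrepancy literature, B(U_1, ..., U_s) = Σ_k Σ_i u_ik · log(u_ik / mean_k(u_ik)), with 0·log 0 taken as 0. The base-2 logarithm, the function names, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def subsequence_distribution(seq, k):
    """Relative frequencies of all overlapping length-k subsequences of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {sub: c / total for sub, c in counts.items()}


def fdod(distributions):
    """Assumed FDOD form: sum over distributions and outcomes of
    p * log2(p / mean of p across all distributions); zero terms are skipped."""
    s = len(distributions)
    support = set().union(*distributions)  # every subsequence seen anywhere
    total = 0.0
    for key in support:
        mean = sum(d.get(key, 0.0) for d in distributions) / s
        for d in distributions:
            p = d.get(key, 0.0)
            if p > 0.0:
                total += p * log2(p / mean)
    return total


# Toy usage with made-up sequences: a smaller value means more similar
# length-2 subsequence usage.
a = subsequence_distribution("MKTAYIAKQR", 2)
b = subsequence_distribution("MKTAYLAKQR", 2)
print(fdod([a, b]))
```

Because the distributions are built from overlapping subsequences rather than single-residue counts, two sequences with the same amino acid composition but a different residue order generally receive a non-zero discrepancy.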
     Chapter 3 focuses on the protein structure comparison problem. First, a novel representation of protein structure, the subsequence distribution of Cα-Cα distances (SSD), is formulated. An FDOD scoring scheme is then developed to measure the discrepancy between two representations. Numerical experiments with the new method are conducted on four different protein data sets, and clustering analyses are given to verify the effectiveness of this new protein structure discrepancy measure. Second, a novel hybrid representation of protein structure is proposed that uses two sources of information. One is the distribution of Cα-Cα distances with sequence separation three, which describes the local geometry and is used to identify the content of regular secondary structures; the other is the linear sequence distance distribution of medium- and long-range interactions, which represents the packing arrangement and topological connections between secondary structures. Furthermore, we introduce a new protein structure comparison method based on information theory. Cluster analysis and structure classification experiments on several data sets demonstrate its effectiveness in measuring protein fold similarity. Finally, based on the contact vector representation, we compare the FDOD function, cross entropy, and Euclidean distance in a function prediction experiment. The experimental results show that the FDOD function is more suitable for measuring the discrepancy between contact vector representations.
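As a rough illustration of the local-geometry component and of the three measures being compared, the sketch below bins the distances between Cα atoms that are three residues apart and then scores two such histograms. It is a sketch under stated assumptions only: the bin width, the 12 Å cutoff, the two-distribution FDOD form, the smoothing constant in the cross entropy, and all function names are hypothetical, and the thesis applies the three-way comparison to contact vectors rather than to these histograms.

```python
from math import dist, log2, sqrt


def separation_distance_histogram(ca_coords, separation=3, bin_width=1.0, max_dist=12.0):
    """Normalized histogram of distances between Cα atoms i and i + separation.
    Distances at or beyond max_dist fall into the last bin (assumed convention)."""
    n_bins = int(max_dist / bin_width)
    counts = [0] * n_bins
    for i in range(len(ca_coords) - separation):
        d = dist(ca_coords[i], ca_coords[i + separation])
        counts[min(int(d / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def fdod_pair(p, q):
    """Assumed two-distribution FDOD: each term is x * log2(x / mean of the pair)."""
    total = 0.0
    for a, b in zip(p, q):
        m = (a + b) / 2
        if a > 0:
            total += a * log2(a / m)
        if b > 0:
            total += b * log2(b / m)
    return total


def cross_entropy(p, q, eps=1e-12):
    """Cross entropy of q relative to p; eps (an assumption) avoids log2(0)."""
    return -sum(a * log2(b + eps) for a, b in zip(p, q) if a > 0)


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy usage: two made-up Cα traces (coordinates in ångströms).
straight = [(3.8 * i, 0.0, 0.0) for i in range(8)]
kinked = [(1.5 * i, 2.0 * (i % 2), 0.5 * i) for i in range(8)]
p = separation_distance_histogram(straight)
q = separation_distance_histogram(kinked)
print(fdod_pair(p, q), cross_entropy(p, q), euclidean(p, q))
```

Unlike the cross entropy, the assumed FDOD form is symmetric in its arguments and stays finite when one histogram has empty bins, which is one practical difference between the three measures.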
     In Chapter 4, a classifier based on FDOD is used to discriminate mass spectrometry data of cancer patients from those of normal persons, with satisfactory performance. Because of the high dimensionality of mass spectrometry data and the need to find biomarkers, it is necessary to study feature selection for mass spectrometry data. The problem is modeled as a multi-objective program, which is then transformed into a single-objective programming model by the ε method. Finally, the existence of an optimal solution to this model is analyzed briefly.
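One way to read the transformation described above is as the standard ε-constraint scheme; the abstract does not give the model, so the formulation below is an assumed illustration rather than the thesis's own. Here x_j = 1 means feature j is selected, f_1 is a hypothetical classification accuracy function, and ε is a bound on the number of features.

```latex
% Assumed bi-objective feature selection model (symbols are illustrative)
\begin{aligned}
\max_{x \in \{0,1\}^n} \quad & f_1(x)
  && \text{(classification accuracy with the selected features)} \\
\min_{x \in \{0,1\}^n} \quad & f_2(x) = \sum_{j=1}^{n} x_j
  && \text{(number of selected features)}
\end{aligned}

% Single-objective form: keep f_1 as the objective, bound f_2 by \varepsilon
\begin{aligned}
\max_{x} \quad & f_1(x) \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_j \le \varepsilon, \\
& x \in \{0,1\}^n .
\end{aligned}
```

Under this assumed form the feasible set is finite and, for any ε ≥ 0, non-empty (x = 0 is feasible), so f_1 attains its maximum on it. That is one simple argument for the existence of an optimal solution, though the thesis's own analysis may differ.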
