Analysis and Prediction of Protein β-Sheets, and Development of Bioinformatics Tools
Abstract
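The β-sheet is one of the most important types of protein secondary structure. It is also one of the main factors limiting the accuracy of protein structure prediction. In-depth study and accurate prediction of β-sheet structure can substantially improve the accuracy of protein structure prediction, and can significantly advance research on protein folding and protein design. This dissertation focuses on β-sheet structure.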
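     The study uses a dataset obtained from the PISCES server. During data preprocessing, we revised and extended the SheetsPair database built in our earlier work and integrated the PISCES dataset into it. All subsequent analyses manage their data through this database.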
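     Our study of β-sheets proceeds from interstrand amino acid pairs to the pairing of β-strand peptide segments. We first performed a statistical analysis of interstrand amino acid pairs. The results show that interstrand pairing is not random; overall, it displays a clear pairing affinity. From these statistics we derived relative frequency (RF) matrices describing amino acid pairing preferences for three sets: parallel sheets, antiparallel sheets, and all β-sheets combined. These matrices form the basis of our subsequent work. The analysis also indicates that hydrophobic interactions and disulfide bonds are the two main factors affecting amino acid pairing. Other factors, such as the surrounding environment, may also play a role. Pairing preferences also differ between parallel and antiparallel sheets.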
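     We then analyzed amino acid pairing preferences using metric multidimensional scaling (MMDS), which displays the main pairing features captured by the RF matrices graphically. The MMDS maps for parallel, antiparallel, and all β-sheets each show a distinct "core" cluster of amino acids. This core consists mainly of strongly hydrophobic residues, which underlines the importance of hydrophobic interactions in β-sheet structure. The MMDS analysis also reveals differences in pairing affinity between parallel and antiparallel sheets, laying a foundation for future algorithms that discriminate between the two. Combining the MMDS results with hierarchical clustering, we further proposed a way to reduce the 20 amino acids to a smaller number of groups. Five groups are optimal for β-sheets overall, six for parallel sheets alone, and four for antiparallel sheets alone.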
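     Building on this analysis of interstrand amino acid pairs, we next examined the pairing and arrangement of β-strand peptide segments. Intuitively, β-strand arrangement involves at least three problems:
     (1) determining pairing partners, i.e. which strands of a β-sheet pair with each other;
     (2) predicting the relative orientation (parallel or antiparallel) of two paired strands;
     (3) determining the relative position of two paired strands.
     Our study addresses each of these in turn.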
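     We first focused on problem (2), the relative orientation of paired strands. Using the RF matrices obtained above, we analyzed how amino acid pairing relates to strand orientation. The results show a highly significant correlation between interstrand amino acid pairing and parallel/antiparallel orientation. Interactions between interstrand residues play an important, even decisive, role in determining orientation, whereas environmental and other uncertain factors have a comparatively small influence. Building on this conclusion, we adopted a new encoding scheme and developed a support vector machine (SVM)-based method for predicting the parallel/antiparallel orientation of β-strands. The method achieves an accuracy of 86.89% and a Matthews correlation coefficient (MCC) of 0.7126.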
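     For problem (1), a preliminary study of strand pairing patterns found that β-strands tend to pair with nearby strands, a "first come, first pair" tendency. In antiparallel sheets, pairing between adjacent strands also shows a strong preference for particular amino acid distances; in parallel sheets this preference is weaker.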
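     For problem (3), we found that the ends of two paired strands in a β-sheet are not necessarily aligned; "extended ends" frequently occur instead. The extended length is defined as the length of the paired part plus the lengths of the extended ends on both sides. Statistical analysis of these extended ends shows two regularities:
     - the paired part generally accounts for more than 25% of the extended length;
     - the paired part generally accounts for more than 40% of the β-strand length.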
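     Drawing on experience accumulated in our research, we developed several software packages and tools that can facilitate bioinformatics research, including but not limited to β-sheet studies:
     - StrandPairsViewer, for visualizing interstrand amino acid pairs in β-sheets;
     - SRD, for dynamic drawing and visual analysis of relationships among biological macromolecule sequences;
     - NRChart, an ActiveX control for reading and displaying time-series data;
     - PCDReader, for preprocessing patch-clamp data;
     - LTPConverter, for text conversion of long-term potentiation (LTP) experimental data;
     - Super Notepad, for general-purpose plain-text processing in everyday bioinformatics work.
     Considerable effort went into optimizing the performance of many of these tools, mainly faster execution and lower memory use. The dissertation describes their features, main functions, and key programming techniques and methods.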
The β-sheet is one of the most important protein secondary structures and has remained one of the main stumbling blocks in protein structure prediction. In-depth study and accurate prediction of β-sheets may lead to noticeable improvements in de novo protein structure prediction and in the study of protein folding and design. This study focuses on β-sheet structure.
     The dataset was taken from the PISCES server. Building on the SheetsPair database constructed in our earlier work, we preprocessed all proteins in the PISCES dataset and integrated them into the database. The database was then used to manage all protein data in our subsequent studies.
     We pursued a research strategy that proceeds from interstrand amino acid pairs to β-strand (peptide segment) arrangement. First, a statistical analysis of interstrand amino acid pairs revealed non-random pairing propensities. Based on these statistics, three relative frequency (RF) matrices were obtained: one for parallel, one for antiparallel, and one for all β-strands. These matrices were used extensively in our further studies. Hydrophobic interactions and disulphide bonds were shown to be the two main factors influencing interstrand amino acid pairing. Other aspects, such as the surroundings, may also contribute. Furthermore, the analysis revealed notable differences in amino acid pairing preferences between parallel and antiparallel β-strands.
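The abstract does not give the exact normalization used for the RF matrices. The sketch below assumes one common definition: the observed frequency of a residue pair divided by the frequency expected if the two residues paired independently. Values above 1 would then mark enriched pairs and values below 1 depleted ones. The input format (a list of one-letter residue tuples, one per interstrand pair) is hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def relative_frequency_matrix(pairs):
    """Observed/expected ratio for every ordered residue pair (a, b).

    `pairs` is a hypothetical input: an iterable of (residue_i, residue_j)
    tuples, one per interstrand residue pair (e.g. only parallel sheets).
    Each pair is counted in both orders, so the result is symmetric.
    """
    pair_counts = Counter()
    for a, b in pairs:
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Marginal frequency of each residue type among all paired positions.
    marginal = Counter()
    for (a, _), n in pair_counts.items():
        marginal[a] += n

    rf = {}
    for a, b in product(AMINO_ACIDS, repeat=2):
        expected = (marginal[a] / total) * (marginal[b] / total)
        observed = pair_counts[(a, b)] / total
        # Residue types never seen in any pair get 0.0 instead of a division error.
        rf[(a, b)] = observed / expected if expected > 0 else 0.0
    return rf
```

Running this separately on pairs from parallel sheets, antiparallel sheets, and all sheets would give three matrices analogous to those described above. Their actual values depend on the dataset and on the normalization the authors chose.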
     We then analyzed the amino acid pairing preferences using metric multidimensional scaling (MMDS), which gives a visual representation of the RF matrices describing the interactions between amino acids. Each MMDS map, whether for parallel, antiparallel, or all β-strands, shows a distinct "core" made up mainly of strongly hydrophobic amino acids. This again indicates the importance of hydrophobic interactions in amino acid pairing. The MMDS maps for parallel and antiparallel β-strands also differ, a finding that could be used in future work to develop methods for predicting parallel versus antiparallel orientation. Finally, we applied hierarchical clustering to the MMDS results to group the 20 amino acids. The optimum was 5 groups for all β-strands, 6 for parallel, and 4 for antiparallel.
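The abstract does not say which MMDS variant or linkage criterion was used. The sketch below therefore rests on three assumptions:
- classical (Torgerson) metric MDS is used;
- a symmetric RF similarity matrix is converted to distances with the hypothetical transform `max(sim) - sim`;
- the resulting coordinates are grouped by average-linkage agglomerative clustering.

Choosing the optimal number of groups (5, 6, or 4 in the study) is not shown.

```python
import numpy as np

def similarity_to_distance(sim):
    """Hypothetical transform: larger similarity -> smaller distance, zero diagonal."""
    sim = np.asarray(sim, dtype=float)
    dist = sim.max() - sim
    np.fill_diagonal(dist, 0.0)
    return dist

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) metric MDS of a symmetric distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering       # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale               # n x n_components coordinates

def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distances."""
    clusters = [[i] for i in range(len(points))]
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]    # merge the closest pair
        del clusters[j]                            # j > i, so index i stays valid
    return clusters
```

With a 20×20 RF matrix as `sim`, `classical_mds(similarity_to_distance(sim))` would give 2-D coordinates for a map. `average_linkage(coords, k)` would then split the 20 amino acids into `k` groups.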
     Building on this analysis of amino acid pairs, we then investigated β-strand (peptide segment) arrangement. At the most straightforward level, full β-strand arrangement involves:
     (i) finding the interacting partner β-strand(s);
     (ii) predicting the relative orientation (parallel or antiparallel);
     (iii) determining the relative positions of the two interacting β-strands.
     Our further studies address each of these three aspects.
     We first focused on the second aspect, the parallel or antiparallel orientation. Features extracted from the RF matrices showed that interstrand amino acid pairs play a significant role in determining the parallel or antiparallel orientation of β-strands, while the influence of the surroundings and other uncertain factors is small. On this basis, we proposed a new encoding scheme and developed a support vector machine (SVM)-based approach for predicting the parallel/antiparallel orientation of β-strands. It achieved a prediction accuracy of 86.89% and a Matthews correlation coefficient (MCC) of 0.7126.
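The two reported metrics, accuracy and MCC, are both computed from the binary confusion matrix of predicted versus true orientations. The sketch below shows only that evaluation step. It treats "parallel" as the positive class, an arbitrary choice that leaves MCC unchanged. The SVM and the encoding scheme are not reproduced, and the counts in the usage check are made up for illustration; they are not the study's results.

```python
import math

def accuracy_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient of a binary classifier.

    tp/fn: parallel pairs predicted parallel/antiparallel;
    tn/fp: antiparallel pairs predicted antiparallel/parallel.
    Swapping which class is 'positive' leaves MCC unchanged.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when parallel and antiparallel pairs are present in very different numbers. This is why reporting both is useful.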
     For the first aspect, we performed a preliminary study of strand partner distribution. Most β-strands tend to pair with their nearest neighbour strands (the "First Come First Pair" rule). Furthermore, nearest-neighbour strand pairs show a stronger preference for particular amino acid distances in antiparallel sheets than in parallel sheets.
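The nearest-neighbour tendency can be quantified as the fraction of strand pairs whose members are consecutive along the sequence. The sketch below assumes a hypothetical representation in which strands are numbered 1, 2, 3, ... in chain order and each strand's residue range is known. It also assumes that "amino acid distance" means the number of residues separating two strands; the abstract does not define the term precisely.

```python
def nearest_neighbour_fraction(strand_pairs):
    """Fraction of paired strands that are sequence-adjacent (|i - j| == 1).

    `strand_pairs` (hypothetical format) holds (i, j) strand indices,
    numbered 1, 2, 3, ... in order along the chain.
    """
    if not strand_pairs:
        return 0.0
    adjacent = sum(1 for i, j in strand_pairs if abs(i - j) == 1)
    return adjacent / len(strand_pairs)

def inter_strand_separation(strands, i, j):
    """Residues between the end of the earlier strand and the start of the later one.

    `strands` (hypothetical format) maps strand index -> (start, end),
    inclusive residue numbers.
    """
    (_, end_first), (start_second, _) = sorted([strands[i], strands[j]])
    return start_second - end_first - 1
```

Tabulating `inter_strand_separation` separately for adjacent parallel and adjacent antiparallel pairs would be one way to compare how strongly each sheet type prefers particular separations.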
     For the third aspect, we found that when two β-strands pair to form a β-sheet, the ends of one strand are often not aligned with those of the other; instead, one or both strands extend beyond the paired part. The extended length is defined as the length of the paired part plus the lengths of the two extending parts. Statistically, the length of the paired part is generally:
     - more than 25% of the extended length;
     - more than 40% of the β-strand length.
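The two ratios above follow directly from the residue ranges of a strand pair once both strands are placed on a common axis on which paired residues share a coordinate. On that axis the paired part is the overlap of the two intervals. The extended length (paired part plus both extending ends) is the span from the earliest start to the latest end. The common-axis representation is an assumption of this sketch; for antiparallel pairs, one strand's numbering would first have to be reversed.

```python
def extension_ratios(strand_a, strand_b):
    """Paired-part ratios for two paired strands.

    Both arguments are (start, end) inclusive intervals on a common axis where
    paired residues share a coordinate (hypothetical representation).
    """
    (a0, a1), (b0, b1) = strand_a, strand_b
    paired = min(a1, b1) - max(a0, b0) + 1        # overlap length
    if paired <= 0:
        raise ValueError("strands do not overlap on the common axis")
    extended = max(a1, b1) - min(a0, b0) + 1      # paired part + both extending ends
    return {
        "paired_over_extended": paired / extended,
        "paired_over_a": paired / (a1 - a0 + 1),
        "paired_over_b": paired / (b1 - b0 + 1),
    }
```

For example, strands covering positions 0-9 and 4-13 share 6 paired positions over an extended length of 14. This gives ratios of about 0.43 (paired/extended) and 0.6 (paired/strand length), both above the reported 25% and 40% levels.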
     Drawing on the experience and techniques accumulated during this research, we developed several software packages and utilities to facilitate future studies of β-strands and other areas of bioinformatics. They include:
     - StrandPairsViewer, for visualizing interstrand amino acid pairs;
     - SRD, for visualizing DNA/protein sequence relationships based on undirected graphs;
     - NRChart, an ActiveX control for reading and visualizing time-series data;
     - PCDReader, for preprocessing patch-clamp data;
     - LTPConverter, for converting long-term potentiation (LTP) experimental data;
     - Super Notepad, for ASCII text processing in daily bioinformatics research.
     Considerable effort went into making these tools run faster and use less memory. The dissertation presents their features, applications, programming methods, and techniques.
    [172]Brown M D, Hosseini S H, Torroni A, et al. mtDNA Haplogroup X:An Ancient Link between Europe/Western Asia and North America?. Am.J.Hum.Genet,1998,63: 1852-1861
    [173]Wang J R, Zhang L, Wei Y M, et al. Sequence polymorphisms and relationships of dimeric a-amylase inhibitor genes in the B genomes of Triticum and S genomes of Aegilops. Plant.Sci,2007,173:1-11
    [174]Atkinson H J, Morris J H, Ferrin T E, et al. Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies. PLoS ONE,2009, 4(2):e4345
    [175]Eloranta T, Makinen E. TimGA:a Genetic algorithm for drawing undirected graphs. Div ul.Matemat,2001,9(2):155-171
    [176]Garrido M A, Marquez A. Embedding a graph in the grid of a surface with the minimum number of bends is NP-hard. Lecture Notes in Comp Sci.,Graph Drawing,1997,1353: 124-133
    [177]Storer J A. On minimal node-cost planar embeddings. Networh,1984,14:181-212
    [178]Davidson R, Harel D. Drawing Graphs Nicely Using Simulated Annealing. ACM Trans.Graph,1996,15(4):301-331
    [179]Eades P. A Heuristic for Graph Drawing. Congr.Numer,1984,42:149-160
    [180]Huang X D, Lai W, Sajeev A S M, et al. A new algorithm for removing node overlapping in graph visualization. Inform.Sciences,2007,177:2821-2844
    [181]Tamassia R. On Embedding a Graph in the Grid with the Minimum Number of Bends. Siam.J.comput,1987,16(3):421-444
    [182]Huang X D, Lai W. Clustering graphs for visualization via node similarities. J.Visual.Lang.Comput,2006,17:225-253
    [183]Tamassia R, Battista G D, Batini C. Automatic Graph Drawing and Readability of Diagrams. IEEE. T. SYST. MAN. CY,1988,18(1):61-79
    [184]Bridgeman S S, Tamassia R. A user study in similarity measures for graph drawing. JGAA, 2002.6(3):225-254
    [185]Frishman Y, Tal A. MOVIS:A system for visualizing distributed mobile object environments. J.Visual.Lang.Comput,2008,19:303-320
    [186]Tollis E G, Battista G D, Eades P, et al. Graph Drawing:Algorithms for the Visualization of Graphs, New Jersey:Prentice-Hall, Englewood Cliffs, NJ,1999
    [187]Benson G. A new distance measure for comparing sequence profiles based on path lengths along an entropy surface. Bioinformatics,2002,18Suppl2:S44-S53
    [188]Out H H, Sayood K. A new sequence distance measure for phylogenetic tree construction. Bioinformatics,2003,19(16):2122-2130
    [189]Neher E, Sakmann B. Single channel currents recorded from membrane of denervated frog muscle fibers. Nature,1976,260:799-802
    [190]Heyward P M, Shipley M T. A device for automated control of pipette internal pressure for patch-clamp recording. Journal of Neuroscience Methods,2003,123:109-115
    [191]Huber S M, Duranton C, Lang F. Patch-Clamp Analysis of the "New Permeability Pathways" in Malaria-Infected Erythrocytes. International review of cytology,2005,246: 59-134
    [192]Kornreich B G. The patch clamp technique:principles and technical considerations. J Vet Cardiol,2007,9(1):25-37
    [193]Dale T J, Townsend C, Hollands E C, et al. Population patch clamp electrophysiology:a breakthrough technology for ion channel screening. Mol.BioSyst,2007,3:714-722
    [194]Hilbe J M. Review of SigmaPlot 9.0, American Statistician,2005,59(1):111-112
    [195]Heitler W J. Data View,2009. http://www.st-andrews.ac.uk/-wjh/dataview/.
    [196]Marrannes R, Prins E D. Computer programs to facilitate the estimation of time-dependent drug effects on ion channels, Computer Methods and Programs in Biomedicine,2004,74: 167-181
    [197]Liu Z W, Lei T, Zhang T, et al. Peroxynitrite donor impairs excitability of hippocampal CA1 neurons by inhibiting voltage-gated potassium currents. Toxicology Letters,2007, 175:8-15
    [198]Tian Y T, Liu Z W, Yang Y, et al. Effect of alpha-cypermethrin and theta-cypermethrin on delayed rectifier potassium currents in rat hippocampal neurons. NeuroToxicology,2009, 30:269-273
    [199]Tian Y T, Liu Z W, Yao Y, et al. Effects of alph and theta cypermethrin insecticide on transient outward potassium current in rat hippocampal CA3 neurons. Pesticide Biochemistry and Physiology,2008,90:1-7
    [200]Bliss T V, Collingridge G L. A synaptic model of memory:long-term potentiation in the hippocampus. Nature,1993,361(6407):31-39