Research on a Quantum-Algorithm-Based Subcellular Localization System for Apple and PCD-Related Proteins
Abstract
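Research on the proteins and proteomes of eukaryotes such as animals and plants has become increasingly important in the post-genomic era. With genome sequencing projects completed for many organisms (including the fruit trees apple and grape), the focus of research has begun to shift toward determining the functions of the protein products of genes. Subcellular localization of fruit tree proteins is an important topic in fruit tree proteomics, cell biology and molecular bioinformatics. On the one hand, the biological function of a fruit tree protein is closely tied to biological processes such as metabolism and signal transduction; on the other hand, the protein must sit in a specific subcellular region to carry out that function. For a fruit tree protein of unknown function, obtaining its subcellular location is therefore essential to further study of its molecular function. The usual way to obtain this information is by wet-lab experiment, but that approach is slow and costly. Because the number of known fruit tree protein sequences is growing rapidly, large-scale localization information (for example, for all proteins encoded by the apple genome) can be obtained in a short time only by bioinformatics methods. From the standpoint of biological data, bioinformatics can be divided into three main research areas: the generation and management of large volumes of biological sequence data, the use and analysis of biological data, and the research and development of platform tools for biological data analysis. Given the flood of bioinformatics data and the rapid progress of the life sciences, both scientific research and production practice urgently need analysis platform tools that meet their requirements; in some projects the lack of such tools has even become the bottleneck that limits deeper study. Developing these tools usually draws on knowledge from biology, mathematics, physics, chemistry, information science and other fields, which adds to the complexity of the task. In-depth research on data analysis platform tools for fruit trees is therefore both necessary and of practical value, and it is one of the aims of this work.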
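     Centering on quantum algorithms, this work analyzes in depth the bioinformatics problems that arise in predicting the subcellular localization of PCD-related proteins and the problem of implementing subcellular localization prediction for apple proteins. Drawing on biophysics and physics, it proposes concrete solutions and implementation schemes. The main contributions and innovations are summarized as follows: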
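     1. Starting from the composition of a protein's amino acid sequence and borrowing the idea of granularity from physics, this work proposes the concept of the granularity of a protein amino acid sequence and explains in detail, using specific sequence fragments, how protein granularity is constructed. Analyzing amino acid sequences in terms of protein granularity leads to the further concepts of the order of protein granularity, the bound of protein granularity, the limit of protein granularity, and the protein granularity increment. Closer study shows that protein granularity is unevenly distributed along the sequence, that each protein sequence has its own granularity limit, and that across all proteins the granularity of each order shares a common bound. From the standpoint of prediction, protein granularity captures the composition of the amino acid sequence, the order of its residues, and the adjacency of identical amino acids, while the granularity increment naturally captures the sequence length. This work gives a concrete method for building feature vectors of protein sequences from protein granularity and describes the relevant parameters in detail. Using granularity increment information to predict protein secondary structural classes on standard datasets and the sub-chloroplast localization of plant proteins gave better results than earlier work, which further indicates that protein granularity is a very useful indicator of protein attributes.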
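     2. The standard apoptosis protein datasets ZD98, ZW225 and CL317 were selected, and protein granularity was used to extract features from the apoptosis protein sequences, giving a 38-dimensional feature vector for each sequence. With an improved quantum neural network algorithm (QNN), subcellular localization prediction on the three datasets reached overall accuracies of 87.8%, 83.1% and 85.5%, respectively. These accuracies equal or exceed those reported by the original authors, showing that combining protein granularity with QNN is effective for predicting the subcellular localization of apoptosis proteins.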
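     3. Granularity-based and other features were extracted from the published apple genome-wide protein sequences, yielding feature vectors such as second-order granularity composition, third-order granularity composition, and a fusion of multiple granularity spaces. A new quantum algorithm (QSVM) was then developed from the idea of wave function superposition in quantum mechanics and used to predict the subcellular localization of 63,541 amino acid sequences from the apple genome. The resulting localization information forms Apple Genome-Wide Protein Subcellular Location Database 1.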
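     4. Building on a high-quality benchmark dataset of plant proteins with multiple subcellular locations constructed by Chou, this work proposes a separate-processing prediction mode in which multi-label and single-label proteins are predicted separately, with GO annotations used to extract features from the protein sequences. The approach achieved relatively high prediction accuracy and offers a new method for multi-location protein prediction.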
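     5. Within the apple genome-wide protein dataset, GO annotation features were extracted for the apple proteins that have GO annotations, and protein granularity features were then extracted using the granularity theory proposed in this work. A new quantum algorithm (SQSVM) was developed and used to predict the subcellular localization of the 15,297 GO-annotated protein sequences selected from the apple genome. The localization results were used to build Apple Genome-Wide Protein Subcellular Location Database 2.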
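     6. Drawing on the findings of this work, two subcellular localization websites have been built as concrete realizations of a biological data analysis platform: an apple protein subcellular localization system and a plant protein multi-location subcellular localization system. Both will open shortly and will be free to users in China and abroad.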
Protein and proteomics research on animals, plants and other eukaryotes is becoming increasingly important in the post-genomic era. With the completion of genome sequencing projects for a variety of organisms, including fruit trees such as apple and grape, the focus of research has begun to shift toward determining the function of the protein products of genes. Protein subcellular localization in fruit trees is an important research topic in the proteomics, cell biology and molecular bioinformatics of fruit trees. On the one hand, the biological function of a fruit tree protein is closely related to biological processes such as metabolism and signal transduction. On the other hand, the protein must be in a specific subcellular region to exercise its biological function. For a fruit tree protein of unknown function, obtaining its subcellular location is essential for further study of its molecular function. Obtaining this information through biological experiments is the usual practice, but it is time-consuming and costly. Because the number of fruit tree protein sequences is growing rapidly, large-scale localization information (for example, genome-wide subcellular localization of apple proteins) can be obtained in a short time only by bioinformatics means. From the perspective of biological data, bioinformatics research can be divided into three areas: the generation and management of large amounts of biological sequence data, the use and analysis of biological data, and the research and development of tools for biological data analysis platforms. Given the large volume of bioinformatics data being generated and the rapid development of the life sciences, both research and production practice urgently need analysis platform tools that meet their demands. In some research projects, such tools have even become a bottleneck that restricts in-depth study. Meanwhile, the research and development of these tools often requires knowledge from biology, mathematics, physics, chemistry, information science and other areas, which increases its complexity. It is therefore necessary to carry out in-depth research on biological data analysis platform tools for fruit trees. Such research also has important practical value, and it is one of the purposes of our study.
     In this paper, using quantum algorithms as the main approach, we analyze in depth the bioinformatics issues in predicting the subcellular localization of PCD-related proteins and the implementation of subcellular localization prediction for apple proteins. Combining knowledge from biophysics and physics, we put forward specific solutions and implementation schemes. The main work and innovations are summarized as follows:
     1. Starting from the composition of the protein amino acid sequence and using the idea of granularity from physics, we propose the concept of the granularity of a protein's amino acid sequence and analyze amino acid sequences in terms of it. The concepts of protein granularity order, protein granularity bound, protein granularity limit, and protein granularity increment are then defined. We found several useful properties: protein granularity is unevenly distributed along the protein sequence; each protein sequence has its own protein granularity limit; and for all proteins, the granularity of each order has a common bound. In terms of prediction, it can also be concluded that protein granularity includes the amino acid composition information, the sequence-order information, and the 'neighbor' information of identical amino acids, while the protein granularity increment includes the sequence length information. A concrete method for constructing feature vectors of protein sequences from protein granularity is given, and the related parameters are described in detail. Using the protein granularity increment, we predicted protein secondary structural classes on standard datasets and the sub-chloroplast localization of plant proteins. The results are better than those of previous work, which further illustrates that protein granularity is a useful indicator of protein attributes.
     2. The ZD98, ZW225 and CL317 apoptosis protein standard datasets were selected. Using protein granularity to extract features from the apoptosis protein sequences, a 38-dimensional feature vector was obtained for each sequence. Subcellular localization of the apoptosis proteins was then predicted with an improved quantum neural network algorithm (QNN). The overall prediction accuracy reached 87.8%, 83.1% and 85.5%, respectively. These accuracies are equal to or higher than those of the original authors, indicating that the protein granularity method combined with QNN is valid for apoptosis protein subcellular localization prediction.
     3. From the published apple genome-wide protein sequences, we extracted granularity-based feature vectors, including second-order protein granularity composition, third-order protein granularity composition, and the fusion of multiple granularity spaces. Then, following the idea of wave function superposition in quantum mechanics, a new quantum algorithm (QSVM) was developed. Subcellular localization was predicted for 63,541 amino acid sequences of the apple genome-wide proteins, and the corresponding results are presented. From these results, apple genome-wide protein subcellular location database 1 was obtained.
     4. A high-quality benchmark dataset of plant proteins with multiple subcellular locations, constructed by Chou, was selected. We present a separate-processing prediction mode in which multi-label proteins and single-label proteins are predicted separately. At the same time, GO annotations are used for feature extraction from the protein sequences. The method achieves relatively high prediction accuracy and provides a new approach to multi-location protein prediction.
     5. Within the apple genome-wide protein dataset, GO annotations were used for feature extraction from the apple protein sequences that have GO annotations, and protein granularity features were then extracted using the proposed protein granularity theory. A new quantum algorithm (SQSVM) was developed and used to predict the subcellular localization of the 15,297 amino acid sequences of apple genome-wide proteins that have GO annotations. The corresponding prediction results are presented. On this basis, apple genome-wide protein subcellular location database 2 was constructed.
     6. Based on the conclusions of this paper, two subcellular localization websites have been built as concrete realizations of a biological data analysis platform: an apple protein subcellular localization system and a plant protein subcellular multi-localization system. The websites will be launched soon and will provide free services to users in China and abroad.
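Items 2 and 4 above mention an overall prediction accuracy and a mode that handles single-label and multi-label proteins separately. The abstract does not define either procedure, so the following is only a minimal Python sketch under stated assumptions: overall accuracy is taken to be the fraction of proteins whose predicted location matches the true one, and a protein counts as multi-label when it carries more than one location. The function names, toy sequences and location labels are all hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Set, Tuple

# A record pairs an amino acid sequence with its set of subcellular locations.
Record = Tuple[str, Set[str]]


def split_by_label_count(dataset: Sequence[Record]) -> Tuple[List[Record], List[Record]]:
    """Partition records into single-label and multi-label groups (assumed rule)."""
    single = [(seq, labels) for seq, labels in dataset if len(labels) == 1]
    multi = [(seq, labels) for seq, labels in dataset if len(labels) > 1]
    return single, multi


def overall_accuracy(true_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Fraction of proteins whose predicted location equals the true location."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical toy data, for illustration only.
toy_dataset: List[Record] = [
    ("MKTAYIAKQR", {"nucleus"}),
    ("MSLLTEVETY", {"chloroplast", "mitochondrion"}),
    ("MAHHHHHHVG", {"cytoplasm"}),
]
single_label, multi_label = split_by_label_count(toy_dataset)

truth = ["nucleus", "cytoplasm", "nucleus", "membrane"]
guess = ["nucleus", "cytoplasm", "membrane", "membrane"]
accuracy = overall_accuracy(truth, guess)  # 3 of 4 predictions match
```

Each group returned by `split_by_label_count` could then be passed to its own classifier. Which classifiers and features the thesis actually uses for each group is not stated in the abstract.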
引文
曹慧,李春霞,王孝威,邹岩梅,束怀瑞.水分胁迫诱导八棱海棠和平邑甜茶细胞程序性死亡的研究.园艺学报,2009,36(4):469-474
    梁东.苹果山梨醇代谢相关基因的分子特性研究.西北农林科技大学博士论文,2010
    马怀宇,肖静,杨洪强.水分胁迫下湖北海棠根系线粒体及细胞死亡特性研究.园艺学报,2007,34(3):549-554
    沈嘉,张振文.人工模拟干旱条件下赤霞珠葡萄程序性死亡的细胞形态学研究.西北农林科技大学学报,2009,37(10):125-132
    谭冬梅,许雪锋,李天忠,王忆,韩振海.干旱胁迫诱导新疆野苹果细胞程序性死亡的细胞形态学研究.华北农学报,2007,22(1):50-55
    张士功,刘国栋,刘更另.植物营养与作物抗旱性.植物学通报,2001,18(1):64-69
    张世忠,付莹,周波,姜泽盛,许瑞瑞,束怀瑞.苹果功能基因组数据库的构建与使用.园艺学报,2012,39(11):2245-2250
    Alan M.J.. Programmed cell death in development and defense. Plant Physiol.,2001,125(1):94-97
    Altschul S.F., Gish W., Miller W., Myers E.W. and Lipman D.J.. Basic local alignment searchtool. J. Mol. Biol.,1990,215(3):403-410
    Anfinsen, C.B.. Principles that govern the folding of protein chains. Science,1973,181(96):223-230
    Apweiler R., Attwood T.K., Bairoch A, et al.. The InterPro database, an integrateddocumentation resource for protein families, domains and functional sites. Nucleic AcidsRes.,2001,29(1):37-40
    Ashburner M., Ball C.A., Blake J.A., et al.. Gene Ontology: tool for the unification ofbiology. Nature genetics,2000,25(1):25-29
    Bairoch A., Apweiler R., Wu C.H., et al.. The universal protein resource (UniProt). NucleicAcids Res.,2005,33(suppl1): D154-D159
    Baker, D.. A surprising simplicity to protein folding. Nature,2000,405(6782):39-42
    Barrell D., Dimmer E., Huntley R.P., Binns D., O'Donovan C. and Apweiler R.. The GOAdatabase in2009–an integrated Gene Ontology Annotation resource. Nucleic Acids Res.,2009,37(Database issue):396-403
    Bannai H., Tamada Y., Maruyama O., Nakai K. and Miyano S.. Extensive feature detection ofN-terminal protein sorting signals. Bioinformatics,2002,18(2):298-305
    Birney E., Clamp M. and Durbin R.. GeneWise and Genomewise. Genome Research,2004,14(5):988-995
    Blum T., Briesemeister S. and Kohlbacher O.. MultiLoc2: integrating phylogeny and GeneOntology terms improves subcellular protein localization prediction. BMCBioinformatics,2009,10(1):274
    Bozhkov P.V., Filonova L.H., Suarez M.F., Helmersson A., Smertenko A.P., Zhivotovsky B.and Arnold S.V.. VEIDase is a principal caspase-like activity involved in plantprogrammed cell death and essential for embryonic pattern formation. Cell Death andDifferentiation,2004,11(2):175-182
    Brady S. and Shatkay H.. EpiLoc: a (working) text-based system for predicting proteinsubcellular location. Pac. Symp. Biocomput.,2008,604:15
    Briesemeister S., Rahnenfuhrer J. and Kohlbacher O.. YLoc–an interpretable web server forpredicting subcellular localization. Nucleic Acids Res.,2010,38(suppl2): W497-W502
    Bu W.S., Feng Z.P. and Zhang Z.D.. Prediction of protein (domain) structural classes based onamino acid index. European journal of biochemistry,1999,266(3):1043-1049
    Burge C.B.and Karlin S.. Prediction of complete gene structures in human genomic DNA. J.Mol. Biol.,1997,268(1):78-94
    Cai Y.D., Liu X.J., Xu X.B. and Chou K.C.. Prediction of protein structural classes by supportvector machines. Computational Chemistry,2002,26(3):293-296
    Cai Y.D., Liu X.J., Xu X.B. and Chou K.C.. Support vector machines for prediction of proteindomain structural class. J. Theor. Biol.,2002,221(1):115-120
    Cai Y.D., Feng K.Y., Lu W.C. and Chou K.C.. Using LogitBoost classifier to predict proteinstructural classes. J. Theor. Biol.,2006,238(1):172
    Cai Y.D. and Zhou G.P.. Prediction of protein structural classes by neural network. Biochimie,2000,82(8):783785
    Chang C.C. and Lin C.J.. LIBSVM: a library for support vector machines. ACM Transactionson Intelligent Systems and Technology,2011,2(3):27
    Chen C., Tian Y.X., Zou X.Y., Cai P.X. and Mo J.Y.. Using pseudo-amino acid compositionand support vector machine to predict protein structural class. Journal of theoreticalbiology,2006,243(3):444
    Chen C., Chen L.X., Zou X.Y. and Cai P.X.. Predicting protein structural class based on multi-features fusion. J. Theor. Biol.,2008,253(2):388-392
    Chen L.Y. and Li Q.Z.. Prediction of the subcellular locatin of apoptosis proteins. J. Theor.Biol.,2007,245(4):775-783
    Cherstvy A. G., Kolomeisky A. B. and Kornyshev A. A.. Protein-DNA interactions: reachingand recognizing the targets. J. Phys. Chem.,2008,112(15):4741-4750
    Cherstvy A. G.. Positively charged residues in DNA-binding domains of structural proteinsfollow sequence-specific positions of DNA phosphate groups. J. Phys. Chem.,2009,113(13):4242-4247
    Chichkova N.V., Kim S.H., Titova E.S., Kalkum M., Morozov V.S., Rubtsov Y.P., KalininaN.O., Taliansky M.E. and Vartapetian A.B.. A plant caspase-like protease activated duringthe hypersensitive response. Plant Cell,2004,16(1):157-171
    Chou K.C.. A key driving force in determination of protein structural classes. Biochem.Biophys. Res. Commun.,1999,264(1):216-224
    Chou K. C.. Pseudo amino acid composition and its applications in bioinformatics,proteomics and system biology. Current Proteomics,2009,6(4):262-274
    Chou K. C.. Some remarks on protein attribute prediction and pseudo amino acidcomposition. J. Theor. Biol.,2011,273(1):236-247
    Chou K.C. and Cai Y.D. Prediction of protein subcellular locations by GO-FunD-PseAApredicor. Biochem. Biophys. Res. Commun.,2004,320(4):1236-1239
    Chou K.C. and Cai, Y.D.. Using GO-PseAA predictor to predict enzyme sub-class. Biochem.Biophys. Res. Commun.,2004,325(2):506-509
    Chou K.C. and Magglora G.M.. Domain structural class prediction. Protein engineering,1998,11(7):523-538
    Chou K.C. and Shen H.B.. Hum-PLoc: a novel ensemble classifier for predicting humanprotein subcellular localization. Biochem. Biophys Res. Commun.,2006,347(1):150-157
    Chou, K.C. and Shen, H.B.. Large-scale plant protein subcellular location prediction. Journalof Cellular Biochemistry,2007,100(3):665-678
    Chou K.C. and Shen H.B.. Cell-PLoc2.0: An improved package of web-servers for predictingsubcellular localization of proteins in various organisms. Nat. Sci.,2010,2(10):1090-1103
    Chou K.C. and Shen H.B.. Plant-mPLoc: A Top-Down Strategy to Augment the Power forPredicting Plant Protein Subcellular Localization. PLoS ONE,2010,5(6):e11335
    Chou K.C. and Zhang C.T.. Review: Prediction of protein structural classes. Crit. Rev.Biochem. Mol. Biol.,1995,30(4):275-349
    Claros M.G. and Vincens P.. Computational method to predict mitochondrially importedproteins and their targeting sequences. Eur. J. Biochem.,1996,241(3):779-786
    Coffeen W.C. and Wolpert T.J.. Purification and characterization of serine proteases thatexhibit caspase-like activity and are associated with programmed cell death inAvenasativa. The Plant Cell,2004,16(4):857-873
    Conesa A., G tz S., García-Gómez J.M., Terol J., Talón M. and Robles M.. Blast2GO: auniversal tool for annotation, visualization and analysis in functional genomics research.Bioinformatics,2005,21(18):3674-3676
    Curwen V., Eyras E., Andrews T.D. et al.. The Ensembl automatic gene annotation system.Genome Res.,2004,14(5):942-950
    Dai W., Yanng Q., Xue G. and Yu Y.. Boosting for Transfer Leaming. Proceedings of the24thInternational Conference on Machine Learning,2007,193-200
    Dai W., Chen Y., Xue Q., Yang Q. and Yu Y.. Translated Learning: Transfer Learning acrossDifferent Feature Spaces. Proceedings of the Advances in Neural Information ProcessingSystems (NIPS),2008,353-360
    Dangl J.L., Dietrich R.A. and Richberg M.H.. Death Don’t Have No Mercy: Cell DeathPrograms in Plant-Microbe Interactions. Plant Cell,1996,8(10):1793-1807
    Danon A., Rotari V.I., Gordon A., Mailhac N. and Gallois P. Ultraviolet-C overexposureinduces programmed cell death in Arabidopsis, which is mediated by caspase-likeactivities and which can be suppressed by caspase inhibitors, p35and Defender againstApoptotic Death. J. Biol. Chem.,2004,279(1):779-787
    Delseny M., Han B. and Hsing Y.I.. High throughput DNA sequencing: The new sequencingrevolution. Plant Sci.,2010,179(5):407-422
    Ding H.Q. and Dubchak I.. Multi-class protein fold recognition using support vector machinesand neural networks. Bioinformatics,2001,17(4):349-358
    Ding Y.S. and Zhang T.L.. Using Chou’s pseudo amino acid composition to predictsubcellular localization of apoptosis proteins: An approach with immune geneticalgorithm-based ensemble classifier. Pattern Recognition Letters,2008,29(13):1887-1892
    Du Q.S., Jiang Z.Q., He W.Z., Li D.P. and Chou, K.C.. Amino acid principal componentanalysis (AAPCA) and its applications in protein structural class prediction. J. Biomol.Struct. Dyn.,2006,23(6):635-640
    Du P.F., Cao S.J. and Li Y.D. SubChlo: Predicting protein subchloroplast locations withpseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN)algorithm. J. Theor. Biol.,2009,261(2):330-335
    Elmore S.. Apoptosis: A Review of Programmed Cell Death. Toxicologic Pathology,2007,35(4):495-516
    Emanuelsson O., Nielsen H. and von Heijne G.. ChloroP, a neural network-based method forpredicting chloroplast transit peptides and their cleavage sites. Protein Sci.,1999,8(5):978-984
    Emanuelsson O., Nielsen H., Brunak S. and von Heijne G.. Predicting subcellular localizationof proteins based on their N-terminal amino acid sequence. J. Mol. Biol.,2000,300(4):1005-1016
    Erie E., Marie D.J. and Terran L.. Modeling transfer relationships between learning tasks forimproved inductive transfer. Lectures Notes in Artificial Intelligence,2008,5211,317-332
    Ferro M., Salvi D., Brugiere S., Miras S., Kowalski S., Louwagie M., Garin J., Joyard J. andRolland N.. Proteomics of the chloroplast envelope membranes from Arabidopsisthaliana. Mol. Cell Proteomics,2003,2(5):325-345
    Fontana P., Cestaro A., Velasco R., Formentin E. and Toppo S.. Rapid annotation ofanonymous sequences from genome projects using semantic similarities and a weightingscheme in gene ontology. PLoS One,2009,4(2): e4619
    Gao Q.B., Zhao H.Y., Ye X.F. and He J.. Prediction of pattern recognition receptor familyusing pseudo-amino acid composition. Biochem. Biophys. Res. Commun.,2012,417(1):73-77
    Groth D., Lehrach H. and Hennig S.. GOblet: a platform for Gene Ontology annotation ofanonymous sequence data. Nucleic Acids Res.,2004,32(suppl2): W313-W317
    Guda C., Guda P., Fahy E. and Subramaniam S.. MITOPRED: a web server for the predictionof mitochondrial proteins. Nucleic Acids Res.,2004,32(suppl2): W372-W374
    Guigoó R., Knudsen S., Drake N. and Smith T.. Prediction of gene structure. J. Mol. Biol.,1992,226(1):141-157
    Harris M. A., Clark J., Ireland A., et al.. The Gene Ontology (GO) database and informaticsresource. Nucleic Acids Res.,2004,32(Database issue): D258
    Hawkins J. and Boden M.. Detecting and sorting targeting peptides with neural networks andsupport vector machines. J. Bioinform. Comput. Biol.,2006,4(01):1-18
    Hawkins J., Davis L. and Boden M.. Predicting nuclear localization. J. Proteome Res.,2007,6(4):1402-1409
    Hayat M. and Khan A.. Predicting membrane protein types by fusing composite proteinsequence features into pseudo amino acid composition. J. Theor. Biol.,2011,271(1):10-17
    He J.J., Gu H. and Liu W. Q.. Imbalanced Multi-Modal Multi-Label Learning for SubcellularLocalization Prediction of Human Proteins with Both Single and Multiple Sites. PLoSONE,2012,7(6): e37155
    Hennig S., Groth D. and Lehrach H.. Automated Gene Ontology annotation for anonymoussequence data. Nucleic Acids Res.,2003,31(13):3712-3715
    Horton P., Park K. J., Obayashi T., Fujita N., Harada H., Adams-Collier C. J. and Nakai K..WoLF PSORT: protein localization predictor. Nucleic Acids Res.,2007,35(suppl2):W585-W587
    Hsu F., Pringle T.H., Kuhn R.M. et al.. The UCSC Proteome Browser.2005,33(suppl1):D454-D458
    Hu J. and Yan X. H.. BS-KNN: An Effective Algorithm for Predicting Protein SubchloroplastLocalization. Evolutionary Bioinformatics,2012,8:79-87
    Hua S. and Sun Z.. Support vector machine approach for protein subcellular localizationprediction. Bioinformatics,2001,17(8):721-728
    Huang W.L., Tung C.W., Ho S.W., Hwang S.F. and Ho S.Y.. ProLoc-GO: Utilizinginformative Gene Ontology terms for sequence-based prediction of protein subcellularlocalization. BMC Bioinformatics,2008,9(1):80
    Huang Y. and Li Y.D.. Prediction of protein subcellular locations using fuzzy k-NN method.Bioinformatics,2004,20(1):21-28
    Hubbard T.J., Aken B.L., Beal K.et al.. Ensembl2007. Nucleic Acids Res.,2007,35(suppl1):D610-D617
    Huh W.K., Falvo J.V., Gerke L.C., Carroll A.S., Howson R.W., Weissman J.S. and O’Shea1E.K.. Global analysis of protein localization in budding yeast. Nature,2003,425(6959):686-691
    International Peach Genome Initiative (IPGI).2010. http://www.rosaceae.org/peach/genome
    Jahandideh S., Abdolmaleki P., Jahandideh M. and Asadabadi E.B.. Novel two-stage hybridneural discriminant model for predicting proteins structural classes. BiophysicalChemistry,2007,128(1):87-93
    Jaillon O., Aury J.M., Noel B., Policriti A., Clepet C., Casagrande A., Choisne N., AubourgS., Vitulo N., Jubin C., Vezzi A., Legeai F., Hugueney P., Dasilva C., Horner D., Mica E.,Jublot D., Poulain J., Bruyère C., Billault A., Segurens B., Gouyvenoux M., Ugarte E.,Cattonaro F., Anthouard V., Vico V., Del Fabbro C., Alaux M., Di Gaspero G., Dumas V.,Felice N., Paillard S., Juman I., Moroldo M., Scalabrin S., Canaguier A., Le Clainche I.,Malacrida G., Durand E., Pesole G., Laucou V., Chatelet P., Merdinoglu D., DelledonneM., Pezzotti M., Lecharny A., Scarpelli C., Artiguenave F., Pè M.E., Valle G., MorganteM., Caboche M., Adam-Blondon A.F., Weissenbach J., Quétier F., Wincker P., French-Italian Public Consortium for Grapevine Genome Characterization.. The grapevinegenome sequence suggests ancestral hexaploidizationin major angiosperm phyla. Nature,2007,449(7161):463-467
    Jiang X.Y., Wei R., Zhang T.L. and Gu Q.. Using the concept of Chou’s pseudo amino acidcomposition to predict apoptosis proteins subcellular location: an approach byapproximate entropy. Protein Peptide Lett.,2008,15(4):392-396
    Jin L.X., Fang W.W. and Tang H.W.. Prediction protein structural classes by a new measure ofinformation discrepancy. Computatonal biology and chemistry,2003,27(3):373-380
    Jones A. M.. Programmed cell death in development and defense. Plant Physiology,2001,1125(1):94-97
    Jung S., Staton M., Lee T., Blenda A., Svancara R., Abbott A. and Main D.. GDR(GenomeDatabase for Rosaceae): Integrated web-database for Rosaceae genomics and geneticsdata. Nucleic Acids Res.,2008,36(suppl1): D1034-D1040
    Kanaka D.K., Kurgan L. and Dick S.. Classifier ensembles for protein structural classprediction with varying homology. Biochemical and Biophysical ResearchCommunications,2006,348(3):981-988
    Kandaswamy K.K., Pugalenthi G., Moller S., Hartmann E., Kalies K.U., Suganthan P.N. andMartinetz T.. Prediction of apoptosis protein locations with genetic algorithms and supportvector machines through a new mode of pseudo amino acid composition. Protein Pept.Lett.,2010,17(12):1473-1479
    Khan S., Situ G., Decher K. and Schmidt C. J.. GoFigure: automated Gene OntologyTMannotation. Bioinformatics,2003,19(18):2484-2485
    Kircher M. and Kelso J.. High-throughput DNA sequencing--concepts and limitations.BioEssays,2010,32(6):524-536
    Korf I., Flicek P., Duan D. and Brent M.R.. Integrating genomic homology into gene structureprediction. Bioinformatics,2001,17(suppl1), S140-S148
    Krogh A.. Two methods for improving performance of an HMM and their application for genefinding. Center for Biological Sequence Analysis. Phone,1997,45:4525
    Kuhn R.M., Karolchik D., Zweig A.S. et al.. The UCSC genome browser database: update2007. Nucleic Acids Res.,2007,35(suppl1): D668-D673
    Kulp D., Haussler D., Reese M. and Eeckman F.. A generalized hidden Markov model for therecognition of human genes in DNA. Proc. Int. Conf. on Intelligent Systems for MolecularBiology, St. Louis.1996:134-142
    Kumar A., Agarwa S., Heyman J.A. et al.. Subcellular localization of the yeast proteome.Genes&Development,2002,16(6):707-719
    Kurgan L. and Homaeian L.. Prediction of structural classes for protein sequence and domain-impact of prediction algorithms, sequence representation and homology, and testprocedures on accuracy. Pattern recognition,2006,39(12):2323-2343
    Lacomme C. and Santa Cruz S.. Bax-induced cell death in tobacco is similar to thehypersensitive response. Proc. Natl. Acad. Sci. USA,1999,96(14):7956-7961
    Lamesch P., Berardini T.Z., Li D., et al.. The Arabidopsis Information Resource (TAIR):improved gene annotation and new tools. Nucleic Acids Res.,2012,40(D1), D1202-D1210
    Lawlor D.W.. Limitation to photosynthesis in water stress leaves stomata vs metabolismandthe role of ATP. Ann. Bot.,2002,89(7):1-15
    Lei Z. and Dai Y.. An SVM-based system for predicting protein subnuclear localizations.BMC Bioinformatics,6(1):291
    Levitt M. and Chothia C.. Structural patterns in globular proteins. Nature,1976,261(5561):552-558
    Li W. and Godzik A.. Cd-hit: a fast program for clustering and comparing large sets of proteinor nucleotide sequences. Bioinformatics,2006,22(13):1658-1659
    Lin H. and Li Q. Z.. Using pseudo amino acid composition to predict protein structural class:approached by incorporating400dipeptide components. J. Comput. Chem.,2007,28(9):1463-1466
    Lin H., Wang H., Ding H., Chen Y.L. and Li Q.Z.. Prediction of subcellular localization ofapoptosis protein using Chou’s pseudo amino acid composition. Acta. Biotheor.,2009,57(3):321-330
    Lincoln J.E., Richael C., Overduin B., Smith K., Bostock R. and Gilchrist D.G.. Expression ofthe antiapoptotic baculovirus p35gene in tomato blocks programmed cell death andprovides broad-spectrum resistance to disease. Proc. Natl. Acad. Sci. USA,2002,99(23):15217-15221
    Majoros W.H., Pertea M., Antonescu C. and Salzberg S.L.. GlimmerM, Exonomy and Unveil:three ab initio eukaryotic genefinders. Nucleic Acids Res.,2003,31(13):3601-3604
    Mardis E. R.. Next-generation DNA sequencing methods. Annu. Rev. Genom. Human Genet.,2008,9:387-402
    Margulies M., Egholm M., Altman W.E., et al.. Genome sequencing in microfabricated high-density picolitre reactors. Nature,2005,437(7057):376-380
    Masso M. and Vaisman II.. Knowledge-based computational mutagenesis for predicting thedisease potential of human nonsynonymous single nucleotide polymorphisms. J. Theor.Biol.,2010,266(4):560-568
    Matsuda S., Vert J.P., Saigo H., Ueda N., Toh H. and Akutsu T.. A novel representation ofprotein sequences for prediction of subcellular location using support vector machines.Protein Sci.,2005,14(11):2804-2813
    Matsumura H., Nirasawa S., Kiba A., Urasaki N., Saitoh H., Ito M., Kawai-Yamada M.,Uchimiya H. and Terauchi R.. Overexpression of Bax inhibitor suppresses the fungalelicitor-induced cell death in rice (Oryza sativa L) cells. Plant J.,2003,33(3):425-434
    McClelland J.L. and Rumelhart D. E. PDP Research Group.. Parallel distributed processing.Explorations in the microstructure of cognition,1986,2
    Mei S.Y., Fei W. and Zhou S. G.. Gene Ontology based transfer learning for proteinsubcellular localization. BMC Bioinformatics,2011,12(1):44
    Mei S.Y.. Multi-kernel transfer learning based on Chou’s PseAAC formulation for proteinsubmitochondria localization. J. Theor. Biol.,2012,293:121-130
    Mitschke J., Fuss J., Blum T., Hoglund A., Reski R., Kohlbacher O. and Rensing S.A..Prediction of dual protein targeting to plant organelles. New Phytol.,2009,183(1):224-235
    Mulder N. J., Apweiler R., Attwood T. K., et al.. New developments in the InterPro database.Nucleic Acids Res.,2007,35(suppl1): D224-D228
    Munroe D. J. and Harris T. J. R.. Third-generation sequencing fireworks at Marco Island. Nat.Biotechnol.,2010,28(5):426-428
    Murzin A.G., Brenner S.E., Hubbard T. and Chothia C.. SCOP: A structural classification ofprotein database for the investigation of sequence and structures. J. Mol. Biol.,1995,247(4):536-540
    Nair R. and Rost B.. Mimicking cellular sorting improves prediction of subcellularlocalization. J. Mol. Biol.,2005,348(1):85-100
    Nakashima H., Nishikawa, K. and Ooi T.. The folding type of a protein is relevant to theamino acid composition. J. Biochem.,1986,99(1):153-162
    Neuberger G., Maurer-Stroh S., Eisenhaber B., Hartig A. and Eisenhaber F.. Prediction ofperoxisomal targeting signal1containing proteins from amino acid sequence. J. Mol.Biol.,2003,328(3):581-592
    Niu B., Jin Y.H., Feng K.Y., Lu W.C., Cai Y.D. and Li G.Z.. Using AdaBoost for the predictionof subcellular location of prokaryotic and eukaryotic proteins. Mol. Divers.,2008,12(1):41-45
    Pan S.J. and Yang Q.. A survey on transfer learning. IEEE Transactions on Knowledge andData Engineering,2010,22(10):1345-1359
    Pearson W. R.. Rapid and sensitive sequence comparison with FASTP and FASTA. MethodsEnzymol.,1990,183:63-98
    Pearson W. R. and Lipman D. J.. Improved tools for biological sequence comparison. Proc.Natl. Acad. Sci. USA,1988,85(8):2444-2448
    Peltier J.B., Emanuelsson O., Kalume D.E., Yterberg J., Friso G., Rudella A., Liberles D.A.,Roepstorff P., Heijne G. and Wijk K.J.. Central functions of the lumenal and peripheralthylakoid proteome of Arabidopsis determined by experimentation and genome-wideprediction. Plant Cell,2002,14(1):211-236
    Petsalaki E.I., Bagos P.G., Litou Z.I. and Hamodrakas S.J.. PredSL: a tool for the N-terminalsequence-based prediction of protein subcellular localization. Genomics ProteomicsBioinformatics,2006,4(1):48-55
    Pierleoni A., Martelli P.L., Fariselli P. and Casadio R.. BaCelLo: a balanced subcellularlocalization predictor. Bioinformatics,2006,22(14): e408-e416
    Platt J.C., Cristianini N., and Shawe-Taylor J.. Large margin dags for multi-classclassification. In Advances in Neural Information Processing Systems,2000,12(3):547-553
    Proost S., Van B.M., Sterck L., Billiau K., Van P.T., Vande P.Y. and Vandepoele K.. PLAZA:Acomparative genomics resource to study gene and genome evolution in plants. Plant Cell,2009,21(12):3718-3731
    Purushothaman G. and Karayiannis N. B.. Quantum neural networks (QNNs): inherentlyfuzzy feedforward neural networks. Neural Networks, IEEE Transactions on,1997,8(3):679-693
    Reinhardt A. and Hubbard T.. Using neural networks for prediction of the subcellular locationof proteins. Nucleic Acids Res.,1998,26(9):2230-2236
    Rhee S.Y., Beavis W., Berardini T.Z., Chen G., Dixon D., Doyle A., Garcia-Hernandez M.,Huala E., Lander G., Montoya M., Miller N., Mueller L.A., Mundodi S., Reiser L.,Tacklind J., Weems D.C., Wu Y., Xu I., Yoo D., Yoon J. and Zhang P.. The ArabidopsisInformation Resource (TAIR): A model organism database providing a centralized,curated gateway to Arabidopsis biology, research materials and community. Nucleic AcidsRes.,2003,31(1):224-228
    Rojo E., Martin R., Carter C., Zouhar J., Pan S., Plotnikova J., Jin H., Paneque M., Sanchez-Serrano J.J., Baker B., Ausubel F.M. and Raikhel N.V.. VPEγ exhibits a caspase-like activity that contributes to defense against pathogens. Current Biology,2004,14(21):1897-1906
    Salamov A.A. and Solovyev V.V.. Ab initio gene finding in Drosophila genomic DNA. Genome Research,2000,10(4):516-522
    Sanchez P., de Torres Zabala M. and Grant M.. AtBI-1, a plant homologue of Bax inhibitor-1, suppresses Bax-induced cell death in yeast and is rapidly upregulated during wounding and pathogen challenge. The Plant Journal,2000,21(4):393-399
    Schein A.I., Kissinger J.C. and Ungar L.H.. Chloroplast transit peptide prediction: a peek inside the black box. Nucleic Acids Res.,2001,29(16): e82-e82
    Small I., Peeters N., Legeai F. and Lurin C.. Predotar: A tool for rapidly screening proteomes for N-terminal targeting sequences. Proteomics,2004,4(6):1581-1590
    Snyder E.E. and Stormo G.D.. Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. Nucleic Acids Res.,1993,21(3):607-613
    Snyder E.E. and Stormo G.D.. Identification of protein coding regions in genomic DNA. J. Mol. Biol.,1995,248(1):1-18
    Solovyev V.V., Salamov A.A. and Lawrence C.B.. Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res.,1994,22(24):5156-5163
    Song J.. Prediction of homo-oligomeric proteins based on nearest neighbour. Computers in Biology and Medicine,2007,37(12):1759-1764
    Tamura T. and Akutsu T.. Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition. BMC Bioinformatics,2007,8(1):466
    Tanz S.K., Castleden I., Hooper C.M., Vacher M., Small I. and Millar H. A.. SUBA3: a database for integrating experimentation and prediction to define the SUBcellular location of proteins in Arabidopsis. Nucleic Acids Res.,2013,41(D1): D1185-D1191
    The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature,2000,408(6814):796-815
    Tung C.W., Liaw C., Ho S.J. and Ho S.Y.. Prediction of protein subchloroplast locations using Random Forests. World Academy of Sci., Eng. and Tech.,2010,903-907
    Tung T.Q. and Lee D.. A method to improve protein subcellular localization prediction by integrating various biological data sources. BMC Bioinformatics,2009,10(Suppl 1): S43
    Uberbacher E.C. and Mural R.J.. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl. Acad. Sci. USA,1991,88(24):11261-11265
    Vapnik V.. Statistical Learning Theory. New York: Wiley-Interscience,1998
    Velasco R., Zharkikh A., Affourtit J., Dhingra A., Cestaro A., Kalyanaraman A., Fontana P., Bhatnagar S.K., Troggio M., Pruss D., Salvi S., Pindo M., Baldi P., Castelletti S., Cavaiuolo M., Coppola G., Costa F., Cova V., Dal Ri A., Goremykin V., Komjanc M., Longhi S., Magnago P., Malacarne G., Malnoy M., Micheletti D., Moretto M., Perazzolli M., Si-Ammour A., Vezzulli S., Zini E., Eldredge G., Fitzgerald L.M., Gutin N., Lanchbury J., Macalma T., Mitchell J.T., Reid J., Wardell B., Kodira C., Chen Z.T., Desany B., Niazi F., Palmer M., Koepke T., Jiwan D., Schaeffer S., Krishnan V., Wu C.J., Chu V.T., King S.T., Vick J., Tao Q.Z., Mraz A., Stormo A., Stormo K., Bogden R., Ederle D., Stella A., Vecchietti A., Kater M.M., Masiero S., Lasserre P., Lespinasse Y., Allan A.C., Bus V., Chagné D., Crowhurst R.N., Gleave A.P., Lavezzo E., Fawcett J.A., Proost S., Rouzé P., Sterck L., Toppo S., Lazzari B., Hellens R.P., Durel C.E., Gutin A., Bumgarner R.E., Gardiner S.E., Skolnick M., Egholm M., Van de Peer Y., Salamini F. and Viola R.. The genome of the domesticated apple (Malus × domestica Borkh.). Nat. Genet.,2010,42(10):833-839
    Wan S.B., Mak M.W. and Kung S.Y.. Protein subcellular localization prediction based on profile alignment and Gene Ontology. Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on. IEEE,2011:1-6
    Wan S.B., Mak M.W. and Kung S.Y.. mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines. BMC Bioinformatics,2012,13(1):290
    Wang Y.C., Wang X.B., Yang Z.X. and Deng N.Y.. Prediction of enzyme subfamily class via pseudo amino acid composition by incorporating the conjoint triad feature. Protein Pept. Lett.,2010,17(11):1441-1449
    Weston J. and Watkins C.. Support vector machines for multi-class pattern recognition. Proceedings of the seventh European symposium on artificial neural networks,1999,4(6):219-224
    Wheeler D.L., Barrett T., Benson D.A. et al.. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res.,2007,35(suppl 1): D5-D12
    Wootton J. C. and Federhen S.. Statistics of local complexity in amino acid sequences and sequence databases. Computers & Chemistry,1993,17(2):149-163
    Xi L.L., Li S.Y., Liu H.X., Li J.H., Lei B.L. and Yao X.J.. Global and local prediction of protein folding rates based on sequence autocorrelation information. J. Theor. Biol.,2010,264(4):1159-1168
    Xiao X., Shao S.H., Huang Z.D. and Chou K.C.. Using pseudo amino acid composition to predict protein structural classes: approached with complexity measure factor. J. Comput. Chem.,2006,27(4):478-482
    Xiao X., Wang P. and Chou K.C.. GPCR-2L: Predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions. Molecular Biosystems,2011,7(3):911-919
    Xiao X., Wu Z.C. and Chou K.C.. iLoc-Virus: A multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites. J. Theor. Biol.,2011,284(1):42-51
    Yang Q., Chen Y., Xue Q., Dai W. and Yu Y.. Heterogeneous transfer learning for image clustering via the social web. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1. Association for Computational Linguistics,2009:1-9
    Zdobnov E.M. and Apweiler R.. InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics,2001,17(9):847-848
    Zhang M.Q.. Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proc. Natl. Acad. Sci. USA,1997,94(2):565-568
    Zhang T.L. and Ding Y.S.. Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes. Amino Acids,2007,33(4):623-629
    Zhang T.L., Ding Y.S. and Chou K.C.. Prediction protein structural classes with pseudo-amino acid composition: approximate entropy and hydrophobicity pattern. J. Theor. Biol.,2008,250(1):186
    Zhang Z.H., Wang Z.H., Zhang Z.R. and Wang Y.X.. A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine. FEBS Letters,2006,580(26):6169-6174
    Zhou G.P. and Doctor K.. Subcellular location prediction of apoptosis proteins. Proteins: Struct. Funct. Genet.,2003,50(1):44-48
