Research on Classification and Prediction of Protein Sequence Data
Abstract
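Sequence data is a special class of data in data mining and is found throughout many areas of society. How to mine useful information from these large, complex sequence databases is a new research topic in data mining, with both theoretical significance and practical value. This dissertation studies sequence data classification using protein sequence data as the example, which is also a topic in bioinformatics.
     The dissertation centers on the classification and prediction of protein sequence data. After surveying many sequence analysis algorithms, it groups sequence feature analysis into two main approaches, methods based on feature extraction and methods based on similarity models, and the research follows these two routes. On the feature extraction route, features are extracted for membrane proteins and for signal peptides according to the characteristics of each kind of sequence, and are then used for classification. On the similarity route, a similarity based on global (full-sequence) alignment is proposed to predict signal peptides and is then embedded in a kernel space to improve prediction stability; this yields explicit attribute vectors for the sequences and thereby unifies the two routes. Linear dimensionality reduction is further used to remove redundant and irrelevant dimensions and to support visualization. Overall, the dissertation concentrates on classification and prediction of protein sequences, with the following main contributions:
     (1) Sequence features are extracted in a targeted way for each kind of sequence to build attribute vectors, which are used to train a classifier that predicts new samples. Membrane protein sequences, which are relatively long, are first encoded numerically into time series, each treated as a discrete signal sampled at a different time interval, so that they can be analyzed with digital signal processing theory. This avoids the weakness of earlier algorithms that ignore sequence-order information. The analysis finds that the magnitude and phase of the low-frequency components capture sequence features effectively and reduce the influence of noise. Experimental results show that this frequency-domain feature extraction method captures membrane protein sequence features effectively and benefits classification.
     (2) For signal peptide sequences, which are relatively short, a sliding window is used to cut sequences of unequal length into fixed-length segments. Mutual information analysis shows complex coupling among the positions within a segment. Because existing algorithms define this coupling blindly, a multiple-decision-tree method is proposed to extract rules, which are then used to identify signal peptides and their cleavage sites. Experiments show that this approach effectively improves the prediction rates for both sequence segments and signal peptide cleavage sites.
     (3) With similarity as the foundation of classification, a similarity based on global alignment is defined for predicting signal peptides, which avoids problems introduced by the sliding window such as imbalanced samples. Its mathematical properties are analyzed and it is proved in detail to be a metric. Applied to signal peptide prediction, it achieves good results in both prediction rate and robustness. The results indicate that this similarity does characterize the relationships among samples and provides a good representation of information for classification. The proposed algorithm has been made available as an online service over the Internet, offering a fast and effective way to widen its use.
     (4) Methods for handling indefinite (non-positive-definite) kernels are examined. Based on an analysis of the deviation between the global-alignment similarity and Euclidean distance, an indefinite kernel algorithm based on global alignment is proposed and applied to signal peptide classification. In addition, while maintaining the prediction rate, the algorithm extracts feature vectors for the sequence samples, which turns the task back into a feature-based pattern recognition problem. Experimental results show that the algorithm extracts protein sequence features effectively and facilitates signal peptide prediction.
     (5) For the "small sample size problem" in linear dimensionality reduction, a new dimensionality reduction method is proposed that makes full use of the properties of the null space of the within-class scatter matrix and effectively handles the instability caused by small eigenvalues. In signal peptide prediction, starting from the high-dimensional attribute vectors already obtained, it removes a large number of redundant and irrelevant attributes, improves processing efficiency, and meets the visualization requirement, with satisfactory results.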
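As a rough illustration of the frequency-domain idea in item (1), the following sketch encodes a protein sequence as a numeric series and keeps the magnitude and phase of its lowest-frequency DFT coefficients as a fixed-length feature vector. It is only a sketch under stated assumptions: the abstract does not say which numeric encoding or how many coefficients the dissertation uses, so the Kyte-Doolittle hydropathy scale, the parameter `k`, and the function names `encode` and `low_freq_features` are all hypothetical choices made here.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values. Using this scale is an assumption;
# the abstract does not name the encoding used in the dissertation.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map each residue to a number, giving a discrete-time series."""
    return [HYDROPATHY[aa] for aa in seq.upper()]


def low_freq_features(seq, k=4):
    """Return [|X0|, arg X0, |X1|, arg X1, ...] for the first k DFT coefficients.

    Coefficients are divided by the sequence length so that sequences of
    different lengths give comparable values (a choice made for this sketch).
    """
    x = encode(seq)
    n = len(x)
    feats = []
    for f in range(k):
        coeff = sum(
            x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)
        ) / n
        feats.append(abs(coeff))          # magnitude
        feats.append(cmath.phase(coeff))  # phase in radians
    return feats


features = low_freq_features("MKTLLLAVAVALLSGCQA", k=4)
```

Every sequence, whatever its length, maps to a vector of `2 * k` numbers, which is what a conventional classifier needs as input.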
Sequential data is a special kind of data in data mining and exists widely in diverse fields. How to extract or mine knowledge from large amounts of sequential data is a new research topic of both theoretical and practical importance. In this dissertation, we study classification and prediction for sequential data, in particular biological sequences.
     We focus on classification and prediction for biological sequences. Drawing on many analysis algorithms for sequential data, we group them into two kinds of methods: those based on feature extraction and those based on similarity. On the one hand, following the feature extraction approach, we extract different features for different kinds of sequences, namely membrane proteins and signal peptides. On the other hand, we propose a similarity based on global alignment for prediction and then embed the similarity into a kernel space to improve stability. With these methods, a feature vector can be obtained, and the similarity-based method is unified with the feature extraction method. Feature reduction is also studied, which allows sequential data to be visualized. The main contributions of this dissertation are as follows:
     (1) Building on traditional pattern recognition algorithms, we extract features suited to each kind of sequence and then train a classifier to predict new samples. Membrane proteins are first encoded as discrete-time series sampled at different intervals, and the series are then analyzed with digital signal processing theory. Unlike other algorithms, this method does not lose sequence-order information. In the frequency domain, we extract low-frequency features, both magnitude and phase, to represent the main information in the series and to reduce noise. Experiments illustrate the performance of low-frequency spectral features for predicting membrane protein types.
     (2) For short sequences such as signal peptides, a sliding window is adopted to transform sequences of varying length into fixed-length segments. Mutual information analysis reveals complex coupling effects among positions, which many earlier algorithms simplify blindly. A multiple-decision-tree method is then proposed to extract statistical rules for predicting signal peptides and their cleavage sites. The experiments give promising results.
     (3) Taking similarity as the foundation for classification, we define a similarity model based on global alignment, which avoids shortcomings of sliding-window methods such as the class imbalance problem. By analyzing its mathematical properties, we prove that the similarity is a metric. When applied to predicting signal peptides, the similarity achieves a stable, high prediction rate. The results demonstrate that the defined similarity represents the relationships between sequences well and provides a suitable representation for them. An online bioinformatics web server is also available.
     (4) We study indefinite kernels. First, we analyze the difference between the similarity based on global alignment and the traditional Euclidean distance. We then propose an indefinite kernel algorithm and apply it to signal peptide prediction. In addition, feature vectors can be obtained while keeping a high prediction rate, so the similarity-based method is unified with the feature extraction method. Experiments confirm the performance.
     (5) We study dimensionality reduction and the extraction of features useful for classification. Making full use of the null space of the within-class scatter matrix, we propose Separated Space based Linear Discriminant Analysis (SSLDA), which avoids the instability of traditional LDA. For signal peptides, we apply SSLDA to the high-dimensional vectors obtained from the indefinite kernel based on global alignment similarity and obtain a reduced representation, which also allows the sequential data to be visualized.
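To make the notion of a global alignment in item (3) concrete, here is a minimal sketch of the standard Needleman-Wunsch score between two sequences. It is a stand-in, not the dissertation's method: the abstract does not give the definition of its similarity, and the scoring parameters `match`, `mismatch`, and `gap`, as well as the function name `global_alignment_score`, are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best end-to-end alignment of a and b.

    The simple match/mismatch/gap scheme is an assumption for this sketch;
    a substitution matrix could be used instead.
    """
    # prev[j] is the best score for aligning the already processed prefix
    # of a with b[:j]; initially that prefix is empty, so only gaps apply.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


score = global_alignment_score("MKTLLLAVAV", "MKSLLAVAV")
```

Because whole sequences are compared directly, no sliding window is needed and sequences of different lengths pose no problem. The raw score is not itself a metric; turning an alignment score into a similarity with metric properties is the step the dissertation reports proving, and it is not reproduced here.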
