Research on Novel Fuzzy Clustering Algorithms and Their Applications
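Abstract
The thesis studies novel fuzzy clustering methods and their biological applications. It focuses on new fuzzy clustering theory, takes practical problems in the biological field as its background, and addresses the combination of computational intelligence techniques with related biological applications. It is an interdisciplinary topic of considerable theoretical significance and practical value. The research follows two routes. First, several new fuzzy clustering algorithms are proposed, enriching and improving the theory and methods of clustering in pattern recognition. Second, for practical problems in the biological field, new computational intelligence theory oriented to complex biological datasets is studied; the data analysis and information mining capabilities of computational intelligence are used to reveal the complex relationships among large amounts of biological data, thereby unifying the theoretical and applied routes within bioinformatics. The main work of the thesis is as follows:
     1. Kernel-based fuzzy clustering algorithms are studied: a collaborative fuzzy kernel clustering algorithm and a weighted fuzzy kernel clustering algorithm. A collaborative relationship function is introduced into the objective function of the fuzzy kernel clustering algorithm, yielding a new collaborative fuzzy kernel clustering algorithm. The algorithm uses a kernel method to map the data into a high-dimensional feature space, which enlarges the differences among samples, and it handles data from multiple feature subsets with a single objective function. Collaborating the fuzzy kernel clustering across different feature subsets makes the cluster centers more clearly separated and gives better clustering results. In addition, to address the tendency of the weighted fuzzy kernel clustering algorithm (WFKCA) to fall into local optima, an improved algorithm is proposed that introduces the idea of the iterative self-organizing data analysis technique algorithm (ISODATA) into WFKCA, using the intermediate results of splitting/merging cluster centers to adjust the initial centers. The improved algorithm uses a measure computed in the feature space and increases the adjustment range of the cluster centers, so its clustering performance is more stable.
     2. Clustering algorithms based on fuzzy scatter matrices and their applications are studied. First, the performance of the clustering algorithm based on the fuzzy Fisher criterion (FFC) is improved. To address the inaccurate cluster-center formula in existing algorithms, a new method with a more reasonable iterative formula for the cluster centers is proposed, which achieves better clustering performance. Then, because the fuzzy Fisher clustering algorithm yields the optimal projection vector during clustering, a classifier suited to intelligent prediction in the biological field is designed. It differs from both supervised and unsupervised clustering: it is an integrated fuzzy Fisher clustering algorithm, and it is used to recognize the signal peptides of secretory proteins. When users have highly reliable training samples of their own, the fuzzy Fisher classifier can conveniently meet their model-training needs. Finally, for biological datasets with high dimensionality and complex structure, a method for automatically determining the optimal number of clusters is proposed. The method embodies the idea of "compact within clusters, dispersed between clusters" and, combined with a decision criterion based on the second-order difference of the objective function, determines a reasonable number of clusters for complex biological datasets through self-learning of the clustering algorithm.
     3. Existing feature extraction methods for protein sequences extract features from a whole, independent sequence and are not suitable for heterologous proteins whose local signal peptide sequence has been replaced. We therefore define the degree of compatibility between a signal peptide and a heterologous protein as the structural fusion degree, analyze mathematically the interaction between the spliced signal peptide and the adjacent residues, and propose a mathematical model of the signal peptide splicing region and the target protein. The structural fusion degree features extracted from the model are used to recognize the secretability of heterologous proteins, with satisfactory experimental results.
     4. A recently proposed semi-supervised fuzzy clustering algorithm based on pairwise constraints is studied. The study finds that the mismatch in order of magnitude between its constraint term and the objective function of the original algorithm is the main cause of over-adjustment of the membership degrees. To address this, we redefine the objective function and propose an improved algorithm that introduces a new constraint penalty function; a new semi-supervised clustering algorithm is obtained by optimizing the objective function with the constraint penalty. The new constraint term cooperates well with the original objective function and gives better clustering results through appropriate adjustment of the membership degrees.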
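The kernel mapping described in item 1 can be illustrated with a minimal sketch of plain kernel fuzzy c-means (KFCM) with a Gaussian kernel, the base algorithm into which the collaborative term is incorporated. This is not the thesis's collaborative or weighted algorithm: the collaboration term, the attribute weights and the ISODATA-style split/merge step are all omitted. The function names, the parameters `m`, `sigma` and `init`, and the update rules (a commonly used form of KFCM that keeps the centers in the input space) are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, v, sigma):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / sigma ** 2)


def kfcm(data, c, m=2.0, sigma=1.0, iters=100, tol=1e-6, seed=0, init=None):
    """Kernel fuzzy c-means sketch (hypothetical helper, not the thesis code).

    data is a list of equal-length coordinate tuples. Returns
    (centers, memberships), where memberships[k][i] is the degree to
    which sample k belongs to cluster i.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(data, c)
    centers = [list(p) for p in start]
    if len(centers) != c:
        raise ValueError("init must contain exactly c centers")
    u = []
    for _ in range(iters):
        # Membership update. The squared feature-space distance is
        # 2 * (1 - K(x, v)); the factor 2 cancels when normalising.
        u = []
        for x in data:
            d = [max(1.0 - gaussian_kernel(x, v, sigma), 1e-12) for v in centers]
            w = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in d]
            total_w = sum(w)
            u.append([wi / total_w for wi in w])
        # Center update: fuzzy mean of the samples, weighted by u^m * K(x, v).
        new_centers = []
        for i, v in enumerate(centers):
            coef = [u[k][i] ** m * gaussian_kernel(x, v, sigma)
                    for k, x in enumerate(data)]
            total = sum(coef)
            new_centers.append([
                sum(cf * x[j] for cf, x in zip(coef, data)) / total
                for j in range(len(v))
            ])
        # Stop once no center coordinate moves by more than tol.
        shift = max(abs(a - b)
                    for nv, v in zip(new_centers, centers)
                    for a, b in zip(nv, v))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Calling `kfcm(points, c=2, sigma=2.0)` on a list of coordinate tuples returns the centers and a membership matrix whose rows each sum to 1. Like any c-means variant, the result depends on the initial centers, which is the sensitivity that the split/merge adjustment in item 1 is meant to reduce.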
Novel fuzzy clustering algorithms and their applications in biological fields are studied, with emphasis on fuzzy clustering theory and practical problems in applied biotechnology. The topic is interdisciplinary, lying between computational intelligence and applied biotechnology, and is significant for both theoretical research and practical applications. The thesis follows two research routes. For the theoretical research on fuzzy clustering, new fuzzy clustering algorithms are proposed. For the bioinformatics research, new theories of computational intelligence are proposed that aim to solve practical problems in applied biotechnology. Because computational intelligence is a useful tool for mining the complex information in biological data, the two routes of theory and application are integrated. The main contributions of the thesis are summarized as follows.
     1. Two new kernel-based fuzzy clustering algorithms are proposed: collaborative kernel fuzzy clustering and weighted fuzzy kernel clustering. A collaborative kernel fuzzy c-means clustering (CKFCM) algorithm is proposed, in which a collaborative relationship function is incorporated into kernel fuzzy c-means clustering (KFCM). CKFCM maps the observed data into a higher-dimensional feature space with a kernel function, which enlarges the differences among samples, and it processes several feature subsets together with a single objective function, improving clustering performance by collaborating the partition matrices of the different feature subsets. CKFCM thus yields more separable centers and better clustering performance. An improved weighted fuzzy kernel clustering algorithm (WFKCA) is also proposed to overcome the tendency of WFKCA to become trapped in a local optimum. The idea of the iterative self-organizing data analysis techniques algorithm (ISODATA) is introduced into WFKCA, and the initial center vectors are adjusted using the intermediate results of splitting and/or merging cluster centers, which reduces the chance of ending in a local optimum. The improved algorithm uses a matching measure computed in the feature space and increases the adjustment range of the cluster centers, so its clustering performance is more stable.
     2. Clustering algorithms based on fuzzy scatter matrices are studied. First, because previous algorithms use an inaccurate iterative expression for the cluster centers, an improved clustering algorithm based on the fuzzy Fisher criterion (FFC) with new center equations is proposed. Second, an integrated fuzzy Fisher clustering (IFFC) algorithm that combines supervised and unsupervised clustering is developed, and a novel IFFC-based classifier for recognizing secretory proteins is designed. The classifier is suitable for intelligent prediction in biology and makes it convenient for users to train the model. Last, an automatic technique for determining a reasonable number of clusters for complex biological datasets is proposed. The key computation is carried out by an optimization algorithm that reflects the idea of intra-cluster compactness and inter-cluster separability; the reasonable number of clusters is then determined automatically by the maximum criterion on the second-order difference of the objective function.
     3. Previous feature extraction methods for protein sequences are designed for independent sequences and are of limited use for heterologous proteins fused in frame with a signal peptide. A structural fusion degree (SFD) is defined to measure the compatibility between target proteins and signal peptides, and the interaction between the fused signal peptide and the adjacent residues of the protein is analyzed mathematically. A mathematical model of the extended signal region and the protein is proposed. SFD features extracted from this model are used to recognize the secretability of heterologous proteins, and satisfactory results are obtained.
     4. A recently developed semi-supervised fuzzy clustering algorithm with pairwise constraints is studied. In that algorithm, the penalty cost function and the basic objective function differ in order of magnitude, which causes over-adjustment of the membership values. To solve this problem, an improved algorithm is proposed based on a redefined objective function. A new constraint function is added to the basic objective function as a penalty cost, giving a new semi-supervised optimization problem. The new penalty cost function agrees and cooperates well with the basic objective function and produces more accurate clustering results by moderately increasing or decreasing the ambiguous membership values.
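The cluster-number criterion in item 2 can be sketched as follows, assuming the objective function value J(c) has already been computed for consecutive candidate cluster numbers c and that it decreases as c grows. The sketch picks the c with the largest second-order difference J(c-1) - 2J(c) + J(c+1), which marks the sharpest bend in the curve. The abstract does not give the thesis's objective function or the exact form of its criterion, so the function names and this reading of the "maximum criterion" are assumptions.

```python
def second_order_differences(objective):
    """Return {c: J(c-1) - 2*J(c) + J(c+1)} for every c with both neighbours.

    objective maps a candidate cluster number c to its objective value J(c).
    """
    return {
        c: objective[c - 1] - 2.0 * objective[c] + objective[c + 1]
        for c in sorted(objective)
        if c - 1 in objective and c + 1 in objective
    }


def pick_cluster_number(objective):
    """Pick the c whose second-order difference is largest (assumed criterion)."""
    diffs = second_order_differences(objective)
    if not diffs:
        raise ValueError("need J(c) for at least three consecutive values of c")
    return max(diffs, key=diffs.get)
```

For an objective that is maximized rather than minimized, the sign of the difference would need to be flipped; that choice depends on the objective actually used.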
