Machine Learning and the Optimized Design of Neural Network Classifiers
Abstract
Machine learning, which aims at the automatic acquisition and generation of knowledge, is one of the central research topics in artificial intelligence. The optimized design of classifiers has long been a core problem in machine learning, pattern recognition, and data mining, with wide applications in image recognition, speech understanding, natural language processing, medical diagnosis, and web-page classification. The key issue in classifier design is improving the classifier's ability to adapt to its environment, and applying machine learning methods to optimize classifier design is therefore an important research subject for the machine learning and artificial intelligence communities.
     This thesis studies machine learning and its application to the design of neural network classifiers, focusing on machine learning methods for optimizing network structure and sample selection. The main contributions are as follows:
     1) From the perspective of classifier design, the latest directions and open problems in machine learning are discussed, and several machine learning methods proposed in recent years are analyzed.
     2) Network structure optimization via manifold learning. To address how to design a reasonable network structure when a neural network classifies and recognizes sample sets of the same object with nonlinear structure, a novel structure-design method based on estimating the low-dimensional parameter space is proposed. Built on manifold learning, the method uses the Sammon stress to estimate the size of the low-dimensional parameter space and maps it to the number of hidden-node groups in a grouped network structure, yielding a network with a degree of generalization ability.
     3) Sample-selection criteria based on active learning. To reduce the long, costly process of sampling and labeling encountered in fuzzy neural network classifier design, a novel active sample-selection criterion is proposed. Two new concepts, a min-max margin method and an uncertainty threshold on samples, define the selection standard, ensuring that the most informative samples are chosen for labeling and greatly reducing the labeling workload and time during network design.
     4) Fuzzy neural network classifier design with unlabeled samples. For the mixed classification problem of labeled and unlabeled samples, a new ordination fuzzy min-max neural network classifier based on non-metric multidimensional scaling is proposed. The model first orders all input samples with non-metric multidimensional scaling to obtain pairwise similarity measures, which then guide the subsequent hyperbox expansion and contraction of the classifier. This improves not only the classification of unlabeled samples but also the network structure and training time.
Machine learning aims at the automatic acquisition and generation of knowledge and has become one of the most active areas of artificial intelligence. The optimized design of classifiers is a core problem in machine learning, pattern recognition, and data mining, with wide applications in image recognition, speech understanding, medical diagnosis, and web-page classification. Improving a classifier's adaptability to its environment is the key problem in classifier design, and using machine learning methods to realize such optimized design has become an important research topic in machine learning and artificial intelligence.
     This thesis studies machine learning and its applications in the design of neural network classifiers, focusing on methods for optimizing network structure and for selecting training samples. The research comprises four parts:
     1) From the viewpoint of classifier design, the current directions and open problems of machine learning are discussed, and several machine learning methods that have appeared in recent years are analyzed.
     2) Optimizing classifier structure with manifold learning. A novel approach to neural network structure design, based on the parameter space of the low-dimensional manifold, is proposed for the rational design of networks used to recognize and classify samples of the same object with nonlinear configuration. The method combines manifold learning with the Sammon stress to estimate the dimensionality of the low-dimensional parameter space, and this value determines the number of hidden-node groups in the network.
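The thesis does not give pseudocode for this step; the sketch below is one plausible reading of the idea (the function names `sammon_stress` and `estimate_dim` are my own, and classical MDS plus a fixed stress threshold stand in for whatever exact criterion the thesis uses): embed the data at increasing candidate dimensionalities and keep the first one whose Sammon stress is small, which would then set the number of hidden-node groups.

```python
import numpy as np

def sammon_stress(D, X_low):
    """Sammon stress between the original pairwise distances D and the
    pairwise distances of a low-dimensional embedding X_low (0 = perfect fit)."""
    d_low = np.sqrt(((X_low[:, None, :] - X_low[None, :, :]) ** 2).sum(-1))
    iu = np.triu_indices(len(D), k=1)          # each pair counted once
    d, dh = D[iu], d_low[iu]
    return np.sum((d - dh) ** 2 / d) / np.sum(d)

def estimate_dim(X, max_dim=5, tol=0.05):
    """Embed X with classical MDS at each candidate dimensionality and
    return the smallest one whose Sammon stress drops below tol."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1]                # largest eigenvalues first
    w, V = w[order], V[:, order]
    for dim in range(1, max_dim + 1):
        X_low = V[:, :dim] * np.sqrt(np.clip(w[:dim], 0.0, None))
        if sammon_stress(D, X_low) < tol:
            return dim
    return max_dim
```

On data lying on a 2-D plane inside a 5-D space, `estimate_dim` returns 2: the stress is essentially zero at the intrinsic dimensionality and large below it.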
     3) Optimizing sample selection with active learning. A novel active-learning approach for fuzzy neural network classifiers is proposed to address the surprisingly time-consuming and costly process of sample collection and annotation. Two new concepts, a min-max-margin approach and an uncertainty threshold on samples, serve as the rule for active sample selection, guaranteeing that the most informative samples are annotated; annotation cost and time are thereby greatly reduced.
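As an illustrative sketch of such a margin-plus-threshold rule (the function name and the exact margin definition are my own assumptions, not the thesis's formulation): for each unlabeled sample, take the gap between its two largest class-membership values as the margin, keep only samples whose margin falls below the uncertainty threshold, and return the most ambiguous ones for labeling.

```python
import numpy as np

def select_informative(memberships, theta=0.2, k=5):
    """Min-max-margin style active selection (illustrative sketch).
    memberships: (n_samples, n_classes) array of class-membership values.
    A sample is a candidate when the gap between its two largest
    memberships is below the uncertainty threshold theta; the k smallest
    margins (most ambiguous samples) are returned for annotation."""
    top2 = np.sort(memberships, axis=1)[:, -2:]    # two largest per row
    margin = top2[:, 1] - top2[:, 0]
    candidates = np.flatnonzero(margin < theta)
    return candidates[np.argsort(margin[candidates])][:k]

m = np.array([[0.90, 0.05, 0.05],    # confident: large margin, skipped
              [0.50, 0.45, 0.05],    # ambiguous between classes 0 and 1
              [0.40, 0.39, 0.21],    # very ambiguous
              [0.34, 0.33, 0.33]])   # very ambiguous
picked = select_informative(m, theta=0.2, k=2)
```

With `k=2` the two most ambiguous rows (indices 2 and 3) are selected; the confidently classified first row is never chosen.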
     4) Designing fuzzy neural network classifiers for unlabeled samples. A novel ordination fuzzy min-max neural network (OFMM) based on non-metric multidimensional scaling (MDS) is proposed to solve the classification problem for unlabeled input patterns. First, all input patterns are ordered by non-metric MDS to obtain their similarity measures; these measures then supervise the subsequent expansion and contraction of hyperboxes during classification. OFMM improves both the accuracy of unlabeled-pattern classification and the network structure and training time.
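The hyperbox machinery underlying OFMM can be sketched as follows. This is a heavily simplified toy after Simpson's fuzzy min-max classifier, not the thesis's model: the class name is my own, the MDS-based similarity ordering is replaced by plain presentation order, the contraction (overlap-resolution) step is omitted, and the expansion criterion is a per-dimension bound rather than Simpson's summed one.

```python
import numpy as np

class FuzzyMinMax:
    """Minimal fuzzy min-max hyperbox sketch. Each class is covered by
    hyperboxes [v, w]; a box expands to absorb a same-class point if its
    largest side stays within theta, otherwise a new point-box is created."""

    def __init__(self, theta=0.3):
        self.theta = theta
        self.boxes = []                    # list of (v_min, w_max, label)

    def membership(self, x, v, w):
        # degree to which x lies inside [v, w]: 1.0 when fully inside,
        # decreasing with the mean per-dimension distance outside the box
        outside = np.maximum(v - x, 0.0) + np.maximum(x - w, 0.0)
        return 1.0 - outside.mean()

    def fit_point(self, x, label):
        for i, (v, w, lb) in enumerate(self.boxes):
            if lb != label:
                continue
            nv, nw = np.minimum(v, x), np.maximum(w, x)
            if (nw - nv).max() <= self.theta:      # expansion criterion
                self.boxes[i] = (nv, nw, lb)
                return
        self.boxes.append((x.copy(), x.copy(), label))

    def predict(self, x):
        # label of the hyperbox with the highest membership for x
        return max(self.boxes, key=lambda b: self.membership(x, b[0], b[1]))[2]

fmm = FuzzyMinMax(theta=0.3)
for pt, y in [([0.1, 0.1], 0), ([0.2, 0.15], 0), ([0.8, 0.8], 1), ([0.9, 0.85], 1)]:
    fmm.fit_point(np.array(pt), y)
```

After fitting, each class is covered by one small hyperbox, and nearby query points fall inside the corresponding box (membership 1.0) and inherit its label.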
