Visual Pattern Recognition Based on Graphical Representation of Multivariate Data Subspace Coordinates
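Abstract
Pattern recognition is one of the basic forms of intelligence on which humans and some other higher animals depend for survival. In most cases people recognize patterns very well, and this ability is taken for granted, yet machines often run into far greater difficulty when asked to handle the same pattern recognition problems. Despite decades of research, how humans recognize patterns is still not well understood. Although the theory and methods of computer pattern recognition have been studied extensively and have advanced greatly, some deep-rooted problems remain, such as the well-known small-sample problem, the curse of dimensionality, and the black-box problem. Full automation has long been one of the design goals of pattern recognition systems, and human involvement in the recognition process has been reduced to a minimum. Although some exploratory data analysis and visualization methods are used in the classifier design stage, these methods have not truly been incorporated into the pattern recognition workflow; they usually only visualize the raw data or the results in a simple way.
     As an important method of data analysis, multivariate data visualization has been widely applied in many fields. However, the interrelations among the various multivariate visualization techniques have received little study, and the various graphical representation methods lack a unified theoretical basis. Integrating multivariate graphical representations with machine algorithms to achieve visual pattern recognition still requires solving some fundamental problems. This thesis is organized around three basic questions: How can a unified descriptive model be built for several commonly used multivariate graphical representation methods? How can traditional multivariate graphical representation methods be optimized to suit pattern recognition applications better? How can machine algorithms and graphical representation methods be integrated to achieve visual classification?
     This thesis first studies the representation principles and characteristics of several multivariate graphical representation methods and, on that basis, gives a general model for the subspace coordinate graphical representation of multivariate data. The model brings the scatter plot, scatter plot matrix, parallel nomogram, parallel coordinates, trigonometric polynomial plot, and radar chart into one representation framework, which helps in studying the differences and connections among these methods and also in developing new graphical representation methods.
     Next, the thesis defines the two-dimensional dual coordinate mapping, studies the representation properties of two-dimensional dual coordinates, and proves the related theorems. On that basis it proposes a new multivariate data visualization method, the multivariate parallel dual plot. The method integrates multiple scatter plots and parallel coordinates in a single view. The dual coordinate representation and the parallel coordinate representation of the same sample have a definite geometric relationship, and the view can be switched between the two forms as needed, combining the strengths of both methods while compensating for their weaknesses. The thesis also studies the three-dimensional display of two-dimensional dual coordinates and the three-dimensional dual coordinate representation of multivariate data, and gives examples of these representations.
     Finally, the thesis studies the optimization of graphical features in multivariate graphical representations and proposes convex-hull-based optimization of parallel coordinates, optimization of the weight coefficients of the constellation graph based on complex linear discriminant analysis, and a fast Radviz optimization method. It also combines machine learning algorithms with parallel coordinates and proposes three visual classifiers based on optimized parallel coordinates: a visual BP neural network, a parallel sieve visual classifier, and a Bayesian visual classifier. Experiments are carried out on problems from several domains, including vegetable oil classification, fault diagnosis, and disease diagnosis.
     The results show that the proposed visual pattern recognition method makes patterns visible (making the unseen seen), simplifies the representation of complex systems, and facilitates the use and generation of expert knowledge. The method can be expected to be developed and refined further and applied to complex pattern recognition problems in certain fields.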
Pattern recognition is one of the basic forms of intelligence of humans and other higher animals. Humans have excellent pattern recognition capability in most cases, and this capability is regarded as natural. However, teaching a machine to deal with the same pattern recognition problems is not so easy. After several decades of research, the mechanism of human pattern recognition is still not well understood. Although the theories and methods of automatic pattern recognition by computers have been studied extensively and great successes have been achieved, some well-known open problems remain, such as the small-sample problem, the curse of dimensionality, and the black-box problem. Full automation is still one of the design criteria of pattern recognition systems, and the interaction between human and machine is reduced to a minimum. Although some techniques of exploratory data analysis and visualization are occasionally used when designing a classifier, these methods are not tightly integrated with the pattern recognition algorithms. Usually only the original data or the classification results are visualized.
     As an important means of data analysis, multivariate data visualization techniques have been applied in many domains. So far, however, the relationships among these techniques have not been fully researched, and a unified theoretical basis for the various graphical representation methods has not yet been found. Realizing visual pattern recognition by integrating multivariate graphical representation methods with machine algorithms requires solving some basic problems. This thesis focuses on three of them: How can a unified descriptive model of several popular multivariate graphical representation methods be constructed? How can these methods be optimized for pattern recognition applications? How can machine algorithms and multivariate graphical representation methods be integrated for visual classification?
     Firstly, the representation principles and characteristics of several popular multivariate graphical representation methods are investigated, and a general graphical representation model of multivariate data subspace coordinates is presented. This model unites the scatter plot, scatter plot matrix, nomogram, parallel coordinates, Andrews' plot, and star glyph in the same representation framework, which facilitates not only research on the differences and relationships among these methods but also the development of new graphical representation methods.
     Secondly, 2D dual coordinates are defined, their representation characteristics are studied, and several theorems are proved. On this basis, a new multivariate visualization method named the multivariate parallel dual plot is developed. This method integrates multiple scatter plots with parallel coordinates in one view, and the dual coordinates representation and the parallel coordinates representation of the same sample have a definite geometrical relationship. The two representation forms can be switched as needed, combining the merits of both methods and overcoming their shortcomings. The three-dimensional display of 2D dual coordinates and the 3D dual coordinates representation are also investigated, and representation examples are provided.
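     The abstract does not give the thesis's own definition of 2D dual coordinates, so the following is only a minimal sketch of the classical point-line duality of parallel coordinates, which illustrates the kind of definite geometrical relationship between a scatter plot and its parallel-coordinates view. The function names and the `spacing` parameter (the distance between adjacent axes) are hypothetical choices made for illustration.

```python
def point_to_polyline(sample, spacing=1.0):
    """Map an n-dimensional sample to the vertices of its parallel-coordinates polyline.

    Axis j is the vertical line x = j * spacing; the sample's j-th value is
    the height of the vertex on that axis.
    """
    return [(j * spacing, value) for j, value in enumerate(sample)]


def line_to_dual_point(m, b, spacing=1.0):
    """Classical duality: the Cartesian line x2 = m * x1 + b corresponds to one
    point in the parallel-coordinates plane of the axis pair (X1, X2).

    Every sample lying on that line becomes a segment passing through this point.
    """
    if m == 1:
        raise ValueError("slope 1 has no finite dual point (it lies at infinity)")
    return spacing / (1.0 - m), b / (1.0 - m)


def segment_height(a1, a2, x, spacing=1.0):
    """Height at horizontal position x of the segment joining (0, a1) and (spacing, a2)."""
    return a1 + (a2 - a1) * x / spacing


if __name__ == "__main__":
    m, b = -1.0, 2.0
    dual_x, dual_y = line_to_dual_point(m, b)
    for a1 in (0.0, 3.0, -5.0):
        a2 = m * a1 + b  # a sample on the Cartesian line
        print(point_to_polyline((a1, a2)), segment_height(a1, a2, dual_x), dual_y)
```

     In this sketch, negatively correlated samples (slope below zero) produce segments that cross between the two axes, which is the familiar visual signature of negative correlation in parallel coordinates.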
     Lastly, the problem of graphical feature optimization is studied. Optimization of parallel coordinates by convex hull, optimization of the constellation graph weights by complex linear discriminant analysis, and rapid optimization of Radviz are proposed. Machine learning algorithms are combined with parallel coordinates, and three visual classifiers based on optimized parallel coordinates are proposed: the visual BP neural network, the parallel filter visual classifier, and the Bayes visual classifier. Experiments are carried out on data sets for vegetable oil classification, fault diagnosis, and disease diagnosis.
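     For readers unfamiliar with Radviz, the following minimal sketch shows the standard Radviz projection only; it does not reproduce the rapid optimization method proposed in the thesis, whose details are not given in this abstract. It assumes equally spaced dimension anchors on the unit circle and per-feature min-max normalization, and all function names are hypothetical.

```python
import math


def radviz_anchors(n_dims):
    """Place one anchor per dimension, equally spaced on the unit circle."""
    return [(math.cos(2 * math.pi * j / n_dims), math.sin(2 * math.pi * j / n_dims))
            for j in range(n_dims)]


def min_max_normalize(rows):
    """Scale every feature (column) to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [[(v - lo) / span if span > 0 else 0.0
             for v, lo, span in zip(row, lows, spans)]
            for row in rows]


def radviz_project(rows):
    """Project each sample to the weighted mean of the anchors.

    The weights are the sample's normalized feature values, so a sample is
    pulled toward the anchors of the features on which it scores highest.
    """
    anchors = radviz_anchors(len(rows[0]))
    points = []
    for weights in min_max_normalize(rows):
        total = sum(weights)
        if total == 0:
            points.append((0.0, 0.0))  # all-zero sample: no pull, stays at the centre
            continue
        x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        points.append((x, y))
    return points
```

     Because the projection depends on the order of the anchors around the circle, the same data can look well separated or badly overlapped under different orderings, which is why optimizing a Radviz layout is a meaningful problem.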
     This research indicates that these visual pattern recognition methods offer pattern visualization (making the invisible visible), simplify the representation of complex systems, and facilitate the use and generation of expert knowledge. The method is expected to be developed further and applied to complex pattern recognition problems in some domains.
