Research on Multivariate-Graph Feature-Primitive Representation and Classification of Multi-Dimensional Data

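English title: Research Based on Multi_Map Graphics Primitives and Classification of Multi_Dimensional
Author: Liu Yanju
Degree level: Master's
Discipline: Biomedical Engineering
Keywords (translated from Chinese): data visualization; radial coordinate chart; feature fusion; layered hierarchical decomposition; feature selection
English keywords: Data Visualization; Radar Chart; Feature Extraction; Multi-layer Deportation; Feature Selection
Degree year: 2010
Supervisor: Hong Wenxue
Discipline code: 0831
Degree-granting institution: Yanshan University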

Abstract

Classifying multi-dimensional data is an important research topic in pattern recognition. Current classification algorithms have several problems: traditional algorithms are computationally expensive, the targets to be recognized are increasingly complex, classification results are hard to interpret, and the classification process itself is opaque.
     To address these problems, this thesis studies how multivariate-graph feature-primitive representation, combined with feature fusion and feature extraction, can reduce the computational cost of classification and make both the classification process and its results visual. It proposes a general method for the visual classification of multi-dimensional data based on multivariate-graph feature-primitive representation.
     First, building on an analysis of how multi-dimensional data are represented as multivariate graphs, the graphical features of those graphs are examined in more depth. For data with 3 to 15 dimensions ("small high-dimensional" data), and following the idea of holographic classification (no feature is discarded), a visual classification method is proposed that combines multivariate-graph representation with feature extraction and variable fusion. Each sample is first drawn as a radial coordinate chart (radar chart), so that samples from different classes produce different charts. A single-prototype graph classifier, that is, a minimum-distance-to-mean classifier, then recognizes the charts automatically. Experiments indicate that the method is effective and achieves better classification accuracy than the traditional classification algorithms it was compared with.
     Second, for data with 15 to 30 dimensions ("medium high-dimensional" data), automatic recognition of multivariate graphs requires a way to describe the graphs and a set of features suited to machine discrimination. The thesis therefore proposes a visual classification method that combines feature extraction, feature fusion, and multivariate-graph feature-primitive representation. Feature extraction is applied first to reduce the dimensionality of the data. To avoid losing information, the remaining variables are fused by taking their vector norm. The result is then visualized to obtain a standard template that favors classification.
     Finally, for data with far more than 30 dimensions ("large high-dimensional" data), the thesis represents the data as multivariate graphs using a layered, hierarchical decomposition, an approach that can be extended to data of still higher dimension. Experiments on classic data sets show that the method makes both the classification process and the classification results visual while achieving high classification accuracy.
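The pipeline outlined in the abstract (radar-chart representation, vector-norm fusion, and a minimum-distance-to-mean classifier) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract does not say which graph features are used, so the polygon area and vertex centroid below are assumptions chosen for illustration, and every function and class name is hypothetical.

```python
import numpy as np

def radar_vertices(x):
    """Map one sample (values scaled to [0, 1]) to radar-chart polygon vertices."""
    x = np.asarray(x, dtype=float)
    angles = 2 * np.pi * np.arange(x.size) / x.size  # one evenly spaced axis per variable
    return np.column_stack((x * np.cos(angles), x * np.sin(angles)))

def graph_features(x):
    """Assumed illustrative features: polygon area and centroid of the vertices."""
    v = radar_vertices(x)
    nxt = np.roll(v, -1, axis=0)
    # Shoelace formula for the area of the radar polygon.
    area = 0.5 * abs(np.sum(v[:, 0] * nxt[:, 1] - nxt[:, 0] * v[:, 1]))
    cx, cy = v.mean(axis=0)
    return np.array([area, cx, cy])

def norm_fusion(X, keep):
    """Keep the columns listed in `keep`; fuse all other columns into one
    Euclidean-norm column so their information is not simply dropped."""
    X = np.asarray(X, dtype=float)
    rest = np.setdiff1d(np.arange(X.shape[1]), keep)
    fused = np.linalg.norm(X[:, rest], axis=1, keepdims=True)
    return np.hstack((X[:, keep], fused))

class NearestMeanClassifier:
    """One prototype (the class mean) per class; predict the closest prototype."""

    def fit(self, F, y):
        F, y = np.asarray(F, dtype=float), np.asarray(y)
        self.labels_ = np.unique(y)
        self.means_ = np.array([F[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, F):
        F = np.asarray(F, dtype=float)
        # Distance from every sample to every class mean.
        d = np.linalg.norm(F[:, None, :] - self.means_[None, :, :], axis=2)
        return self.labels_[d.argmin(axis=1)]

# Toy data (made up for this sketch): two classes of 4-dimensional samples.
X = np.array([[0.9, 0.8, 0.9, 0.1],
              [0.8, 0.9, 0.8, 0.2],
              [0.1, 0.2, 0.1, 0.9],
              [0.2, 0.1, 0.2, 0.8]])
y = np.array([0, 0, 1, 1])

F = np.array([graph_features(row) for row in X])
clf = NearestMeanClassifier().fit(F, y)
pred = clf.predict(F)
```

In this sketch each class is summarized by a single mean feature vector, which keeps the classifier cheap and makes the decision easy to inspect alongside the plotted charts. For higher-dimensional data, `norm_fusion` would be applied before `graph_features`, with `keep` holding whichever columns a feature-extraction step selected.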
