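Research on Intelligent Cluster Analysis of Chromatographic Fingerprints for Discriminating Damp Syndrome in Traditional Chinese Medicine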
Abstract
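Using traditional Chinese medicine (TCM) to distinguish whether a person is ill and whether damp syndrome is present has long been a topic of research and discussion in the TCM community. This thesis studies the chromatographic fingerprints of fresh urine from healthy people, patients with damp syndrome, and patients without damp syndrome, carries out a series of studies on these fingerprints, and obtains some results.
     First, the thesis examines the principles and characteristics of chromatography and, following the approach common in analytical chemistry of building mathematical models from chromatographic fingerprints, analyzes the practical significance of common peaks, the overlap rate, and the n strongest peaks.
     Second, the thesis studies and compares the various cluster analysis algorithms. Existing algorithms can be divided into partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
     Partitioning methods: given a database of n objects, a partitioning method constructs k partitions of the data, each representing a cluster, with k ≤ n. Examples are the k-means and k-medoids algorithms, which are effective for small databases and have a computational complexity of O(n²).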
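The partitioning idea can be illustrated with a minimal k-means sketch. It is not the thesis's VC++ implementation; the function name `kmeans`, the use of squared Euclidean distance, and the choice of the first k points as initial centers are assumptions made only to keep the example short and reproducible.

```python
def kmeans(points, k, iters=100):
    """Plain k-means on a list of equal-length numeric tuples (requires k <= n)."""
    # Assumption for illustration: the first k points serve as initial centers.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # no change, so the partition has converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


data = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centers = kmeans(data, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Each object ends up in exactly one of the k clusters, which is the defining property of a partitioning method.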
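     Hierarchical methods: the given set of data objects is decomposed hierarchically. Depending on how the decomposition is formed, hierarchical methods are either agglomerative or divisive. An example is the BIRCH algorithm, whose computational complexity is O(n).
     Density-based methods: the main idea is to keep growing a cluster as long as the density (the number of objects or data points) in the neighborhood exceeds some threshold. Such methods can filter out noise and outliers and discover clusters of arbitrary shape. An example is DBSCAN: with a spatial index its computational complexity is O(n log n); otherwise it is O(n²).
     Grid-based methods: the object space is quantized into a finite number of cells that form a grid structure, and all clustering operations are performed on this grid. The main advantage is fast processing: the processing time is independent of the number of data objects and depends only on the number of cells in each dimension of the quantized space. An example is STING: generating the clusters takes O(n) time, but query processing takes O(g), where g is the number of grid cells at the lowest level and is usually far smaller than n.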
    
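     Model-based methods: a model is hypothesized for each cluster, and the best fit of the data to the given model is sought. An example is COBWEB, whose computational complexity varies sharply with the number of input attributes and with their values.
     Clustering based on fuzzy sets: for example, the maximum tree method of fuzzy clustering.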
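The maximum tree method is usually described as follows: build a maximum spanning tree over a fuzzy similarity matrix, cut every tree edge whose similarity falls below a threshold λ, and read the remaining connected components as clusters. The sketch below follows that textbook description using Kruskal's algorithm; the function name `max_tree_clusters`, the matrix values, and the thresholds are illustrative assumptions and are not taken from the thesis.

```python
def max_tree_clusters(sim, lam):
    """Cluster objects 0..n-1 from a symmetric fuzzy similarity matrix `sim`."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal with edges in descending similarity gives a maximum spanning tree.
    edges = sorted(((sim[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))

    # Cut tree edges below the threshold; what stays connected forms a cluster.
    parent = list(range(n))
    for w, i, j in tree:
        if w >= lam:
            parent[find(i)] = find(j)
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Hypothetical similarity matrix for four samples.
R = [[1.0, 0.9, 0.2, 0.1],
     [0.9, 1.0, 0.3, 0.2],
     [0.2, 0.3, 1.0, 0.8],
     [0.1, 0.2, 0.8, 1.0]]
print(max_tree_clusters(R, 0.5))  # [[0, 1], [2, 3]]
```

Raising λ splits the samples into more, tighter clusters; lowering it merges them.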
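     Third, this study used the n strongest peaks, the overlap rate of common peaks, and the sine and cosine of the angle between peak vectors to build similarity matrices, dissimilarity matrices, or similarity tables for the sample chromatographic fingerprints. On the basis of these data models, clustering was carried out with k-means, the maximum tree method of fuzzy clustering, and an improved COBWEB method, with differing results. The improved COBWEB method uses the overlap rate of common peaks as the intra-class similarity (P(Ai = Vij|Ck)) and the sine of the angle between peak vectors as the inter-class dissimilarity (P(Ai = Vij)). While processing the peak data, it reduces or removes the clustering weight of peaks that are common to all samples and account for a large share of the total peak area, so that the components that differ between samples carry more weight in classification. By comparison, the COBWEB method gave the better results.
     Finally, the clustering algorithms were implemented in VC++. Improvements to the sample collection method and to the clustering method are also proposed, to further raise the level at which cluster analysis can be applied in TCM to distinguishing ill from healthy subjects and damp syndrome from non-damp syndrome.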
Discriminating between illness and health, and between damp syndrome and non-damp syndrome, has long been a topic of research and discussion in the field of traditional Chinese medicine. This paper studies fingerprint chromatograms of fresh urine from healthy people, patients with damp syndrome, and patients with non-damp syndrome. A series of studies was performed on these fingerprint chromatograms, and certain results were obtained.
    First, the theory and characteristics of chromatography are explored. A method based on a mathematical model established from the fingerprint chromatograms is used to analyze the practical significance of common peaks, the overlap rate, and the n strongest peaks.
    Second, several cluster analysis methods are studied and compared. They can be divided into partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
    Partitioning method: construct a partition of a database D of n objects into k clusters, each partition representing one cluster, where k ≤ n, e.g. the k-means and k-medoids algorithms. Such methods are effective for small databases, with a time complexity of O(n²).
    Hierarchical method: create a hierarchical decomposition of the set of data objects. Depending on how the decomposition is formed, the method is either agglomerative or divisive, e.g. the BIRCH algorithm, with a time complexity of O(n).
    Density-based method: as long as the density of a neighborhood, that is, the number of data objects in it, exceeds a certain threshold, the clustering process continues. The method can be used to filter out outliers and discover clusters of arbitrary shape. For the DBSCAN algorithm, the time complexity is O(n log n) if a spatial index is used; otherwise it is O(n²).
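A compact DBSCAN sketch may make the density rule concrete. It uses a brute-force neighbor search, which corresponds to the O(n²) case without a spatial index; the parameter names `eps` and `min_pts` follow common usage, and the sample points are invented for illustration.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)

    def neighbors(i):
        # Brute-force scan of all points: the O(n^2) case without an index.
        return [j for j in range(n)
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps * eps]

    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # too sparse: provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb)
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (50, 50) never reaches the density threshold and is reported as noise, which is the filtering behavior the paragraph describes.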
    Grid-based method: quantize the object space into a finite number of cells that form a grid structure. All clustering operations are performed on the grid structure. The advantage of the method is that the processing time is independent of the number of objects and depends only on the number of cells in each dimension of the quantized space. For the STING algorithm, the time complexity of clustering is O(n), but the time complexity of a query is O(g), where g is the number of grid cells at the lowest level and is usually far smaller than n.
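The following sketch shows the general grid idea only; it is not STING, which additionally keeps a hierarchy of cells with statistical summaries. One pass over the data bins the objects into cells (O(n)); everything afterwards touches only the cells. The function name `grid_clusters` and the parameters `cell_size` and `min_count` are assumptions for illustration.

```python
from collections import Counter


def grid_clusters(points, cell_size, min_count):
    """Group dense grid cells that touch each other into clusters of cells."""
    # Single pass over the n objects: map each one to its cell and count.
    counts = Counter(tuple(int(c // cell_size) for c in p) for p in points)
    dense = {cell for cell, cnt in counts.items() if cnt >= min_count}

    # From here on the work depends on the number of cells, not on n.
    clusters, seen = [], set()
    for cell in sorted(dense):
        if cell in seen:
            continue
        seen.add(cell)
        stack, component = [cell], []
        while stack:
            cur = stack.pop()
            component.append(cur)
            for other in dense:
                # Cells are neighbors if they differ by at most 1 in every dimension.
                if other not in seen and all(abs(a - b) <= 1 for a, b in zip(cur, other)):
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


pts = [(0.1, 0.2), (0.4, 0.9), (1.2, 0.3), (1.5, 0.5),
       (8.1, 8.2), (8.5, 8.7), (4.0, 4.0)]
print(grid_clusters(pts, cell_size=1.0, min_count=2))
# [[(0, 0), (1, 0)], [(8, 8)]]
```

The lone point in cell (4, 4) falls below `min_count`, so its cell is ignored when the clusters are formed.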
    Model-based method: hypothesize a mathematical model for each cluster and seek the best fit between the data and that model. An example is the COBWEB algorithm, whose time complexity varies sharply with the number of input attributes and with their values.
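In standard COBWEB, the quality of a partition is scored by category utility, which compares the within-cluster probabilities P(Ai = Vij|Ck) against the overall probabilities P(Ai = Vij). The sketch below computes that standard score for nominal attributes; it does not reproduce the improved variant of the thesis, and the function name and toy data are assumptions for illustration.

```python
from collections import Counter


def category_utility(clusters):
    """Category utility of a partition.

    `clusters` is a list of clusters; each cluster is a list of objects,
    and each object is a tuple of nominal attribute values.
    """
    all_objs = [obj for cluster in clusters for obj in cluster]
    n = len(all_objs)
    n_attrs = len(all_objs[0])

    def sum_sq_probs(objs):
        # Sum over attributes i and values j of P(Ai = Vij)^2 within `objs`.
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(obj[i] for obj in objs)
            total += sum((c / len(objs)) ** 2 for c in counts.values())
        return total

    baseline = sum_sq_probs(all_objs)  # uses P(Ai = Vij)
    gain = sum(len(c) / n * (sum_sq_probs(c) - baseline)  # uses P(Ai = Vij | Ck)
               for c in clusters)
    return gain / len(clusters)


pure = [[("a",), ("a",)], [("b",), ("b",)]]
mixed = [[("a",), ("b",)], [("a",), ("b",)]]
print(category_utility(pure))   # 0.25
print(category_utility(mixed))  # 0.0
```

A partition whose clusters make attribute values more predictable scores higher, which is why the pure split beats the mixed one.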
    Third, similarity matrices, dissimilarity matrices, or similarity tables are established from the n strongest peaks, the overlap rate of common peaks, and the cosine and sine of the angle between peak vectors, all derived from the fingerprint chromatograms of the samples. Based on these data models, clustering was carried out with the k-means algorithm, the maximum tree method of fuzzy clustering, and an improved COBWEB algorithm, with differing results. By comparison, the COBWEB algorithm performed best. In the improved COBWEB algorithm, the overlap rate of common peaks is used as the intra-class similarity (P(Ai = Vij|Ck)), while the sine of the angle between peak vectors is used as the inter-class dissimilarity (P(Ai = Vij)). In addition, the weight of common peaks that account for a large share of the total peak area is reduced or eliminated, so that the components that differ between samples carry more weight in classification.
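The two measures named above can be sketched as follows. The abstract does not define the overlap rate, so the sketch assumes a common formulation: twice the number of common peaks divided by the total number of peaks in the two fingerprints, with peaks matched by retention time within a tolerance. The function names, the tolerance `tol`, and the sample values are all hypothetical.

```python
import math


def overlap_rate(rt_a, rt_b, tol=0.1):
    """Assumed definition: 2 * (common peaks) / (peaks in A + peaks in B)."""
    used, common = set(), 0
    for t in rt_a:
        for idx, u in enumerate(rt_b):
            # A peak in B can be matched to at most one peak in A.
            if idx not in used and abs(t - u) <= tol:
                used.add(idx)
                common += 1
                break
    return 2 * common / (len(rt_a) + len(rt_b))


def angle_cos_sin(x, y):
    """Cosine and sine of the angle between two peak-area vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return cos, math.sqrt(1.0 - cos * cos)


# Hypothetical retention times (minutes) for two samples.
print(overlap_rate([1.0, 2.5, 4.0, 6.2], [1.05, 2.5, 5.0]))  # 4/7, about 0.571

# Hypothetical peak areas aligned on the same two peaks.
cos, sin = angle_cos_sin((3.0, 4.0), (4.0, 3.0))
print(round(cos, 2), round(sin, 2))  # 0.96 0.28
```

A cosine near 1 (sine near 0) means the two peak-area profiles point in nearly the same direction, so the cosine can serve as a similarity and the sine as a dissimilarity.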
    Finally, the clustering methods are implemented in VC++. Improvements to the way samples are collected and to the clustering method are also proposed, so that cluster analysis can be applied more effectively in traditional Chinese medicine to discriminating illness from health and damp syndrome from non-damp syndrome.
    Hulin (Control Theory and Control Engineering), supervised by Shao Yuexiang
