Applied Research on Local Qualitative Factors in the Analysis of Medical Data

Author: Yang Yang
Degree level: Master's
Major: Computer Software and Theory
Keywords: Medical data mining; Clustering; Boundary (point) detection; High-dimensional data
Degree year: 2013
Supervisor: Qiu Baozhi
Discipline code: 081202
Degree-granting institution: Zhengzhou University
Submission date: 2013-05-01

Abstract

With the rapid development of biomedical engineering and improvements in measurement technology, large amounts of medical information are now recorded in electronic form. This information includes not only CT images, X-rays and physiological indicators, but also each patient's age, sex, weight, height, medical history and so on. Over time, hospital databases keep expanding, often growing severalfold. Although database technology makes this information easy to store and retrieve, it cannot by itself overcome the problem of being "data-rich but knowledge-poor". How to use these valuable data, with the help of computers, to support the diagnosis and treatment of disease, and how to uncover the valuable medical knowledge hidden behind them, has attracted growing attention and become a hot research topic.
     Data mining technology makes it possible to address these problems. Data mining is the process of automatically extracting implicit, previously unknown information from databases; the extracted information can be expressed as patterns, rules, concepts and other forms. Data mining has already achieved good results in disease diagnosis, medical image analysis and the analysis of disease-related factors.
     Clustering is an important technique in data mining, and boundary detection is a subfield of clustering that makes it possible to support the prevention and prediction of disease in medicine. This thesis studies existing clustering boundary detection algorithms and obtains the following results:
     (1) Most existing clustering boundary detection algorithms cannot be applied to high-dimensional data. To address this, the thesis proposes BRINK, a clustering boundary detection algorithm based on the local qualitative factor that is suitable for high-dimensional data. BRINK uses a weighted Euclidean distance to handle high-dimensional data, uses the local reachability density to determine a local qualitative factor for each object, and identifies cluster boundaries from the property that the local qualitative factor of a boundary object is slightly greater than 1. Experimental results on synthetic and real data sets show that the algorithm effectively detects cluster boundaries in noisy multi-dimensional data sets containing clusters of arbitrary shapes, sizes and densities.
     (2) Because no dedicated medical data mining platform currently exists, the author developed a data mining and decision-making platform specifically for medical data. The platform first applies data preprocessing techniques and then uses BRINK, Band and other clustering and boundary detection algorithms to cluster real medical data sets and detect their cluster boundaries. Experimental results show that some of the platform's algorithms achieve the intended goals, performing clustering and cluster boundary detection on real medical data.
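The BRINK detection steps summarised in (1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not give the exact formula for the local qualitative factor, so the sketch uses the LOF-style ratio (the mean local reachability density of a point's k nearest neighbours divided by the point's own local reachability density) as a stand-in. The per-dimension weights `w` are assumed to be supplied by the caller, because the weighting scheme is not described here. The helper names `local_factors` and `label_points`, and the thresholds `low` and `high`, are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def weighted_euclidean(p: Point, q: Point, w: Sequence[float]) -> float:
    """d_w(p, q) = sqrt(sum_i w_i * (p_i - q_i)^2); the weights w are an assumed input."""
    return math.sqrt(sum(wi * (pi - qi) ** 2 for pi, qi, wi in zip(p, q, w)))


def local_factors(data: List[Point], k: int, w: Sequence[float]) -> List[float]:
    """LOF-style factor per point, used here as a stand-in for the local qualitative factor.

    Assumes len(data) > k.
    """
    n = len(data)
    dist = [[weighted_euclidean(data[i], data[j], w) for j in range(n)] for i in range(n)]

    # k nearest neighbours of each point (ties broken by index) and its k-distance.
    nbrs: List[List[int]] = []
    kdist: List[float] = []
    for i in range(n):
        order = sorted((dist[i][j], j) for j in range(n) if j != i)
        nbrs.append([j for _, j in order[:k]])
        kdist.append(order[k - 1][0])

    # Local reachability density: inverse of the mean reachability distance
    # reach_k(p, o) = max(k-distance(o), d(p, o)) over p's neighbours o.
    lrd: List[float] = []
    for i in range(n):
        mean_reach = sum(max(kdist[o], dist[i][o]) for o in nbrs[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate points

    # Factor: mean neighbour density divided by the point's own density.
    return [sum(lrd[o] for o in nbrs[i]) / (k * lrd[i]) for i in range(n)]


def label_points(factors: List[float], low: float = 1.0, high: float = 2.0) -> List[str]:
    """Hypothetical thresholds: <= low interior, (low, high] boundary, > high noise."""
    labels = []
    for f in factors:
        if f <= low:
            labels.append("interior")
        elif f <= high:
            labels.append("boundary")
        else:
            labels.append("noise")
    return labels


if __name__ == "__main__":
    grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
    data = grid + [(20.0, 20.0)]  # 5x5 unit grid plus one distant point
    labels = label_points(local_factors(data, k=4, w=(1.0, 1.0)))
    print(labels[12], labels[0], labels[25])  # prints: interior boundary noise
```

On this toy input, the grid centre (index 12) gets a factor below 1, the grid corner (index 0) gets a factor slightly above 1, and the distant point (index 25) gets a factor far above 1. This matches the pattern the abstract relies on: boundary objects sit slightly above 1, while noise sits well above it. Breaking neighbour ties by index is a simplification of the k-distance neighbourhood used in LOF, which includes every point within the k-distance.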
