UD-OPTICS: An uncertain data clustering algorithm based on interval number
  • Original title (Chinese): 基于区间数的不确定性数据聚类算法:UD-OPTICS
  • Authors: WU Cui-xian; HE Shao-yuan (吴翠先; 何少元)
  • Affiliations: School of Telecommunication and Information Engineering, Chongqing University of Posts and Telecommunications; Research Center of New Telecommunication Technology Applications, Chongqing University of Posts and Telecommunications; Chongqing Information Technology Designing Company Limited
  • Keywords: uncertain data; interval number; density clustering algorithm; OPTICS
  • Journal: Computer Engineering & Science (计算机工程与科学), journal code JSJK
  • Publication date: 2019-07-15
  • Year: 2019
  • Volume and issue: v.41, Issue 07 (No.295)
  • Language: Chinese
  • Record ID: JSJK201907023
  • Pages: 163-171
  • Page count: 9
  • CN: 43-1258/TP
Abstract
Research on clustering uncertain data generally assumes that the data follow a particular distribution, from which a probability density function or probability distribution function representing the data is derived. In practice, however, it is hard to guarantee that the assumed distribution matches the actual distribution of uncertain data in a real application system. Existing density-based algorithms are also sensitive to their initial parameters and, when the uncertain data have non-uniform density, cannot discover clusters of arbitrary density. To address these shortcomings, this paper proposes UD-OPTICS, an interval-number-based algorithm that orders uncertain data objects to identify the clustering structure. The algorithm uses interval number theory together with statistical information about the uncertain data to represent the data more reasonably. It introduces the concepts of interval core distance and interval reachability distance, along with low-complexity methods for computing them, and uses them to measure the similarity between uncertain data objects, expand clusters, and order objects to identify the clustering structure. The algorithm can discover clusters of arbitrary density. Experimental results show that UD-OPTICS achieves relatively high clustering accuracy and relatively low complexity.
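The abstract names interval core distance and interval reachability distance but does not give their formulas. The following is only a minimal sketch of how such quantities could be built, not the authors' definitions. It assumes that each attribute of an uncertain object is summarised as the interval [mean - k·std, mean + k·std] of its repeated measurements, and that the distance between two interval-valued objects is the Euclidean distance over both endpoints of every dimension. The names `to_interval`, `interval_distance`, `core_distance`, `reachability_distance` and the parameter `k` are hypothetical. The core and reachability distances follow the standard OPTICS definitions, applied to the assumed interval distance.

```python
import math


def to_interval(samples, k=1.0):
    """Summarise repeated measurements of one attribute as (low, high).

    Assumption: the interval is [mean - k*std, mean + k*std], using the
    population standard deviation. The paper may use other statistics.
    """
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    return (mean - k * std, mean + k * std)


def interval_distance(a, b):
    """Distance between two objects given as lists of (low, high) intervals.

    Assumption: Euclidean distance over the lower and upper endpoints of
    every dimension. This is an illustrative choice, not the paper's formula.
    """
    return math.sqrt(sum((al - bl) ** 2 + (ah - bh) ** 2
                         for (al, ah), (bl, bh) in zip(a, b)))


def core_distance(objects, i, min_pts, eps=math.inf):
    """Standard OPTICS core distance of object i.

    Returns the distance to the min_pts-th nearest object (the object counts
    itself as its own nearest neighbour), or None if that distance exceeds
    eps or there are fewer than min_pts objects.
    """
    dists = sorted(interval_distance(objects[i], o) for o in objects)
    if len(dists) < min_pts:
        return None
    d = dists[min_pts - 1]
    return d if d <= eps else None


def reachability_distance(objects, p, o, min_pts, eps=math.inf):
    """Standard OPTICS reachability distance of object p from object o.

    Defined as max(core_distance(o), distance(o, p)); None if o is not a
    core object.
    """
    cd = core_distance(objects, o, min_pts, eps)
    if cd is None:
        return None
    return max(cd, interval_distance(objects[o], objects[p]))
```

With objects built by calling `to_interval` on each attribute, an OPTICS-style ordering would repeatedly pick the unprocessed object with the smallest reachability distance. Because every object is reduced to two numbers per dimension, each distance evaluation costs time linear in the number of dimensions and needs no integration over a probability density.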
References
[1] Chi Rong-hua,Cheng Yuan,Zhu Su-xia,et al.Uncertain data clustering algorithm based on fast Gaussian transform[J].Journal of Communications,2017,38(3):101-111.(in Chinese)
[2] Ren Shi-jin.Research on uncertain data mining based on interval number and its application[D].Hangzhou:Zhejiang University,2006.(in Chinese)
[3] Qiu Zhi-ping.Interval analysis method for static response and eigenvalue problems of uncertain parameter structures[D].Jilin:Jilin University of Technology,Jilin University,1994.(in Chinese)
[4] Li Jia-fei,Sun Xiao-yu.Clustering method for uncertain data based on spectral decomposition[J].Journal of Jilin University(Engineering and Technology Edition),2017,47(5):1604-1611.(in Chinese)
[5] Kriegel H P,Pfeifle M.Hierarchical density-based clustering of uncertain data[C]//Proc of IEEE International Conference on Data Mining,2005:27-30.
[6] Kriegel H P,Pfeifle M.Density-based clustering of uncertain data[C]//Proc of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining,2005:672-677.
[7] Chau M,Cheng R,Kao B,et al.Uncertain data mining:An example in clustering location data[C]//Proc of Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining,2006:199-204.
[8] Kao B,Lee S D,Cheung D W,et al.Clustering uncertain data using Voronoi diagrams[C]//Proc of the 8th IEEE International Conference on Data Mining,2008:333-342.
[9] Cormode G,McGregor A.Approximation algorithms for clustering uncertain data[C]//Proc of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems,2008:191-200.
[10] Peng Yu,Luo Qing-hua,Peng Xi-yuan.UIDK-means:A multi-dimensional uncertain measurement data clustering algorithm[J].Chinese Journal of Scientific Instrument,2011,32(6):1201-1207.(in Chinese)
[11] He Yun-bin,Zhang Zhi-chao,Wan Jing,et al.Research for uncertain data clustering algorithm:U-PAM and UM-PAM algorithm[J].Computer Science,2016,43(6):263-269.(in Chinese)
[12] Wei Fang-yuan,Huang De-cai.UID-DBSCAN clustering algorithm of multi-dimensional uncertain data based on interval number[J].Computer Science,2017,44(11A):442-447.(in Chinese)
[13] Liu Xiu-mei,Zhao Ke-qin.Interval number decision set pair analysis[M].Beijing:Science Press,2014.(in Chinese)
[14] Moore R E.Error in digital computation,Volume 1[M].New York:John Wiley & Sons,1965.
[15] Ankerst M,Breunig M M,Kriegel H P,et al.OPTICS:Ordering points to identify the clustering structure[J].ACM SIGMOD Record,1999,28(2):49-60.
[16] Luo Yan-fu,Qian Xiao-dong.Uncertain data clustering algorithm based on local density[J].Data Analysis & Knowledge Discovery,2017,12:84-91.(in Chinese)
