Simulation of an Efficient Clustering Method for Multi-Dimensional Discrete Data under Big Data Analysis

Published English title: Simulation of Multi-Dimensional Discrete Data Efficient Clustering Method under Big Data Analysis
Author: JIANG Yan-wen (姜延文), College of Computer, Hulunbuir University
Keywords: big data analysis; multidimensional discrete data; efficient clustering method; quantum optimization algorithm
Journal: Computer Simulation (计算机仿真)
CNKI journal code: JSJZ
Institution: College of Computer, Hulunbuir University
Publication date: 2019-02-15
Year: 2019
Volume: 36
Issue: 02
Language: Chinese
Pages: 215-218 (4 pages)
CNKI article ID: JSJZ201902044
CN: 11-3724/TP

Abstract

Existing clustering methods suffer from long computation times and inaccurate clustering results. To address these problems, this paper proposes an efficient clustering method for multidimensional discrete data under big data analysis, based on a quantum optimization algorithm. First, a spatial reconstruction analysis method is applied to map the big data. The minimum embedding dimension and the optimal time delay are then selected to construct an information-flow model of the big-data time series, and this model is used as the input for extracting time-delay-scale features. Next, a clustering search objective function is built on the extracted feature values, and the fuzzy C-means clustering algorithm is used to solve this objective function for the initial cluster centers, yielding the optimal cluster centers of the big data. Finally, a quantum optimization algorithm suppresses small perturbations of the cluster centers to optimize the clustering, completing the efficient clustering of multidimensional discrete data. Simulation results show that the proposed method effectively reduces computation time, narrows the gap between its output and the actual clustering results, and improves computational efficiency.
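Selecting a minimum embedding dimension m and an optimal time delay τ is the standard setup of time-delay (phase-space) embedding, in which each reconstructed vector is X_i = (x_i, x_{i+τ}, ..., x_{i+(m-1)τ}). The abstract does not say how the paper picks τ or m, so the sketch below is an assumption: it uses the common heuristic of taking the first lag at which the autocorrelation drops below 1/e, and leaves m as a parameter (in practice m is often chosen with false-nearest-neighbor tests). The function names `delay_embed`, `autocorr`, and `choose_tau` are hypothetical.

```python
import math
from typing import List, Sequence


def delay_embed(series: Sequence[float], m: int, tau: int) -> List[List[float]]:
    """Time-delay embedding: X_i = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for the chosen m and tau")
    return [[series[i + j * tau] for j in range(m)] for i in range(n_vectors)]


def autocorr(series: Sequence[float], lag: int) -> float:
    """Biased sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def choose_tau(series: Sequence[float], max_lag: int) -> int:
    """Assumed heuristic (not from the paper): first lag whose autocorrelation < 1/e."""
    threshold = math.exp(-1)
    for lag in range(1, max_lag + 1):
        if autocorr(series, lag) < threshold:
            return lag
    return max_lag  # fall back to the largest lag tried


# Usage: a sine wave sampled 20 times per period.
series = [math.sin(2 * math.pi * k / 20) for k in range(200)]
tau = choose_tau(series, max_lag=10)
vectors = delay_embed(series, m=3, tau=tau)
print(tau, len(vectors))
```

For this sine wave, `choose_tau` picks a lag of 4 (one fifth of the period), and the embedding yields 200 - (3 - 1) * 4 = 192 three-dimensional vectors.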
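The fuzzy C-means step can be illustrated with a minimal pure-Python sketch. It minimizes the standard objective J_m = Σ_k Σ_i u_ik^m ||x_k - v_i||² by alternating the center update v_i = Σ_k u_ik^m x_k / Σ_k u_ik^m with the membership update u_ik = 1 / Σ_j (d_ik / d_jk)^(2/(m-1)). The function names, the fuzzifier default m = 2, and the random initialization are assumptions; the abstract states only that fuzzy C-means solves the search objective, and in the paper's pipeline the input would be the extracted feature vectors rather than the toy points used here.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(data: Matrix, c: int, m: float = 2.0, max_iter: int = 100,
                  tol: float = 1e-6, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Return (centers, u) where u[k][i] is the membership of point k in cluster i."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships (an assumption); each point's memberships sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    exponent = 2.0 / (m - 1.0)
    centers: Matrix = []
    for _ in range(max_iter):
        # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[k] * data[k][d] for k in range(n)) / w_sum
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        max_change = 0.0
        for k in range(n):
            dists = [math.sqrt(_sq_dist(data[k], v)) for v in centers]
            if min(dists) == 0.0:
                # Point sits exactly on a center: give it full membership there.
                hit = dists.index(0.0)
                new_row = [1.0 if i == hit else 0.0 for i in range(c)]
            else:
                new_row = [1.0 / sum((dists[i] / dists[j]) ** exponent for j in range(c))
                           for i in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[k])))
            u[k] = new_row
        if max_change < tol:
            break
    return centers, u


def fcm_objective(data: Matrix, centers: Matrix, u: Matrix, m: float = 2.0) -> float:
    """J_m = sum_k sum_i u_ik^m * ||x_k - v_i||^2."""
    return sum(u[k][i] ** m * _sq_dist(data[k], centers[i])
               for k in range(len(data)) for i in range(len(centers)))


data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, u = fuzzy_c_means(data, c=2)
print(sorted(centers), fcm_objective(data, centers, u))
```

On these two well-separated groups, the centers land close to the group means (1/3, 1/3) and (31/3, 31/3), and each point's memberships sum to 1.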
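The abstract does not name the quantum optimization algorithm used to suppress small perturbations of the cluster centers. One widely used quantum-inspired optimizer is quantum-behaved particle swarm optimization (QPSO), so the sketch below swaps it in as an assumption: each particle encodes all c centers as one flat vector, particles start as small random jitters around a set of initial centers (for example, those returned by fuzzy C-means), and the swarm minimizes the within-cluster sum of squared distances. The fitness function, swarm size, jitter width, and the linear decrease of the contraction-expansion coefficient β from 1.0 to 0.5 are illustrative choices, not the paper's settings; `sse` and `qpso_refine` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple


def sse(flat_centers: Sequence[float], data: List[List[float]], c: int) -> float:
    """Within-cluster sum of squared distances (assumed fitness, not the paper's)."""
    dim = len(data[0])
    centers = [flat_centers[i * dim:(i + 1) * dim] for i in range(c)]
    return sum(min(sum((x - v) ** 2 for x, v in zip(point, ctr)) for ctr in centers)
               for point in data)


def qpso_refine(data: List[List[float]], init_centers: List[List[float]],
                n_particles: int = 20, iters: int = 100, spread: float = 0.5,
                seed: int = 0) -> Tuple[List[List[float]], float]:
    """Hypothetical QPSO refinement of cluster centers; returns (centers, cost)."""
    rng = random.Random(seed)
    c, dim = len(init_centers), len(data[0])
    start = [coord for ctr in init_centers for coord in ctr]
    n_dims = len(start)
    # Particle 0 is the unperturbed start; the rest are small random jitters of it.
    xs = [list(start)] + [[v + rng.uniform(-spread, spread) for v in start]
                          for _ in range(n_particles - 1)]
    pbest = [list(x) for x in xs]
    pbest_f = [sse(x, data, c) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for t in range(iters):
        beta = 1.0 - 0.5 * t / iters  # contraction-expansion coefficient, 1.0 -> 0.5
        # mbest: mean of all personal-best positions.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(n_dims)]
        for i in range(n_particles):
            for d in range(n_dims):
                phi = rng.random()
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                u = 1.0 - rng.random()  # in (0, 1], so log(1/u) is finite
                step = beta * abs(mbest[d] - xs[i][d]) * math.log(1.0 / u)
                xs[i][d] = attractor + step if rng.random() < 0.5 else attractor - step
            f = sse(xs[i], data, c)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = list(xs[i]), f
                if f < gbest_f:
                    gbest, gbest_f = list(xs[i]), f
    return [gbest[i * dim:(i + 1) * dim] for i in range(c)], gbest_f


data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
init = [[1.0, -0.5], [9.0, 11.0]]  # deliberately perturbed starting centers
refined, cost = qpso_refine(data, init)
print(refined, cost)
```

Because particle 0 is the unperturbed starting solution and the global best only changes on strict improvement, the returned cost can never exceed the cost of the initial centers; the swarm can only move them toward a lower-cost configuration.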

