Parallel Search Clustering Algorithm of Large Data Based on Augmented Chain Repair

Original title (Chinese): 增广链修复下大数据并行搜索聚类算法
Author: He Yuxin (何玉新), Jilin Engineering Vocational College
Keywords: semantics; database; large data clustering; link restoration
Journal: Bulletin of Science and Technology (科技通报; journal code KJTB)
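Institution: Jilin Engineering Vocational College (吉林工程职业学院)
Publication date: 2016-03-31
Publisher: Bulletin of Science and Technology (科技通报)
Year: 2016
Issue: v.32; No.211
Language: Chinese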
File ID: KJTB201603025
Page count: 5
Issue number: 03
CN: 33-1079/N
Pages: 117-121

Abstract

This paper studies big-data clustering in multi-source, hierarchically organized semantic-feature databases, with the goal of classifying and recognizing data. In such databases, routing conflicts prevent effective parallel search of the semantic features of big data when the link load is heavy. To avoid these conflicts, a link-shunting method based on homomorphic parsing of augmented chains is proposed, enabling parallel search clustering of big data under augmented chain repair. The method first builds a semantic-similarity fusion model for big-data clustering, performs augmented-link shunting with a cross-layer link-shunting algorithm, and constructs a semantic ontology model. A high-order Bessel function cumulant is chosen as the test statistic for augmented chain repair; it is used to determine the confidence of each node's data packets and to establish a confidence interval. During buffer-overflow repair, power-spectrum amplitude features are extracted to support the parallel search clustering. For clustering, a new cluster is created for outliers, the topic-word set of every document is processed in turn, and each topic word is automatically added to the attribute set of the formal context; a parallel search algorithm then carries out the improved clustering of the semantic big data. Simulation results show that the algorithm achieves a high degree of fit and a low misclassification rate when clustering big data.
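
The document-processing step named in the abstract (adding each document's topic words to the attribute set of a formal context, and creating a new cluster for outliers) can be illustrated with a minimal Python sketch. The abstract does not give the similarity measure, the outlier rule, or any data structures, so everything here is an assumption made for illustration: the function names, the use of Jaccard similarity, and the threshold of 0.3 are hypothetical, and the sketch is sequential rather than a reproduction of the paper's parallel search or augmented chain repair.

```python
from typing import Dict, List, Set, Tuple


def build_formal_context(
    docs: Dict[str, Set[str]]
) -> Tuple[List[str], List[str], Dict[str, Dict[str, bool]]]:
    """Build (objects, attributes, incidence) from per-document topic-word sets."""
    attributes: List[str] = []
    seen: Set[str] = set()
    for doc_id in docs:                       # process each document in turn
        for word in sorted(docs[doc_id]):     # sorted only for a deterministic order
            if word not in seen:              # add each new topic word to the attribute set
                seen.add(word)
                attributes.append(word)
    incidence = {
        doc_id: {word: (word in words) for word in attributes}
        for doc_id, words in docs.items()
    }
    return list(docs), attributes, incidence


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two word sets (assumed measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_with_outliers(
    docs: Dict[str, Set[str]], threshold: float = 0.3
) -> List[List[str]]:
    """Assign each document to its most similar cluster, or open a new cluster
    when no existing cluster reaches the (hypothetical) similarity threshold."""
    clusters: List[dict] = []
    for doc_id, words in docs.items():
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(words, cluster["words"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(doc_id)
            best["words"] |= words
        else:
            # Outlier: no sufficiently similar cluster, so start a new one.
            clusters.append({"members": [doc_id], "words": set(words)})
    return [cluster["members"] for cluster in clusters]


docs = {
    "d1": {"semantic", "database"},
    "d2": {"semantic", "database", "cluster"},
    "d3": {"routing", "link"},
}
objects, attributes, incidence = build_formal_context(docs)
groups = cluster_with_outliers(docs)
```

In this toy input, `d3` shares no topic words with the first two documents, so it is treated as an outlier and placed in a cluster of its own.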
