Research on Feature Selection Algorithm of Massive High Dimensional Big Data in Cloud Computing

Title: Research on Feature Selection Algorithm of Massive High Dimensional Big Data in Cloud Computing
Author: HU Jing
Affiliation: Fuzhou Institute of Technology
Keywords: Cloud computing; Big data; Entropy weighted; Sparse principle; Online learning
Journal: Computer Simulation (计算机仿真), journal code JSJZ
Publication date: 2019-04-15
Year: 2019
Volume: 36
Issue: 04
Pages: 196-199
Page count: 4
CN: 11-3724/TP
Article ID: JSJZ201904040
Funding: 2016 Education Department science and technology project JAT160619, "Research on Real-Time Push Technology for Universities Based on Cloud Storage"; 2017 Fujian Province Higher Education Discipline Leader Training Program, Domestic Visiting Scholar Project (Min Jiao Shi [2017] No. 87)
Language: Chinese

Abstract

To analyze massive high-dimensional big data effectively in a cloud computing environment, the data must first undergo feature selection. Targeting the highly dynamic and high-dimensional nature of cloud big data, this paper proposes an online learning feature selection algorithm based on competitive entropy weighting combined with the sparse principle. First, during the entropy-weighting iterations, a competitive merging scheme is used to optimize the entropy-weighting computation. This reduces the dimensionality of the data being processed and improves the algorithm's ability to handle high-dimensional data. Next, a sparse score is introduced to mark the features associated with local data, the features are ranked by importance, and redundant data are removed from the big data source. Finally, the merged entropy weighting and the sparse principle are applied within an online learning framework, which further improves the algorithm's efficiency on high-dimensional data streams. Experimental results show that the proposed algorithm improves clustering accuracy and effectively increases the accuracy of feature selection for massive high-dimensional big data in cloud computing environments.
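The abstract gives no equations or pseudocode. As a rough illustration only, the Python sketch below combines two of the ingredients it names. The first is entropy-weighted feature weighting in the style of entropy-weighted k-means (EWKM). In EWKM, feature j in cluster l receives weight w_lj proportional to exp(-D_lj / γ), where D_lj is the within-cluster dispersion of that feature. The second is an online loop that processes one mini-batch at a time. Several elements are assumptions for illustration and do not come from the paper: the class name `OnlineEntropyFeatureSelector`, the farthest-first initialization, the value of γ, and the ranking rule (a running mean of each feature's largest per-cluster weight). The paper's competitive merging step and its sparse score are not reproduced.

```python
import numpy as np


def entropy_weights(X, labels, centers, gamma):
    """EWKM-style feature weights: w[l, j] is proportional to exp(-D[l, j] / gamma),
    where D[l, j] is the within-cluster dispersion of feature j in cluster l."""
    k, d = centers.shape
    D = np.zeros((k, d))
    for l in range(k):
        members = X[labels == l]
        if len(members):
            D[l] = ((members - centers[l]) ** 2).sum(axis=0)
    # Shift each row by its minimum before exponentiating (softmax is shift-invariant).
    W = np.exp(-(D - D.min(axis=1, keepdims=True)) / gamma)
    return W / W.sum(axis=1, keepdims=True)


def assign(X, centers, W):
    """Assign each point to the center with the smallest feature-weighted squared distance."""
    diff2 = (X[:, None, :] - centers[None, :, :]) ** 2  # shape (n, k, d)
    return (diff2 * W[None, :, :]).sum(axis=2).argmin(axis=1)


class OnlineEntropyFeatureSelector:
    """Hypothetical online wrapper: one weighted k-means step per mini-batch."""

    def __init__(self, k, d, gamma=1.0, seed=0):
        self.k, self.gamma = k, gamma
        self.rng = np.random.default_rng(seed)
        self.centers = None
        self.W = np.full((k, d), 1.0 / d)  # start with uniform feature weights
        self.counts = np.zeros(k)          # points seen per cluster
        self.score = np.zeros(d)           # running feature importance
        self.n_batches = 0

    def _init_centers(self, X):
        # Farthest-first traversal: a random first center, then the farthest points.
        chosen = [int(self.rng.integers(len(X)))]
        for _ in range(1, self.k):
            d2 = ((X[:, None, :] - X[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
            chosen.append(int(d2.argmax()))
        self.centers = X[chosen].astype(float)

    def partial_fit(self, X):
        if self.centers is None:
            self._init_centers(X)
        labels = assign(X, self.centers, self.W)
        for l in range(self.k):
            members = X[labels == l]
            if len(members):
                # Move the center toward the batch mean with step n_batch / n_total.
                self.counts[l] += len(members)
                eta = len(members) / self.counts[l]
                self.centers[l] += eta * (members.mean(axis=0) - self.centers[l])
        self.W = entropy_weights(X, labels, self.centers, self.gamma)
        # Running mean of the largest weight each feature receives in any cluster.
        self.n_batches += 1
        self.score += (self.W.max(axis=0) - self.score) / self.n_batches
        return self

    def select(self, m):
        """Indices of the m highest-scoring features."""
        return np.argsort(-self.score)[:m]
```

In a streaming setting, each arriving mini-batch is passed to `partial_fit`, and `select(m)` returns the indices of the m highest-ranked features. Because D_lj is a sum over cluster members, a suitable γ depends on batch size and feature scale. A small γ concentrates almost all weight on the most compact features. A large γ pushes the weights toward uniform.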
