Abstract
A big data storage architecture based on Hadoop was built on a virtualized resource pool. Massive automatic-weather-station text data, digitized historical images, and binary radar base data were cleaned according to custom ETL storage rules and then loaded into the big data framework. The architecture performed well in concurrent-read efficiency tests, providing a solution approach and a basic model for the scalability and system-performance challenges posed by the growth of massive meteorological data.
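The pipeline summarized above, cleaning heterogeneous records with custom ETL rules before loading them into the Hadoop framework, can be sketched roughly as follows. The record layout (station id, timestamp, temperature), the validation thresholds, and the final upload step are illustrative assumptions, not the paper's actual storage rules.

```python
# Minimal sketch of a custom ETL cleaning rule for automatic-station
# text records prior to loading into HDFS. The comma-separated field
# layout and the temperature range check are hypothetical examples.

def clean_record(line: str):
    """Return a normalized record string, or None if the line fails validation."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None  # drop malformed lines
    station, ts, temp = parts
    if not station or not ts:
        return None  # drop records with missing keys
    try:
        value = float(temp)
    except ValueError:
        return None  # drop non-numeric observations
    if not -90.0 <= value <= 60.0:
        return None  # drop physically implausible temperatures
    return f"{station},{ts},{value:.1f}"

def run_etl(lines):
    """Apply the cleaning rule to raw lines; in the full system the
    surviving records would then be written into HDFS (for example
    via `hdfs dfs -put` or the WebHDFS REST interface)."""
    return [r for r in (clean_record(l) for l in lines) if r is not None]

if __name__ == "__main__":
    raw = [
        "54511,2018-06-01T00:00,23.5",   # valid record
        "bad line",                      # malformed, dropped
        "54512,2018-06-01T00:00,999",    # out of range, dropped
    ]
    print(run_etl(raw))
```

Batching many small cleaned records into large files before upload also matters in practice, since HDFS performs poorly with large numbers of small files.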