Construction of a QAR Data Warehouse in Hive
  • English title: Data warehouse of QAR based on Hive
  • Authors: 冯兴杰 (FENG Xingjie); 吴稀钰 (WU Xiyu); 赵杰 (ZHAO Jie); 贺阳 (HE Yang); 房戍 (FANG Shu)
  • Author affiliation: College of Computer Science & Technology, Civil Aviation University of China
  • Keywords: Hive; Quick Access Recorder (QAR); data warehouse; data processing; Hadoop
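  • Journal (Chinese abbreviation): JSGG
  • Journal (English name): Computer Engineering and Applications
  • Institution: College of Computer Science & Technology, Civil Aviation University of China (中国民航大学计算机科学与技术学院)
  • Publication date: 2016-09-28 16:13
  • Publisher: Computer Engineering and Applications (计算机工程与应用)
  • Year: 2017
  • Issue: v.53; No.882
  • Funding: Young Scientists Fund of the National Natural Science Foundation of China (No.61301245, No.61201414)
  • Language: Chinese
  • Page: JSGG201711016
  • Number of pages: 5
  • CN: 11
  • Classification number: 95-99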
Abstract
Analyzing Quick Access Recorder (QAR) data is an effective way to monitor aircraft status. However, with the rapid development of civil aviation, the volume of QAR data has grown sharply. Existing QAR data warehouses built on relational databases cannot support storage and analysis at this scale, so large amounts of QAR data go unprocessed and become information garbage. To address these shortcomings, this paper proposes a QAR data warehouse based on Hive. Based on an analysis of Hive's characteristics and the structure of QAR data, the overall architecture and storage structure of the Hive-based QAR data warehouse are designed. Data in the existing data warehouse are migrated to the Hive-based QAR data warehouse, which makes it compatible with the existing one. Experimental results show that, even as the volume of QAR data increases sharply, the processing time of the Hive-based QAR data warehouse still grows linearly.
