Abstract
Building on a study of big data collection, big data storage, HDFS, and Flume, this paper presents the conceptual model and architecture of BDAS, a big data acquisition system that combines Flume with HDFS. Under this architecture, the concrete work of implementing a given data collection task reduces to configuring the Flume Agent. As an illustration, the method and steps for collecting Web Server logs are given. The BDAS conceptual model and architecture are of theoretical and practical significance for big data analysis and research, and they offer a general-purpose means of data acquisition for work in the field.