Abstract
Objective To analyze the network logs of the China Information System for Disease Control and Prevention using a big data analysis platform, so as to provide a basis for optimizing the allocation of basic information technology resources and improving network security management. Methods Using a big data analysis platform built around Hadoop and Spark, the system's network log data from January to July 2016 were processed and analyzed, and the results were displayed. Results By analyzing logs of device running status and user behavior, the study gave a preliminary picture of system resource utilization, user behavior, user distribution, and abnormal access to the system. It also revealed shortcomings in the logs, including nonstandard formats and missing content, and improvement measures were proposed. Conclusions A big data platform enables efficient analysis and display of massive logs from large-scale business systems and timely discovery of useful information, which is crucial for optimizing system resources, understanding the user base, and managing network security.