Abstract
To explore a feasible approach to combining water quality analysis with big data technology, a complete SparkR water quality analysis platform covering data input, storage, scheduling, and application was built with MySQL+Hive+SparkR as its main framework. Indoor culture experiments simulating algae growth in an artificial lake were set up, together with replicate experimental groups, and the indicator data were monitored. On the SparkR platform, the Adaptive-Lasso algorithm was applied locally to identify the main factors influencing algae growth in the control group and the Vallisneria group, and regression equations were established for validation. Gradient-boosted trees (GBTs) algae prediction models were then deployed in distributed mode on the cluster; replicate experiments showed that the mean relative errors of the prediction models over the next three days were 15.3% and 14.8%, respectively.