Abstract
【Purpose/significance】Conventional information analysis methods are challenged by massive data with many types and features. For massive data sets made up of multiple groups of time series, and for information analysis whose goal is prediction, this paper proposes a prediction model based on data mining techniques. The model improves prediction accuracy in a big data environment and may serve as a reference for information analysis and intelligence forecasting in other fields.【Method/process】Taking container shipping freight rates as an example, a container freight rate prediction model is proposed. An adaptive grid search strategy is designed to determine the hyperparameter combination of the data mining algorithm efficiently and accurately. An evaluation method based on a time series hold-out is proposed to reduce the generalization error of data mining results on multi-group time series data sets such as container freight rates. To handle the massive volume of freight rate data, the GBDT algorithm is given a parallel computing design and an optimization strategy that computes the loss function iteratively after pre-sorting, which improves the algorithm's computational efficiency in a big data environment.【Result/conclusion】Simulation results for the model and algorithms show that, on a traditional time series problem, the prediction model based on data mining achieves better results than traditional time series methods.
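The abstract does not spell out the hold-out evaluation or the adaptive grid search, so the following is only a minimal sketch under stated assumptions. It assumes that the time series hold-out reserves the most recent observations of each series (for example, each shipping route) for validation, and that "adaptive" means a coarse-to-fine search that narrows the range around the best value found so far. The record layout and the function names `time_series_holdout` and `adaptive_grid_search` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: (series_id, time_index, value).
Record = Tuple[str, int, float]


def time_series_holdout(
    records: Sequence[Record], holdout_periods: int
) -> Tuple[List[Record], List[Record]]:
    """Reserve the last `holdout_periods` observations of every series for validation.

    Splitting by time within each series keeps future values out of the
    training set, which a random split would not guarantee.
    """
    by_series: Dict[str, List[Record]] = {}
    for rec in records:
        by_series.setdefault(rec[0], []).append(rec)

    train: List[Record] = []
    valid: List[Record] = []
    for series in by_series.values():
        series.sort(key=lambda r: r[1])  # order each series by time
        cut = max(len(series) - holdout_periods, 0)
        train.extend(series[:cut])
        valid.extend(series[cut:])
    return train, valid


def adaptive_grid_search(
    score: Callable[[float], float],
    low: float,
    high: float,
    points: int = 5,
    rounds: int = 3,
) -> Tuple[float, float]:
    """Coarse-to-fine search over one hyperparameter (assumed interpretation).

    Each round evaluates `points` (at least 2) evenly spaced candidates,
    then shrinks the range to one grid step on either side of the best
    candidate. `score` is a validation error to be minimized.
    """
    lower_bound, upper_bound = low, high
    best_x, best_s = low, score(low)
    for _ in range(rounds):
        step = (high - low) / (points - 1)
        for i in range(points):
            x = low + i * step
            s = score(x)
            if s < best_s:
                best_x, best_s = x, s
        # Narrow the range around the current best, staying inside the bounds.
        low = max(lower_bound, best_x - step)
        high = min(upper_bound, best_x + step)
    return best_x, best_s
```

In practice `score` would train the model on the training split with the candidate hyperparameter and return its error on the validation split. A search over several hyperparameters would apply the same narrowing to each dimension.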