Research on a Machine-Learning-Based Tar Prediction Model
Abstract
To improve the accuracy of tar yield prediction for cigarettes, tar yield was taken as the response variable, and several regression methods, including ordinary linear regression and machine learning algorithms, were used to build predictive models. The standardized mean square error served as the criterion for comparing predictive performance. The results show that prediction accuracy differed considerably among the models. Overall, the machine learning methods predicted tar yield more accurately than ordinary linear regression. Random forest regression achieved the highest accuracy together with good stability, followed by support vector machine regression. Random forests and other machine learning methods can therefore be applied to tar prediction and related tobacco research.
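The comparison described above can be sketched in code. The following is a minimal illustration, not the paper's actual workflow: the data are synthetic, the predictor names are hypothetical stand-ins for tobacco chemical components, and the hyperparameters are defaults rather than the study's tuned values. It fits the three model families named in the abstract and ranks them by standardized (normalized) mean square error, i.e. test MSE divided by the variance of the response, so values near 0 are good and 1 matches always predicting the mean.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 300
# Hypothetical chemical predictors (stand-ins for, e.g., sugar, nicotine, chlorine).
X = rng.normal(size=(n, 3))
# Simulated tar yield: a nonlinear function of the predictors plus noise.
y = (10 + 1.5 * X[:, 0] - 0.8 * X[:, 1] ** 2
     + 0.5 * X[:, 0] * X[:, 2]
     + rng.normal(scale=0.3, size=n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "linear": LinearRegression(),
    "svr": SVR(kernel="rbf", C=10.0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Standardized MSE: test MSE divided by the variance of the test response.
nmse = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    nmse[name] = mean_squared_error(y_te, model.predict(X_te)) / np.var(y_te)

# Rank models from best (lowest NMSE) to worst.
for name, score in sorted(nmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: NMSE = {score:.3f}")
```

On nonlinear data like this, the linear model cannot capture the quadratic and interaction terms, so the tree ensemble and kernel method come out ahead, mirroring the qualitative ranking the study reports.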
