Abstract
When decision tree algorithms are applied to well leakage classification, results are unsatisfactory for two reasons: multi-valued attributes make up a large share of the well leakage data after discretization, and the algorithm itself is biased toward multi-valued attributes. To address this, an improved algorithm based on ID3, called AFIV-ID3, is proposed. Building on ID3, it introduces attribute importance into the calculation of a new information entropy; the importance of each attribute is set by the decision maker from prior or domain knowledge. An association function ratio is also added to the information gain calculation to correct the information gain value. AFIV-ID3 overcomes the multi-value bias of ID3 and raises the weight of important attributes in the data, which improves the accuracy of well leakage type classification. Tests on four UCI data sets and on real well leakage data show that the algorithm classifies more accurately than ID3 and C4.5, and that it raises the classification accuracy, which is unstable under the manual experience-based method, to about 72.23%.
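The multi-value bias mentioned above can be illustrated with a minimal sketch of the standard ID3 information gain. This is not the AFIV-ID3 computation: the abstract does not give the importance-weighted entropy or the association function ratio, so neither is reproduced here. The toy labels and the attribute names `binary_attr` and `id_like_attr` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Standard ID3 information gain of one attribute over the labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the partitions induced by the attribute values.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: 8 samples, two classes.
labels = ["leak", "leak", "none", "none", "leak", "none", "leak", "none"]
# A two-valued attribute that is moderately informative.
binary_attr = ["high", "high", "low", "low", "high", "low", "low", "high"]
# An attribute with a distinct value per sample (no predictive value in practice).
id_like_attr = list(range(8))

gain_binary = information_gain(binary_attr, labels)
gain_id_like = information_gain(id_like_attr, labels)
print(f"binary attribute gain:  {gain_binary:.4f}")
print(f"id-like attribute gain: {gain_id_like:.4f}")
```

In this toy case the attribute with one distinct value per sample obtains the maximum possible gain even though it cannot generalize, so plain ID3 would split on it first. This is the kind of behavior that a correction to the information gain is meant to counteract.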