Classification Algorithm of Well Leakage Type Based on Optimized ID3
  • English title: Classification Algorithm of Well Leakage Type Based on Optimized ID3
  • Authors: 李建; 付小斌; 吴媛媛
  • Authors (English): LI Jian; FU Xiaobin; WU Yuanyuan; School of Computer Science, Southwest Petroleum University
  • Keywords: well leakage type; ID3 algorithm; association function ratio; attribute importance; multi-value bias
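  • Journal code: JSJC
  • Journal: 计算机工程 (Computer Engineering)
  • Institution: School of Computer Science, Southwest Petroleum University
  • Publication date: 2018-02-28 13:52
  • Year: 2019
  • Volume: v.45; No.497
  • Issue: 02
  • Pages: 296-301 (6 pages)
  • Article ID: JSJC201902048
  • CN: 31-1289/TP
  • Funding: National Science and Technology Major Project (2016ZX05020-006)
  • Language: Chinese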
Abstract
When decision tree algorithms are applied to well leakage classification, the results are unsatisfactory: after the well leakage data are discretized, a large proportion of the attributes are multi-valued, and the algorithm itself is biased toward multi-valued attributes. To address this, an improved ID3-based algorithm, AFIV-ID3, is proposed. Building on ID3, it introduces attribute importance into the computation of a new information entropy, where the importance of each attribute is set by the decision maker from prior or domain knowledge. It also adds an association function ratio to the information gain calculation to correct the gain value. AFIV-ID3 overcomes ID3's multi-value bias and increases the weight of important attributes in the data, thereby improving the classification accuracy of well leakage types. Tests on four UCI datasets and on real well leakage data show that its classification accuracy is higher than that of ID3 and C4.5, and that, compared with the manual experience-based method, whose accuracy is unstable, it raises the classification accuracy to about 72.23%.
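The record gives no formulas for AFIV-ID3, so the following Python sketch only illustrates the two ideas under stated assumptions; it is not the authors' implementation. `info_gain` is the standard ID3 gain H(S) - E(A). `association_function` assumes a two-class form, AF(A) = (1/n_A) * sum over values v of |N(v, c1) - N(v, c2)|, where n_A is the number of distinct values of A, and the result is normalized into a ratio over all candidate attributes. The importance weighting, which shrinks E(A) to (1 - w_A) * E(A) for a user-supplied w_A in [0, 1), is a hypothetical choice made only for illustration. The helper names (`afiv_like_scores`, `split_counts`, `entropy_from_counts`), the toy attributes `row_id` and `x`, and the data are likewise assumptions.

```python
import math
from collections import Counter, defaultdict


def entropy_from_counts(counts):
    """Shannon entropy, in bits, of a class distribution given as counts."""
    counts = list(counts)
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def split_counts(rows, labels, attr):
    """Map each value of `attr` to a Counter of the class labels seen with it."""
    counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        counts[row[attr]][y] += 1
    return counts


def conditional_entropy(rows, labels, attr):
    """E(A): entropy of each subset after splitting on `attr`, weighted by subset size."""
    n = len(labels)
    return sum(
        sum(cnt.values()) / n * entropy_from_counts(cnt.values())
        for cnt in split_counts(rows, labels, attr).values()
    )


def info_gain(rows, labels, attr):
    """Classic ID3 information gain: H(S) - E(A)."""
    return entropy_from_counts(Counter(labels).values()) - conditional_entropy(rows, labels, attr)


def association_function(rows, labels, attr, classes):
    """ASSUMED two-class association function:
    AF(A) = (1 / n_A) * sum_v |N(v, c1) - N(v, c2)|, n_A = number of distinct values of A."""
    c1, c2 = classes
    counts = split_counts(rows, labels, attr)
    return sum(abs(cnt[c1] - cnt[c2]) for cnt in counts.values()) / len(counts)


def afiv_like_scores(rows, labels, attrs, importance, classes):
    """HYPOTHETICAL AFIV-style split score, not the paper's exact formula.

    importance: dict attr -> w in [0, 1); w shrinks E(A) to (1 - w) * E(A).
    The resulting gain is multiplied by the normalized ratio AF(A) / sum_B AF(B).
    """
    h = entropy_from_counts(Counter(labels).values())
    af = {a: association_function(rows, labels, a, classes) for a in attrs}
    total_af = sum(af.values())
    scores = {}
    for a in attrs:
        w = importance.get(a, 0.0)
        weighted_e = (1.0 - w) * conditional_entropy(rows, labels, a)
        ratio = af[a] / total_af if total_af else 0.0
        scores[a] = (h - weighted_e) * ratio
    return scores


# Toy data (illustrative only): `row_id` is unique per sample, `x` is informative.
rows = [{"row_id": i, "x": v} for i, v in enumerate("aaabbbbb")]
labels = ["p"] * 4 + ["n"] * 4
attrs = ["row_id", "x"]

gains = {a: info_gain(rows, labels, a) for a in attrs}
scores = afiv_like_scores(rows, labels, attrs, {"x": 0.2}, ("p", "n"))
print(max(gains, key=gains.get))    # plain ID3 picks row_id (multi-value bias)
print(max(scores, key=scores.get))  # corrected score picks x
```

On this toy data, plain ID3 chooses `row_id` because splitting on a unique identifier drives E(A) to zero, which is the multi-value bias the abstract describes. Dividing the summed class-count differences by the number of distinct values keeps AF(`row_id`) small (1.0 versus 3.0 for `x`), so the corrected score prefers `x`.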
