A cost sensitive decision tree algorithm based on weighted class distribution with batch deleting attribute mechanism
Abstract
Minimal cost classification is an important issue in data mining and machine learning. Recently, many enhanced algorithms based on C4.5 have been proposed to tackle it. One disadvantage of these methods is that they are inefficient on medium or large data sets. To overcome this problem, we present a cost-sensitive decision tree algorithm based on weighted class distribution with a batch deleting attribute mechanism (BDADT). In the BDADT algorithm, a heuristic function is designed for evaluating attributes during node selection. It combines a weighted information gain ratio, a test cost, and a user-specified non-positive parameter that adjusts the effect of the test cost. In addition, a batch deleting attribute mechanism is incorporated into the algorithm: while nodes are being assigned, it deletes redundant attributes according to their heuristic function values, which improves the efficiency of decision tree construction. Experiments are conducted on 20 UCI data sets, with test costs drawn from a representative normal distribution, to evaluate the proposed BDADT algorithm. The experimental results show that the average total costs obtained by the proposed algorithm are smaller than those of the existing CS-C4.5 and CS-GainRatio algorithms. Furthermore, the proposed algorithm significantly increases the efficiency of cost-sensitive decision tree construction.
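
The abstract names the ingredients of the heuristic function but does not give its formula. As a rough illustration only, the sketch below assumes one form that is common in cost-sensitive tree induction: the weighted gain ratio multiplied by (1 + test cost) raised to the non-positive parameter. The names `weighted_gain_ratio`, `heuristic`, `test_cost`, and `lam`, the use of per-instance weights, and the multiplicative form itself are all assumptions for illustration, not the paper's definition.

```python
import math
from collections import Counter


def entropy(labels, weights):
    """Weighted Shannon entropy (base 2) of the class labels."""
    total = sum(weights)
    mass = Counter()
    for y, w in zip(labels, weights):
        mass[y] += w
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


def weighted_gain_ratio(values, labels, weights):
    """Gain ratio of a nominal attribute, computed on instance weights.

    Assumption: weights are per-instance; the paper may derive them differently.
    """
    total = sum(weights)
    groups = {}
    for v, y, w in zip(values, labels, weights):
        ys, ws = groups.setdefault(v, ([], []))
        ys.append(y)
        ws.append(w)
    # Expected entropy after splitting on the attribute.
    conditional = sum(sum(ws) / total * entropy(ys, ws) for ys, ws in groups.values())
    gain = entropy(labels, weights) - conditional
    # Split information penalises attributes with many distinct values.
    split_info = -sum(
        (sum(ws) / total) * math.log2(sum(ws) / total) for _, ws in groups.values()
    )
    return gain / split_info if split_info > 0 else 0.0


def heuristic(values, labels, weights, test_cost, lam):
    """Assumed form: gain_ratio * (1 + test_cost) ** lam, with lam <= 0."""
    if lam > 0:
        raise ValueError("lam must be non-positive")
    return weighted_gain_ratio(values, labels, weights) * (1.0 + test_cost) ** lam
```

Under this assumed form, `lam = 0` ignores the test cost entirely, and more negative values penalise expensive attributes more strongly.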
