Online adaptive decision trees based on concentration inequalities
详细信息    查看全文
文摘
Classification trees are a powerful tool for mining non-stationary data streams. In these situations, massive data are constantly generated at high speed and the underlying target function can change over time. The iadem family of algorithms is based on Hoeffding’s and Chernoff’s bounds and induces online decision trees from data streams, but is not able to handle concept drift. This study extends this family to deal with time-changing data streams. The new online algorithm, named iadem-3, performs two main actions in response to a concept drift. Firstly, it resets the variables affected by the change and maintains unbroken the structure of the tree, which allows for changes in which consecutive target functions are very similar. Secondly, it creates alternative models that replace parts of the main tree when they significantly improve the accuracy of the model, thereby rebuilding the main tree if needed. An online change detector and a non-parametric statistical test based on Hoeffding’s bounds are used to guarantee this significance. A new pruning method is also incorporated in iadem-3, making sure that all split tests previously installed in decision nodes are useful. The learning model is also viewed as an ensemble of classifiers, and predictions of the main and alternative models are combined to classify unlabeled examples. iadem-3 is empirically compared with various well-known decision tree induction algorithms for concept drift detection. We empirically show that our new algorithm often reaches higher levels of accuracy with smaller decision tree models, maintaining the processing time bounded, irrespective of the number of instances processed.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700