Subsampling the Concurrent AdaBoost Algorithm: An Efficient Approach for Large Datasets
Abstract
In this work we propose a subsampled version of the Concurrent AdaBoost algorithm in order to deal with large datasets in an efficient way. The proposal is based on a concurrent computing approach focused on improving the distribution-weight estimation in the algorithm, hence obtaining better generalization capacity. On each round, we train several weak hypotheses in parallel and, using a weighted ensemble of them, update the distribution weights for the following boosting rounds. Instead of creating resamples of a size equal to the original dataset, we subsample the dataset in order to obtain a speed-up in the training phase. We validate our proposal with different resampling sizes on 3 datasets, obtaining promising results and showing that the size of the resamples does not considerably affect the performance of the algorithm, while the execution time improves greatly.
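The round structure described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes decision stumps as the weak learners, draws several subsamples per round according to the current distribution weights, trains the stumps concurrently, combines them into a weighted per-round ensemble, and uses that combined hypothesis for the AdaBoost-style weight update. All function names (`subsampled_concurrent_adaboost`, `train_stump`, etc.) and the choice of `sub_frac` are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def train_stump(X, y, idx):
    """Fit a one-feature threshold classifier on the subsample given by idx."""
    best = (0, 0.0, 1, 1.0)  # (feature, threshold, polarity, error)
    Xs, ys = X[idx], y[idx]
    for f in range(X.shape[1]):
        for t in np.unique(Xs[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (Xs[:, f] - t) >= 0, 1, -1)
                err = np.mean(pred != ys)
                if err < best[3]:
                    best = (f, t, pol, err)
    return best[:3]

def stump_predict(stump, X):
    f, t, pol = stump
    return np.where(pol * (X[:, f] - t) >= 0, 1, -1)

def subsampled_concurrent_adaboost(X, y, rounds=10, n_workers=4,
                                   sub_frac=0.5, seed=0):
    """Each round: draw n_workers subsamples of size sub_frac * n from the
    current distribution D, fit weak hypotheses in parallel, combine them
    by weighted vote, and update D with the combined hypothesis."""
    rng = np.random.default_rng(seed)
    n = len(y)
    D = np.full(n, 1.0 / n)  # distribution weights over examples
    ensemble = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(rounds):
            # Subsamples smaller than n -- this is where the speed-up comes from.
            subs = [rng.choice(n, size=int(sub_frac * n), p=D)
                    for _ in range(n_workers)]
            stumps = list(pool.map(lambda idx: train_stump(X, y, idx), subs))
            # Weight each weak hypothesis by its error under D, then vote.
            preds = np.array([stump_predict(s, X) for s in stumps])
            errs = np.clip(np.array([np.sum(D * (p != y)) for p in preds]),
                           1e-10, 1 - 1e-10)
            alphas = 0.5 * np.log((1 - errs) / errs)
            combined = np.sign(alphas @ preds)
            combined[combined == 0] = 1
            # Standard AdaBoost update, driven by the per-round ensemble.
            err = np.clip(np.sum(D * (combined != y)), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)
            D *= np.exp(-alpha * y * combined)
            D /= D.sum()
            ensemble.append((alpha, stumps, alphas))
    return ensemble

def predict(ensemble, X):
    total = np.zeros(len(X))
    for alpha, stumps, alphas in ensemble:
        preds = np.array([stump_predict(s, X) for s in stumps])
        total += alpha * np.sign(alphas @ preds)
    return np.where(total >= 0, 1, -1)
```

Threads are used here only for brevity; a process pool (or a distributed backend) would be the natural choice for genuinely concurrent training on large datasets, since stump fitting is CPU-bound.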
