Subsampling the Concurrent AdaBoost Algorithm: An Efficient Approach for Large Datasets
Abstract
In this work we propose a subsampled version of the Concurrent AdaBoost algorithm in order to deal with large datasets in an efficient way. The proposal is based on a concurrent computing approach focused on improving the distribution-weight estimation in the algorithm, hence obtaining better generalization capacity. On each round, we train several weak hypotheses in parallel and, using a weighted ensemble of them, update the distribution weights for the following boosting rounds. Instead of creating resamples of a size equal to the original dataset, we subsample the dataset in order to obtain a speed-up in the training phase. We validate our proposal with different resampling sizes on 3 datasets, obtaining promising results and showing that the size of the resamples does not considerably affect the performance of the algorithm, while the execution time improves greatly.
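The round structure described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes decision stumps as the weak learners, draws several subsamples per round according to the current distribution weights, trains the stumps concurrently, combines them into a weighted per-round ensemble, and uses that combined hypothesis for the AdaBoost-style weight update. All function names (`subsampled_concurrent_adaboost`, `train_stump`, etc.) and the choice of `sub_frac` are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def train_stump(X, y, idx):
    """Fit a one-feature threshold classifier on the subsample given by idx."""
    best = (0, 0.0, 1, 1.0)  # (feature, threshold, polarity, error)
    Xs, ys = X[idx], y[idx]
    for f in range(X.shape[1]):
        for t in np.unique(Xs[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (Xs[:, f] - t) >= 0, 1, -1)
                err = np.mean(pred != ys)
                if err < best[3]:
                    best = (f, t, pol, err)
    return best[:3]

def stump_predict(stump, X):
    f, t, pol = stump
    return np.where(pol * (X[:, f] - t) >= 0, 1, -1)

def subsampled_concurrent_adaboost(X, y, rounds=10, n_workers=4,
                                   sub_frac=0.5, seed=0):
    """Each round: draw n_workers subsamples of size sub_frac * n from the
    current distribution D, fit weak hypotheses in parallel, combine them
    by weighted vote, and update D with the combined hypothesis."""
    rng = np.random.default_rng(seed)
    n = len(y)
    D = np.full(n, 1.0 / n)  # distribution weights over examples
    ensemble = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(rounds):
            # Subsamples smaller than n -- this is where the speed-up comes from.
            subs = [rng.choice(n, size=int(sub_frac * n), p=D)
                    for _ in range(n_workers)]
            stumps = list(pool.map(lambda idx: train_stump(X, y, idx), subs))
            # Weight each weak hypothesis by its error under D, then vote.
            preds = np.array([stump_predict(s, X) for s in stumps])
            errs = np.clip(np.array([np.sum(D * (p != y)) for p in preds]),
                           1e-10, 1 - 1e-10)
            alphas = 0.5 * np.log((1 - errs) / errs)
            combined = np.sign(alphas @ preds)
            combined[combined == 0] = 1
            # Standard AdaBoost update, driven by the per-round ensemble.
            err = np.clip(np.sum(D * (combined != y)), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)
            D *= np.exp(-alpha * y * combined)
            D /= D.sum()
            ensemble.append((alpha, stumps, alphas))
    return ensemble

def predict(ensemble, X):
    total = np.zeros(len(X))
    for alpha, stumps, alphas in ensemble:
        preds = np.array([stump_predict(s, X) for s in stumps])
        total += alpha * np.sign(alphas @ preds)
    return np.where(total >= 0, 1, -1)
```

Threads are used here only for brevity; a process pool (or a distributed backend) would be the natural choice for genuinely concurrent training on large datasets, since stump fitting is CPU-bound.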
