Centralized vs. distributed feature selection methods based on data complexity measures

Authors: L. Morán-Fernández (laura.moranf@udc.es); V. Bolón-Canedo (vbolon@udc.es); A. Alonso-Betanzos (ciamparo@udc.es)
Keywords: Distributed learning; Feature selection; Data complexity measures; Classification
Journal: Knowledge-Based Systems
Publication year: 2017
Publication date: 1 February 2017
Year: 2017
Volume: 117
Issue: Complete
Pages: 27-45
Full-text size: 1072 K
Volume sort order: 117

Abstract

A methodology is proposed for distributing the feature selection process based on several data complexity measures. We consider two strategies for partitioning the datasets: horizontal (i.e., by samples) and vertical (i.e., by features). We present an experimental study on 11 datasets (five of them microarrays), evaluated in terms of the number of selected features, classification accuracy, and running time. The novel procedures significantly reduce running time while maintaining, or even improving, classification performance.
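
The two partitioning strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the round-robin assignment, and the toy data are assumptions made for illustration, and the sketch covers only the partitioning step, not the feature selection or the use of complexity measures.

```python
from typing import List, Sequence

Row = Sequence[float]


def horizontal_partition(data: List[Row], n_parts: int) -> List[List[Row]]:
    """Split by samples: each part keeps all features for a subset of rows.

    Round-robin assignment is an illustrative assumption.
    """
    return [data[i::n_parts] for i in range(n_parts)]


def vertical_partition(n_features: int, n_parts: int) -> List[List[int]]:
    """Split by features: return the column indices assigned to each part."""
    return [list(range(i, n_features, n_parts)) for i in range(n_parts)]


def select_columns(data: List[Row], columns: List[int]) -> List[List[float]]:
    """Project every row onto the given feature indices."""
    return [[row[c] for c in columns] for row in data]


# Toy dataset: 4 samples, 6 features (values are illustrative only).
data = [[float(10 * r + c) for c in range(6)] for r in range(4)]

# Horizontal: 2 parts, each with 2 samples and all 6 features.
row_parts = horizontal_partition(data, 2)

# Vertical: 3 parts, each with all 4 samples and 2 features.
col_parts = vertical_partition(6, 3)
col_blocks = [select_columns(data, cols) for cols in col_parts]
```

In a distributed setting, a feature selector would then run independently on each element of `row_parts` or `col_blocks`, and the partial results would be combined afterwards; how that combination is done is the subject of the paper and is not reproduced here.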
