Unsupervised feature selection through Gram-Schmidt orthogonalization

Unsupervised feature selection through Gram-Schmidt orthogonalization—A word co-occurrence perspective


Authors: Deqing Wang^a (dqwang@buaa.edu.cn) ; Hui Zhang^a (hzhang@nlsde.buaa.edu.cn) ; Rui Liu^a (liurui@nlsde.buaa.edu.cn) ; Xianglong Liu^a (xlong_liu@nlsde.buaa.edu.cn) ; Jing Wang^b (jim08@buaa.edu.cn)
Keywords: Feature selection ; Random projection ; Gram–Schmidt orthogonalization ; Basis features ; Word co-occurrence matrix
Journal: Neurocomputing
Publication date: 15 January 2016
Year: 2016
Volume: 173
Issue: Part 3
Pages: 845-854
Full-text size: 1163 K

Abstract

Feature selection is a key step in many machine learning applications, such as categorization and clustering. For text data in particular, the original document-term matrix is high-dimensional and sparse, which degrades the performance of feature selection algorithms. Meanwhile, labeling training instances is time-consuming and expensive, so unsupervised feature selection algorithms have attracted increasing attention. In this paper, we propose an unsupervised feature selection algorithm based on Random Projection and Gram–Schmidt Orthogonalization (RP-GSO) applied to the word co-occurrence matrix. The RP-GSO algorithm has three advantages: (1) it takes a dense word co-occurrence matrix as input, avoiding the sparseness of the original document-term matrix; (2) it selects “basis features” by the Gram–Schmidt process, guaranteeing the orthogonality of the feature space; and (3) it adopts random projection to speed up the Gram–Schmidt (GS) process. Extensive experimental results show that the proposed RP-GSO approach outperforms the supervised and unsupervised feature selection methods it is compared against in text classification and clustering tasks.
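
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the co-occurrence matrix is computed as X^T X from a document-term matrix X, that the random projection is a dense Gaussian matrix, and that each "basis feature" is chosen greedily as the term with the largest residual norm (pivoted Gram–Schmidt). The function name `select_basis_features` and the parameters `n_select`, `n_proj`, `seed`, and `tol` are hypothetical.

```python
import numpy as np

def select_basis_features(X, n_select, n_proj=16, seed=0, tol=1e-12):
    """Illustrative sketch only: greedy Gram-Schmidt selection of terms.

    X is a (n_docs x n_terms) document-term matrix. The selection rule
    (largest residual norm) is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Dense term-term co-occurrence matrix (n_terms x n_terms).
    C = X.T @ X

    # Gaussian random projection: each term (row) is mapped to n_proj dims.
    R = rng.standard_normal((C.shape[1], n_proj)) / np.sqrt(n_proj)
    V = C @ R

    available = np.ones(V.shape[0], dtype=bool)
    sq_norms = np.einsum("ij,ij->i", V, V)
    threshold = tol * sq_norms.max()

    selected = []
    for _ in range(min(n_select, n_proj, V.shape[0])):
        # Squared residual norm of every term not yet selected.
        sq_norms = np.einsum("ij,ij->i", V, V)
        sq_norms[~available] = -np.inf
        j = int(np.argmax(sq_norms))
        if sq_norms[j] <= threshold:
            break  # remaining terms lie in the span of the selected ones

        # Gram-Schmidt step: remove the chosen direction from all rows.
        q = V[j] / np.sqrt(sq_norms[j])
        V = V - np.outer(V @ q, q)

        available[j] = False
        selected.append(j)
    return selected
```

In this sketch, a term whose co-occurrence profile duplicates an already selected term has a residual near zero and is therefore never chosen, which is one way to read the abstract's claim that the selected features form an orthogonal basis.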
