Study of Engineered Features and Learning Features in Machine Learning - A Case Study in Document Classification

Keywords: Deep learning; Feature extraction; Autoencoder; Restricted Boltzmann Machine; Semantic association; N-gram Model
Journal: Lecture Notes in Computer Science
Publication year: 2017
Volume: 10127
Issue: 1
Pages: 161-172
Series: Intelligent Human Computer Interaction
ISBN: 978-3-319-52503-7

Abstract

Document classification is challenging because it must handle voluminous, highly non-linear data that is generated at an exponential rate in the era of digitization. Proper representation of documents increases the efficiency and performance of classification, which serves the ultimate goal of retrieving information from a large corpus. Deep neural network models learn features for document classification, unlike engineered-feature-based approaches, where features are extracted or selected from the data. In this paper we investigate the performance of different classifiers based on features obtained using the two approaches. We apply a deep autoencoder to learn features, while engineered features are extracted by exploiting semantic association among the terms of the documents. Experimentally, learned-feature-based classification was observed to always perform better than the proposed engineered-feature-based classifiers.
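
To make the idea of an engineered feature concrete, here is a minimal sketch of word n-gram count features, one common hand-designed document representation. It is an illustration only: the abstract does not give the paper's actual semantic-association features, so plain n-gram counts stand in for them here, and the function names `word_ngrams` and `ngram_features` and the two sample documents are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n=2):
    """Return the word n-grams of `text` (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(docs, n=2):
    """Map each document to a count vector over the corpus n-gram vocabulary.

    Assumption: raw n-gram counts are a stand-in for the paper's
    semantic-association features, which the abstract does not specify.
    """
    counts = [Counter(word_ngrams(doc, n)) for doc in docs]
    vocab = sorted(set().union(*counts))  # all n-grams seen in any document
    vectors = [[c[gram] for gram in vocab] for c in counts]
    return vocab, vectors


# Hypothetical two-document corpus.
docs = [
    "deep networks learn features",
    "engineered features are extracted by hand",
]
vocab, vectors = ngram_features(docs, n=2)
```

Each row of `vectors` is a fixed-length representation that could be passed to any classifier. A learned-feature approach would instead train a model such as an autoencoder to derive its own compressed representation from the input.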
