Learning robust uniform features for cross-media social data by using cross autoencoders


Authors: Quan Guo^a (guoquanscu@gmail.com); Jia Jia^b (jjia@mail.tsinghua.edu.cn); Guangyao Shen^b (thusgy2012@gmail.com); Lei Zhang^a (leizhang@scu.edu.cn); Lianhong Cai^b (clh-dcs@tsinghua.edu.cn); Zhang Yi^a (zhangyi@scu.edu.cn)
Keywords: Cross-media; Social data; Cross modality; Deep learning; Autoencoder; Convolutional network
Journal: Knowledge-Based Systems
Publication year: 2016
Publication date: 15 June 2016
Year: 2016
Volume: 102
Issue: Complete
Pages: 64-75
Full-text size: 1885 K

Abstract

Cross-media analysis exploits social data with different modalities from multiple sources simultaneously and synergistically to discover knowledge and better understand the world. There are two levels of cross-media social data. One is the element, which is made up of text, images, voice, or any combination of these modalities. Elements from the same data source can have different modalities. The other level is the new notion of the aggregative subject (AS): a collection of time-series social elements sharing the same semantics (e.g., a collection of tweets, photos, blogs, and news about an emergency event). While traditional feature learning methods focus on single-modality data or on data fused across multiple modalities, in this study we systematically analyze the problem of feature learning for cross-media social data at both of these levels. The general purpose is to obtain a robust and uniform representation of social data that spans time series and different modalities. We propose a novel unsupervised method for cross-modality element-level feature learning called the cross autoencoder (CAE). The CAE can capture the cross-modality correlations in element samples. Furthermore, we extend it to the AS level using the convolutional neural network (CNN), yielding the convolutional cross autoencoder (CCAE). We use CAEs as filters in the CCAE to handle cross-modality elements, and the CNN framework to handle the time sequence and reduce the impact of outliers in an AS. Finally, we apply the proposed method to classification tasks on several real-world social media datasets to evaluate the quality of the generated representations. In terms of accuracy, CAE achieves overall incremental rates of 7.33% and 14.31% on two element-level datasets, and CCAE achieves overall incremental rates of 11.2% and 60.5% on two AS-level datasets. Experimental results show that the proposed CAE and CCAE work well with all tested classifiers and perform better than several other baseline feature learning methods.
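
The abstract does not specify the CAE architecture, so the following is only a minimal sketch of one plausible reading of "cross autoencoder": a single-hidden-layer network that encodes one modality and is trained to reconstruct the other, so that the hidden code has to capture what the two modalities share. The function names, layer sizes, learning rate, and synthetic data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def make_paired_modalities(n=200, dx=6, dy=4, latent=3, seed=0):
    """Hypothetical toy data: two 'modalities' generated from a shared latent factor."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, latent))
    x = z @ (rng.normal(size=(latent, dx)) / np.sqrt(latent))
    y = z @ (rng.normal(size=(latent, dy)) / np.sqrt(latent))
    x += 0.05 * rng.normal(size=x.shape)  # small modality-specific noise
    y += 0.05 * rng.normal(size=y.shape)
    return x, y


def train_cross_autoencoder(x, y, hidden=8, lr=0.05, epochs=1000, seed=0):
    """Assumed formulation: encode modality x, decode modality y, minimize MSE."""
    rng = np.random.default_rng(seed)
    w_enc = rng.normal(0.0, 0.1, size=(x.shape[1], hidden))
    b_enc = np.zeros(hidden)
    w_dec = rng.normal(0.0, 0.1, size=(hidden, y.shape[1]))
    b_dec = np.zeros(y.shape[1])
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w_enc + b_enc)        # shared hidden code
        err = (h @ w_dec + b_dec) - y         # cross-modality reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size          # gradient of the mean squared error
        g_h = (g_out @ w_dec.T) * (1.0 - h ** 2)  # backpropagate through tanh
        w_dec -= lr * (h.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        w_enc -= lr * (x.T @ g_h)
        b_enc -= lr * g_h.sum(axis=0)

    def encode(x_new):
        """Return the learned hidden features for new samples of modality x."""
        return np.tanh(x_new @ w_enc + b_enc)

    return encode, losses


if __name__ == "__main__":
    x, y = make_paired_modalities()
    encode, losses = train_cross_autoencoder(x, y)
    print("first loss:", losses[0], "last loss:", losses[-1])
    print("feature matrix shape:", encode(x).shape)
```

In this sketch the output of `encode` plays the role of the learned element-level representation that would be passed to a downstream classifier; how the paper combines the two reconstruction directions, or stacks CAEs as filters in the CCAE, is not stated in the abstract and is not modeled here.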
