A cross-media distance metric learning framework based on multi-view correlation mining and matching


Authors: Hong Zhang; Xingyu Gao; Ping Wu; Xin Xu
Keywords: Cross-media; Distance metric; Sparse feature selection; Multi-view matching
Journal: World Wide Web
Publication year: 2016
Publication date: March 2016
Year: 2016
Volume: 19
Issue: 2
Pages: 181-197
Full-text size: 1,280 KB
Author affiliations: Hong Zhang (1) (2)
Xingyu Gao (3)
Ping Wu (1)
Xin Xu (1)

1. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430081, China
2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Journal category: Computer Science
Journal subjects: Information Systems Applications and the Internet; Database Management; Operating Systems
Publisher: Springer Netherlands
ISSN: 1573-1413

Abstract

With the explosion of multimedia data, data of different modalities often coexist in web repositories. It is therefore increasingly important to explore the underlying cross-media correlations, rather than rely on single-modality distance measures, in order to improve the understanding of multimedia semantics. Cross-media distance metric learning focuses on measuring the correlation between multimedia data of different modalities. However, content heterogeneity and the semantic gap make it very challenging to measure cross-media distance. In this paper, we propose a novel cross-media distance metric learning framework based on sparse feature selection and multi-view matching. First, we employ sparse feature selection to select a subset of relevant features and remove redundant ones from the high-dimensional image and audio features. Second, we maximize the canonical correlation during image-audio feature dimension reduction for cross-media correlation mining. Third, we construct a Multi-modal Semantic Graph to find the cross-media correlation embedded in the data manifold. Finally, we fuse the canonical correlation and the manifold information into multi-view matching, which harmonizes the different correlations through an iterative process, and build a Cross-media Semantic Space for cross-media distance measurement. Experiments on an image-audio dataset for cross-media retrieval give encouraging results and show that our approach is effective.
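The second step of the abstract, maximizing canonical correlation between image and audio features, can be illustrated with a minimal sketch. This record does not give the paper's exact formulation, so the code below is plain regularized canonical correlation analysis (CCA) in NumPy, not the authors' method. The function names `fit_cca` and `cross_media_distance`, the regularization value, and the synthetic "image" and "audio" features are all assumptions made for illustration; sparse feature selection, the Multi-modal Semantic Graph, and the iterative multi-view matching are not covered.

```python
import numpy as np

def fit_cca(X, Y, n_components=2, reg=1e-3):
    """Standard regularized CCA (illustrative, not the paper's algorithm).

    X: (n, dx) and Y: (n, dy) hold paired samples in their rows.
    Returns the means, the two projection matrices, and the canonical correlations.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    # Regularized covariance and cross-covariance matrices.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy)        # Lx^{-1} Cxy
    T = np.linalg.solve(Ly, T.T).T      # Lx^{-1} Cxy Ly^{-T}
    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return mx, my, Wx, Wy, s[:n_components]

def cross_media_distance(img_feat, aud_feat, model):
    """Euclidean distance between images and audio clips in the shared CCA subspace."""
    mx, my, Wx, Wy, _ = model
    zi = (img_feat - mx) @ Wx
    za = (aud_feat - my) @ Wy
    return np.linalg.norm(zi[:, None, :] - za[None, :, :], axis=-1)

# Synthetic stand-ins for image and audio features that share a 2-D latent "semantic" signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))  # "image" view
Y = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))    # "audio" view

model = fit_cca(X, Y, n_components=2)
corr = model[4]
D = cross_media_distance(X, Y, model)  # D[i, j]: distance from image i to audio clip j
print("canonical correlations:", corr)
print("mean paired distance:", np.diag(D).mean(), "mean overall distance:", D.mean())
```

In this toy setting, paired image-audio samples land close together in the shared subspace, so ranking the rows of `D` gives a simple form of cross-media retrieval. A fuller pipeline in the spirit of the abstract would apply feature selection before this step and refine the distances with manifold information afterwards.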