A cross-media distance metric learning framework based on multi-view correlation mining and matching
  • Authors: Hong Zhang; Xingyu Gao; Ping Wu; Xin Xu
  • Keywords: Cross-media; Distance metric; Sparse feature selection; Multi-view matching
  • Journal: World Wide Web
  • Year: 2016
  • Publication date: March 2016
  • Volume: 19
  • Issue: 2
  • Pages: 181-197
  • Full-text size: 1,280 KB
  • Affiliations: Hong Zhang (1) (2)
    Xingyu Gao (3)
    Ping Wu (1)
    Xin Xu (1)

    1. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430081, China
    2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
    3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  • Journal category: Computer Science
  • Journal subjects: Information Systems Applications and the Internet
    Database Management
    Operating Systems
  • Publisher: Springer Netherlands
  • ISSN:1573-1413
Abstract
With the explosion of multimedia data, data of different modalities commonly coexist in web repositories. It is therefore increasingly important to explore the intricate underlying cross-media correlations, rather than relying on single-modality distance measures, to improve multimedia semantic understanding. Cross-media distance metric learning focuses on measuring the correlation between multimedia data of different modalities. However, content heterogeneity and the semantic gap make cross-media distance very challenging to measure. In this paper, we propose a novel cross-media distance metric learning framework based on sparse feature selection and multi-view matching. First, we apply sparse feature selection to the high-dimensional image and audio features, selecting a subset of relevant features and removing redundant ones. Second, we maximize the canonical correlation coefficient during image-audio feature dimensionality reduction to mine cross-media correlations. Third, we construct a Multi-modal Semantic Graph to discover the cross-media correlations embedded in the data manifold. Finally, we fuse the canonical correlation and the manifold information through multi-view matching, which harmonizes the different correlations in an iterative process, and build a Cross-media Semantic Space in which cross-media distances are measured. Experiments on cross-media retrieval over an image-audio dataset yield encouraging results and demonstrate the effectiveness of our approach.
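The abstract describes propagating correlations over a Multi-modal Semantic Graph and harmonizing them through an iterative process, but it does not give the update rule. Purely as an illustration, the sketch below swaps in the standard manifold-ranking iteration of Zhou et al. (f ← αSf + (1 − α)y over a symmetrically normalized graph); it is not the authors' method. The toy graph, its weights, the value of alpha, and the function names `normalize_graph`, `manifold_rank` and `rank_targets` are all hypothetical.

```python
import math

# Assumption: manifold-ranking iteration (Zhou et al.), used here only to
# illustrate graph-based cross-media propagation; not the paper's exact update.

def normalize_graph(W):
    """Return S = D^{-1/2} W D^{-1/2} for a symmetric, non-negative weight matrix W."""
    n = len(W)
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def manifold_rank(W, query, alpha=0.9, max_iter=1000, tol=1e-12):
    """Propagate relevance from node `query` over the graph W.

    Iterates f <- alpha * S f + (1 - alpha) * y, where y is the one-hot query
    vector; with 0 < alpha < 1 this converges to (1 - alpha)(I - alpha S)^{-1} y.
    """
    n = len(W)
    S = normalize_graph(W)
    y = [1.0 if i == query else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(max_iter):
        f_new = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
                 for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(f_new, f)) < tol
        f = f_new
        if converged:
            break
    return f

def rank_targets(scores, targets):
    """Order candidate nodes (e.g. audio clips) by descending propagated score."""
    return sorted(targets, key=lambda i: scores[i], reverse=True)

if __name__ == "__main__":
    # Hypothetical toy graph: nodes 0-2 are images, nodes 3-4 are audio clips.
    # Edges encode image-image similarity plus image-audio links for items
    # known to co-occur in the same multimedia document.
    W = [
        [0.0, 0.9, 0.0, 1.0, 0.0],  # image 0: similar to image 1, paired with audio 3
        [0.9, 0.0, 0.1, 0.0, 0.0],  # image 1: the query, no direct audio link
        [0.0, 0.1, 0.0, 0.0, 1.0],  # image 2: weakly similar to image 1, paired with audio 4
        [1.0, 0.0, 0.0, 0.0, 0.0],  # audio 3
        [0.0, 0.0, 1.0, 0.0, 0.0],  # audio 4
    ]
    scores = manifold_rank(W, query=1)
    print(rank_targets(scores, [3, 4]))  # prints [3, 4]
```

Image 1 has no direct link to any audio clip, yet audio 3 outranks audio 4 because relevance flows through the strong image-image edge to image 0 and on to its paired clip. This is the kind of indirect, manifold-based cross-media correlation that a graph step can surface and that a pairwise distance alone would miss.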