Image visual attention computation and application via the learning of object attributes
  • Authors: Junwei Han (1)
    Dongyang Wang (1)
    Ling Shao (2)
    Xiaoliang Qian (1)
    Gong Cheng (1)
    Jungong Han (3)
  • Keywords: Visual attention; Eye tracking; Object bank; Image categorization
  • Journal: Machine Vision and Applications
  • Published: October 2014
  • Volume: 25
  • Issue: 7
  • Pages: 1671–1683
  • Full text size: 2,412 KB
  • Author affiliations:

    1. School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
    2. University of Sheffield, Sheffield, UK
    3. Civolution Technology, Eindhoven, The Netherlands
  • ISSN: 1432-1769
Abstract
Visual attention aims to select a salient subset of the visual input for further processing while ignoring redundant data. The dominant view of visual attention computation assumes that bottom-up visual saliency, such as local contrast and interest points, drives the allocation of attention during scene viewing. In this paper, however, we advocate that the deployment of attention is primarily and directly guided by objects, and we therefore propose a novel framework that explores image visual attention by learning object attributes from eye-tracking data. We address three problems: (1) pixel-level visual attention computation (the saliency map); (2) image-level visual attention computation; and (3) application of the computational model to image categorization. We first adopt the object bank algorithm to acquire the responses of a number of object detectors at each location in an image, forming a feature descriptor that indicates the occurrences of various objects at a pixel or in an image. Next, to solve the first problem, we integrate the inference of interesting objects from fixations in eye-tracking data with the competition among surrounding objects. To solve the second problem, we further propose a computational model that estimates the interestingness of each image via a mapping between object attributes and the inter-observer visual congruency obtained from eye-tracking data. Finally, we apply the proposed pixel-level visual attention model to the image categorization task. Comprehensive evaluations on publicly available benchmarks and comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed models.
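The two computations described in the abstract can be sketched as follows: a pixel-level saliency map built by weighting per-object detector response maps (an object-bank-style representation), and an image-level interestingness score obtained by pooling those maps. This is a minimal illustrative sketch, not the authors' exact formulation; the function names, the max-pooling step, and the learned weight vector are assumptions standing in for the weights the paper learns from eye-tracking data.

```python
import numpy as np

def pixel_level_attention(response_maps, weights):
    """Combine per-object detector response maps into a saliency map.

    response_maps: (K, H, W) array of hypothetical responses of K object
    detectors at each pixel (an object-bank-style descriptor per location).
    weights: (K,) per-object interestingness, assumed to be learned from
    eye-tracking fixations.
    """
    # Weighted sum over the object axis -> one (H, W) saliency map.
    sal = np.tensordot(weights, response_maps, axes=1)
    # Normalize to [0, 1] for display/comparison.
    sal = sal - sal.min()
    rng = sal.max()
    return sal / rng if rng > 0 else sal

def image_level_interestingness(response_maps, weights):
    """Scalar image interestingness: pool each object map, then weight.

    A stand-in for the paper's mapping from object attributes to
    inter-observer visual congruency (here a simple dot product).
    """
    pooled = response_maps.max(axis=(1, 2))  # strongest response per object
    return float(pooled @ weights)
```

In the paper the per-object weights are learned from fixation data and the image-level mapping is learned against inter-observer congruency; the linear combination above only illustrates how object-detector responses can drive both outputs.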
