Category-Specific Video Summarization

Authors: Danila Potapov (19)
Matthijs Douze (19)
Zaid Harchaoui (19)
Cordelia Schmid (19)
Keywords: video summarization; temporal segmentation; video classification
Journal: Lecture Notes in Computer Science
Publication year: 2014
Volume: 8694
Issue: 1
Pages: 540-555
Full-text size: 7,176 KB
Author affiliations:
19. Inria, France
ISSN: 1611-3349

Abstract

In large video collections with clusters of typical categories, such as “birthday party” or “flash-mob”, category-specific video summarization can produce higher-quality video summaries than unsupervised approaches that are blind to the video category. Given a video from a known category, our approach first efficiently performs a temporal segmentation into semantically consistent segments, delimited not only by shot boundaries but also by general change points. Then, equipped with an SVM classifier, our approach assigns an importance score to each segment. The resulting video assembles the sequence of segments with the highest scores. The obtained video summary is therefore both short and highly informative. Experimental results on videos from the multimedia event detection (MED) dataset of TRECVID'11 show that our approach produces video summaries with higher relevance than the state of the art.
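
The final step described in the abstract, assembling the highest-scoring segments into a summary, can be illustrated with a minimal sketch. This is not the authors' implementation. The `Segment` type, the `assemble_summary` function, the duration budget, and the greedy selection rule are all assumptions made for illustration, and the scores are invented numbers standing in for classifier outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical container for one temporal segment of a video."""
    start: float   # segment start time, in seconds
    end: float     # segment end time, in seconds
    score: float   # importance score, e.g. a classifier decision value

    @property
    def duration(self) -> float:
        return self.end - self.start


def assemble_summary(segments, max_duration):
    """Pick high-scoring segments within a duration budget, in temporal order.

    Greedy selection under a budget is an assumption of this sketch;
    the abstract only states that the highest-scoring segments are kept.
    """
    chosen = []
    used = 0.0
    # Visit segments from the highest score to the lowest.
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if used + seg.duration <= max_duration:
            chosen.append(seg)
            used += seg.duration
    # Restore chronological order so the summary plays as a sequence.
    return sorted(chosen, key=lambda s: s.start)


# Illustrative segments with made-up scores.
segments = [
    Segment(0.0, 4.0, 0.1),
    Segment(4.0, 9.0, 1.3),
    Segment(9.0, 12.0, -0.4),
    Segment(12.0, 18.0, 0.9),
    Segment(18.0, 20.0, 1.1),
]
summary = assemble_summary(segments, max_duration=8.0)
print([(s.start, s.end) for s in summary])  # [(4.0, 9.0), (18.0, 20.0)]
```

With an 8-second budget, the two best-scoring segments (5 s and 2 s) fit, while the third-best (6 s) would exceed the budget and is skipped. A different selection rule, such as a fixed number of segments or a score threshold, would fit the abstract's description equally well.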
