A multimodal approach for extracting content descriptive metadata from lecture videos


Authors: Vidhya Balasubramanian…
Keywords: Multimodal metadata extraction; Content descriptive metadata; Keyphrase extraction; Topic based segmentation; Lecture videos
Journal: Journal of Intelligent Information Systems
Publication year: 2016
Publication date: February 2016
Volume: 46
Issue: 1
Pages: 121-145
Full-text size: 5,488 KB
Author affiliations: Vidhya Balasubramanian (1)
Sooryanarayan Gobu Doraisamy (2)
Navaneeth Kumar Kanakarajan (2)

1. Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham University, Coimbatore, India
2. Amrita E-Learning Research Center, Amrita Vishwa Vidyapeetham University, Coimbatore, India
Journal category: Computer Science
Journal subjects: Data Structures, Cryptology and Information Theory
Artificial Intelligence and Robotics
Document Preparation and Text Processing
Business Information Systems
Publisher: Springer Netherlands
ISSN: 1573-7675

Abstract

The rapidly increasing availability of e-learning content and lecture videos over the internet has created a pressing need for effective content-based retrieval systems. Comprehensive metadata extraction and support for topic-level search within videos are key factors in developing such systems. In this paper, we propose a multimodal metadata extraction system that extracts an optimal set of keyphrases and topic-based segments which effectively summarize the content of a lecture video. The extraction process uses features from both the audio transcripts and the slide content in the video streams. A hybrid approach combining a Naive Bayes classifier and a rule-based refiner is used to retrieve the metadata of a lecture effectively. The proposed content-descriptive metadata extraction technique has been evaluated on actual lecture videos from different sources, and our results show that the multimodal approach is effective in summarizing a lecture's content, potentially improving the user experience during retrieval and browsing.
