Research on a Music Retrieval Model Based on Emotional Semantic Similarity
Abstract
With the development of technology, growing digital storage capacity has made it possible to store and manage massive amounts of electronic multimedia data. The explosive growth of online music gives listeners an enormous space of choices, but it also poses great challenges to online music services: identifying users' real needs and recommending the songs they are most interested in has become a major problem for researchers in music retrieval and recommendation. In a music information retrieval system, many users cannot specify exactly which song they want to hear; instead, they submit queries that contain no music-related descriptive information (defined in this thesis as "non-descriptive music queries"). The currently popular retrieval algorithms, based on keyword matching against music attributes, cannot satisfy such users. How to understand a user's non-descriptive query, and how to compute the similarity between music items and such queries, are therefore problems that current music retrieval systems need to solve.
     Affective computing is an important research topic in human-computer interaction; its main goal is to enable machines to recognize human emotion so that they can interact with and serve people better. This thesis identifies the problem that non-descriptive queries cannot be handled accurately in music retrieval systems and, based on the view that music is an expression of emotion, proposes to solve it through emotional semantic analysis. Building on research in text-based affective computing, and borrowing methods from text emotion classification and recognition, we model both non-descriptive queries and music items in a high-dimensional emotional semantic space, compute the emotional semantic similarity between queries and music in that space, and rank results by similarity.
     This thesis focuses on the emotion modeling and classification of music and on a retrieval model based on emotion analysis. The work proceeds as follows:
     First, we define a music emotional semantic space based on WordNet-Affect, dividing music emotion into seven categories. This taxonomy takes the intrinsic characteristics of musical emotional expression into account and extends the standard text emotion taxonomy.
     Second, we automatically crawl a large amount of music-related data from the well-known music sharing site last.fm to build a music retrieval dataset (DUTMIR-Dataset), and manually annotate it with emotion labels for machine-learned music emotion classification. From the music's attribute data and social data we construct different feature sets, and we verify how each choice of features affects the emotional semantic representation of music and the resulting retrieval performance.
     Third, for music emotion classification, considering that music metadata is short, concise, and implicit, we use an LDA model and clustering to expand and balance the attribute sets of the music data, alleviating its sparsity, and we experimentally compare the performance of several classifiers on music emotion classification.
     Finally, to demonstrate the overall performance of the proposed emotion-based music retrieval model, we develop and test a corresponding prototype application.
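The tag-expansion idea in the third step can be sketched in miniature. The thesis uses an LDA topic model plus clustering; as a self-contained stand-in that captures the same goal of recommending related tags to sparsely tagged songs, the toy below scores candidate tags by how often they co-occur with a song's existing tags (all tags are invented for illustration):

```python
# Simplified stand-in for the LDA-based tag expansion described above:
# recommend extra tags for a sparsely tagged song from co-occurrence counts.
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence(tag_sets):
    """Count, for every tag pair in the corpus, how often the two co-occur."""
    co = defaultdict(Counter)
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def expand(tags, co, k=2):
    """Add up to k tags that co-occur most often with the song's existing tags."""
    scores = Counter()
    for t in tags:
        scores.update(co.get(t, Counter()))
    for t in tags:
        scores.pop(t, None)   # never re-recommend a tag the song already has
    return list(tags) + [t for t, _ in scores.most_common(k)]

corpus = [
    ["sad", "piano", "slow"],
    ["sad", "piano", "rain"],
    ["dance", "electronic", "party"],
]
print(expand(["sad"], cooccurrence(corpus)))  # "piano" co-occurs most with "sad"
```

Unlike LDA, this ignores latent topics entirely, but it shows how a sparse tag set gains enough attributes for the emotion classifiers downstream to work with.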
With the rapid development of science and technology, the fast growth of digital storage capacity has made it possible to store and manage enormous volumes of multimedia data automatically. The explosion of online shared music offers more choices to users, as well as more challenges to music sharing web services. Understanding the real intent of users and recommending the most relevant music items to them has become increasingly challenging. In a Music Information Retrieval (MIR) system, many users submit queries that contain no descriptive information about the music itself. Such queries are defined as "non-descriptive queries" in this paper, and they usually cannot be handled well by common music search and download websites. A solution is urgently needed to understand the implicit emotional intent of users and to compute the similarity between music items and non-descriptive queries.
     Affective computing is a significant research area in human-computer interaction; its main purpose is to recognize the emotions of computer users and serve them accordingly. Since music is emotionally expressive, the problem of processing non-descriptive queries can be addressed by detecting their implicit emotion. Our new MIR model builds on text emotion detection and recognition techniques: queries and music items are represented in a high-dimensional emotion space, and their similarity is computed from their proximity in that space.
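The ranking idea above can be sketched as follows: a query and each song become vectors over the emotion categories, and relevance is the cosine similarity between vectors. This is a minimal sketch, not the thesis implementation; the seven category names and all scores are illustrative assumptions (the thesis derives its categories from WordNet-Affect):

```python
# Toy emotion-space retrieval: query and songs are 7-dimensional emotion
# vectors; songs are ranked by cosine similarity to the query.
import math

# Hypothetical category labels, for readability only.
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "love", "calm"]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, songs):
    """Return song ids sorted by emotional similarity to the query, best first."""
    return sorted(songs, key=lambda s: cosine(query_vec, songs[s]), reverse=True)

query = [0.1, 0.8, 0.0, 0.1, 0.0, 0.0, 0.0]          # a predominantly "sad" query
songs = {
    "ballad": [0.0, 0.9, 0.0, 0.1, 0.0, 0.0, 0.2],   # sad song
    "anthem": [0.9, 0.0, 0.1, 0.0, 0.3, 0.0, 0.0],   # joyful song
}
print(rank(query, songs))  # the sad ballad ranks above the joyful anthem
```

The modeling work then reduces to producing good emotion vectors for queries and songs, which is what the classification steps below address.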
     The focus of this paper is on modeling music in an emotion space, including:
     First, we define a music emotion space according to WordNet-Affect. The categories are extended to seven types, taking the intrinsic features of music into account.
     Second, we download a large dataset from last.fm and build the manually annotated DUTMIR-Dataset, which we use for machine-learning-based music emotion classification; we also produce three different feature sets to evaluate their influence on MIR.
     Third, since music metadata is short, concise, and implicit, we use an LDA model to attach recommended tags to music items, which overcomes the sparsity and imbalance of the music dataset. In addition, different classifiers are compared to find the best one for the MIR system.
     Last but not least, in order to demonstrate the model concretely, we design and develop a prototype system.
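The tag-based emotion classification in the second and third steps can be illustrated with a toy multinomial Naive Bayes over social tags, one of the classifier families commonly compared for this task. This is a hedged sketch with invented tags and labels, not the thesis implementation:

```python
# Toy multinomial Naive Bayes: classify a song's emotion from its social tags.
# Training pairs (tags -> emotion label) are invented for illustration.
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (tag_list, label). Returns counts for Laplace-smoothed NB."""
    label_counts = Counter()
    tag_counts = defaultdict(Counter)
    vocab = set()
    for tags, label in examples:
        label_counts[label] += 1
        for t in tags:
            tag_counts[label][t] += 1
            vocab.add(t)
    return label_counts, tag_counts, vocab

def classify(model, tags):
    """Pick the label maximizing log P(label) + sum log P(tag | label)."""
    label_counts, tag_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label, n in label_counts.items():
        lp = math.log(n / total)
        denom = sum(tag_counts[label].values()) + len(vocab)
        for t in tags:
            lp += math.log((tag_counts[label][t] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train_set = [
    (["melancholy", "slow", "piano"], "sadness"),
    (["rain", "melancholy", "acoustic"], "sadness"),
    (["upbeat", "dance", "party"], "joy"),
    (["happy", "upbeat", "summer"], "joy"),
]
model = train(train_set)
print(classify(model, ["slow", "melancholy"]))  # -> "sadness"
```

Per-label scores from such a classifier (rather than just the argmax) could serve as the coordinates of a song's emotion vector in the retrieval model.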
