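A Music Classification Method Based on Linear Discriminant Analysis and Support Vector Machine

English title: Research of Music Classification Algorithm Based on Linear Discriminative Analysis and Support Vector Machine
Author: Yao Siqiang (姚斯强)
Degree level: Master's
Discipline: Circuits and Systems
Chinese keywords: 音频检索; 音乐分类; 线性判别分析; 支持向量机
English keywords: Audio Retrieval; Music Classification; Linear Discriminant Analysis (LDA); Support Vector Machine (SVM)
Degree year: 2007
Advisor: Hu Jianling (胡剑凌)
Discipline code: 080902
Degree-granting institution: Shanghai Jiao Tong University
Thesis submission date: 2006-12-01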

Abstract

With the development of the Internet and broadcasting technology, people have access to large quantities of multimedia content. As data volumes grow rapidly, managing this content automatically has become a pressing problem. For the wide variety of music signals around us in particular, fast and efficient methods are needed to classify and manage them (for example by style or by singer). This thesis seeks a better algorithm for this problem.
     Building on existing music classification systems, this thesis proposes an improved music classification architecture. A linear discriminant analysis (LDA) module is added to the original structure to reduce the dimensionality of the extracted high-dimensional feature vectors, a support vector machine (SVM) classifier is used in the final classification stage, and the classification results are simulated in Matlab.
     Most current audio and music classification algorithms consist of two stages: feature extraction and classification. Many music features can be used, including short-time energy and short-time zero-crossing rate in the time domain, bandwidth and spectral centroid (brightness) in the frequency domain, and the perceptually motivated Mel-frequency cepstral coefficients (MFCC). For the classification stage, many efficient existing algorithms from pattern recognition and pattern classification can be used, such as the Gaussian mixture model (GMM) [29], neural networks (NN), and the hidden Markov model (HMM).
     With so many features and classification algorithms available, how should they be combined to achieve good classification accuracy? Can some features be preprocessed, or can the classifier be optimized for the particular characteristics of music classification, to achieve higher accuracy? To answer these questions, this thesis proposes a new music classification architecture that builds on the many existing music classification algorithms.
     Existing music classification methods treat feature extraction and classification as isolated stages: the extracted features are passed directly to the classifier. They do not take into account that the extracted features may not be the ones best suited to classification (the points that the feature vectors represent in the high-dimensional space are not maximally separable), and that more separable music features may be obtainable through a linear or nonlinear transform. This thesis designs a new music classification method that takes the separability of the signal features fully into account. In the music feature extraction stage
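The pipeline described in the abstract (feature extraction, then an LDA projection, then a classifier) can be illustrated with a small sketch. This is not the thesis implementation, which was simulated in Matlab with richer features such as MFCC and with an SVM as the final classifier. The sketch below is in Python, uses only two of the time-domain features named above (short-time energy and zero-crossing rate), applies a two-class Fisher LDA, and replaces the SVM with a simple midpoint threshold on the projected value. The synthetic "tone" and "noise" frames and all function names are hypothetical and exist only for illustration.

```python
import math
import random

def frame_features(frame):
    """Return (short-time energy, zero-crossing rate) for one frame of samples."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)

def mean(vectors):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]

def scatter(vectors, mu):
    """2x2 scatter matrix: sum over samples of (x - mu)(x - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_lda(class_a, class_b):
    """Two-class Fisher LDA: w = Sw^{-1} (mu_a - mu_b), plus a midpoint threshold."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    # Closed-form inverse of the 2x2 within-class scatter matrix applied to diff.
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    proj_a = w[0] * mu_a[0] + w[1] * mu_a[1]
    proj_b = w[0] * mu_b[0] + w[1] * mu_b[1]
    return w, (proj_a + proj_b) / 2

def tone_frame(rng, n=256):
    """Hypothetical 'tonal' frame: a sine with random amplitude, frequency, phase."""
    cycles = rng.uniform(2, 6)
    amp = rng.uniform(0.3, 0.7)
    phase = rng.uniform(0, 2 * math.pi)
    return [amp * math.sin(2 * math.pi * cycles * t / n + phase) for t in range(n)]

def noise_frame(rng, n=256):
    """Hypothetical 'noisy' frame: uniform white noise with random amplitude."""
    amp = rng.uniform(0.3, 0.7)
    return [amp * rng.uniform(-1, 1) for _ in range(n)]

# Train on synthetic frames: extract features, then fit the LDA direction.
rng = random.Random(0)
train_tone = [frame_features(tone_frame(rng)) for _ in range(20)]
train_noise = [frame_features(noise_frame(rng)) for _ in range(20)]
w, threshold = fisher_lda(train_tone, train_noise)

def classify(frame):
    """Project the frame's features onto w and compare with the threshold."""
    e, z = frame_features(frame)
    return "tone" if w[0] * e + w[1] * z > threshold else "noise"
```

Because w points from the second class mean toward the first, the first class ("tone") projects above the threshold. In a setup closer to the one the abstract describes, the one-dimensional threshold would be replaced by an SVM trained on the LDA-projected features, and the two features would be replaced by a higher-dimensional vector.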

References

[1] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[2] R. Duda, P. Hart, and D. Stork, Pattern Classification. New York: Wiley, 2000.
[3] E. Wold, T. Blum, D. Keislar, and J. Wheaton, “Content-based classification, search and retrieval of audio,” IEEE Multimedia Mag., vol. 3, no. 3, pp. 27–36, 1996.
[4] J. Foote et al., “Content-based retrieval of music and audio,” Multimedia Storage Archiving Syst. II, vol. 3229, pp. 138–147, 1997.
[5] S. Li, “Content-based classification and retrieval of audio using the nearest feature line method,” IEEE Trans. Speech Audio Processing, vol. 8, pp. 619–625, Sept. 2000.
[6] G. Guo and S. Z. Li, “Content-based audio classification and retrieval by support vector machines,” IEEE Trans. Neural Networks, vol. 14, no. 1, pp. 209–215, Jan. 2003.
[7] Chien-Chang Lin, Shi-Huang Chen, Trieu-Kien Truong, and Yukon Chang, “Audio classification and categorization based on wavelets and support vector machine,” IEEE Trans. Speech Audio Processing, vol. 13, pp. 644–651, Sept. 2005.
[8] Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, July 1997.
[9] K. Umapathy, S. Krishnan, and S. Jimaa, “Multigroup classification of audio signals using time-frequency parameters,” IEEE Trans. Multimedia, vol. 7, no. 2, pp. 308–315, Apr. 2005.
[10] G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Trans. Speech Audio Process., vol. 10, no. 4, pp. 293–302, Jul. 2002.
[11] T. Zhang and C.-C. J. Kuo, “Audio content analysis for online audiovisual data segmentation and classification,” IEEE Trans. Speech Audio Process., vol. 9, no. 4, pp. 441–457, May 2001.
[12] Mingchun Liu and Chunru Wan, “A study on content-based classification and retrieval of audio database,” in IEEE International Database Engineering Application Symposium, 2001, pp. 0339-03315.
[13] S. H. Chen and J. F. Wang, “Noise-robust pitch detection method using wavelet transform with aliasing compensation,” IEE Proceedings - Vision, Image and Signal Processing, 2002.
[14] G. Guo, H. J. Zhang, and S. Z. Li, “Boosting for content-based audio classification and retrieval: An evaluation,” in IEEE International Conference on Multimedia and Expo (ICME 2001), 2001.
[15] C. Xu, N. C. Maddage, and X. Shao, “Automatic music classification and summarization,” IEEE Trans. Speech Audio Process., 2005.
[16] Yibin Zhang and Jie Zhou, “A study on content-based music classification,” in Proceedings of the 7th IEEE International Symposium on Signal Processing and Its Applications.
[17] Martin F. McKinney and Jeroen Breebaart, “Features for audio and music classification,” in Proceedings of the 4th International Conference on Music, 2003.
[18] Tao Li, Mitsunori Ogihara, and Qi Li, “A comparative study on content-based music genre classification,” in Proceedings of the 26th Annual International ACM SIGIR, 2003.
[19] Danning Jiang, Lie Lu, Hongjiang Zhang, Jianhua Tao, and Lianhong Cai, “Music type classification by spectral contrast feature,” in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '02), 2002.
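[20] Han Jiqing, Zhang Lei, and Zheng Tieran, Speech Signal Processing (语音信号处理). Tsinghua University Press.
[21] Zhao Li, Speech Signal Processing (语音信号处理). China Machine Press.
[22] Andrew R. Webb, Statistical Pattern Recognition, 2nd ed. (original published by John Wiley & Sons), Chinese translation by Wang Ping, Yang Peilong, and Luo Yingxin. Publishing House of Electronics Industry.
[23] Xiao Jianhua, Intelligent Pattern Recognition Methods (智能模式识别方法). South China University of Technology Press.
[24] Zou Kun, Yuan Junquan, and Gong Xiangyi, MATLAB 6.x Signal Processing (MATLAB 6.x 信号处理). Tsinghua University Press.
[25] Hu Guangshu, Digital Signal Processing: Theory, Algorithms and Implementation (数字信号处理理论、算法与实现). Tsinghua University Press.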
[26] S. H. Chen and J. F. Wang, “Noise-robust pitch detection method using wavelet transform with aliasing compensation,” IEE Proceedings - Vision, Image and Signal Processing, 2002.
[27] Beth Logan, “Mel frequency cepstral coefficients for music modeling,” in International Symposium on Music Information Retrieval, 2000.
[28] Changsheng Xu, Namunu C. Maddage, Xi Shao, Fang Cao, and Qi Tian, “Musical genre classification using support vector machines.”
[29] David Pye, “Content-based methods for the management of digital music,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), 2000.
[30] Kris West and Stephen Cox, “Features and classifiers for the automatic classification of musical audio signals,” in International Symposium on Music Information Retrieval, 2004.
[31] George Tzanetakis, Georg Essl, and Perry Cook, “Automatic musical genre classification of audio signals,” in International Symposium on Music Information Retrieval, 2001.
