Research on Lyric Recognition-Assisted Music Retrieval
Abstract
With the rapid development of digital technology and the wide availability of the Internet and wireless networks, digital music has become very easy to obtain. How to retrieve the music a user wants from a massive collection of digital music has become a pressing problem. Content-based music retrieval, such as query-by-example and query-by-humming, searches music by the features of the music itself; it requires little manual annotation and is convenient for users, and has become a mainstream research direction.
     Existing music retrieval systems usually match music by melody features alone, so retrieval is prone to fail when the singer hums incorrectly. Lyrics are another essential component of a song besides melody; they occur in speech or in music, and in many cases can complement melody features to improve retrieval accuracy. Centered on how lyrics can assist music retrieval, this thesis studies several key problems in depth: recognition of spoken lyrics, music retrieval based on spoken lyrics, lyric recognition for unaccompanied singing, and query-by-humming based on lyrics and melody. The main work and contributions of this thesis are as follows:
     1. A word activation force-based class language model
     Data sparseness is a prominent problem for the language model in spoken-lyric recognition. To improve recognition accuracy, this thesis investigates this problem.
     Interpolating a class-based language model with a word-based language model is a common remedy for data sparseness, but the performance of a class-based model depends on the quality of its word classes. The word activation force (WAF)-based affinity measure has proven effective at describing word similarity. This thesis uses this measure to cluster words and trains a class-based language model on the clustering result, called the WAF-based class language model. Because words within a class are highly similar, the WAF-based class language model outperforms the classic class-based model. Experimental results show that the interpolation of the WAF-based class model with a word-based model performs well on the spoken-lyric recognition task.
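     For reference, a minimal sketch of the standard interpolation form, assuming a bigram model and a single global weight (the thesis's exact n-gram order and weighting scheme are not restated here):

$$P(w_i \mid w_{i-1}) \;=\; \lambda\, P_{\mathrm{word}}(w_i \mid w_{i-1}) \;+\; (1-\lambda)\, P(w_i \mid c_i)\, P(c_i \mid c_{i-1}),$$

where $c_i$ is the class of word $w_i$ and $\lambda \in [0,1]$ is the interpolation weight. The class model factors the bigram through class transitions, which is what makes it robust to sparse word-level counts.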
     2. A multilayer filter-based retrieval algorithm
     Once spoken lyrics have been recognized, quickly and accurately locating the target lyric is the key problem in music retrieval from spoken queries. This thesis therefore proposes a multilayer filter-based retrieval algorithm. The algorithm first applies query expansion to the recognition results. For results recognized entirely correctly, the first-layer filter quickly matches the target song through an index; for results containing recognition errors, the second-layer filter narrows the search to a small candidate set, and the third-layer filter performs precise matching between the candidates and the recognition result with a fuzzy matching algorithm based on acoustic similarity. Experiments show that the proposed algorithm significantly improves the performance of music retrieval systems based on spoken lyric queries.
     3. A lyric recognition-assisted query-by-humming algorithm
     Exploiting lyric features to assist query-by-humming is a difficult and worthwhile problem. Existing methods apply continuous speech recognition directly to the lyrics in the music; because the recognized lyrics are inaccurate, the performance gain is limited. This thesis proposes a lyric recognition-assisted query-by-humming algorithm. The algorithm first uses melody features to find several candidate music segments, then builds a recognition network from the candidates' lyrics and recognizes the lyrics with isolated-word recognition techniques, and finally ranks the songs by combining the melody-matching and lyric-matching results. Using melody retrieval to sharply narrow the scope of lyric recognition greatly improves recognition accuracy. Experiments show that the algorithm effectively exploits the lyric information in music and significantly improves the performance of query-by-humming systems.
With the rapid development of digital technology and the popularity of networks, it has become very easy to access large quantities of digital music. At the same time, music information retrieval (MIR), which aims to find music in a large-scale music database, has become an important and challenging research topic. Recently developed content-based MIR systems, which work on content features such as melody and rhythm, provide users with richer retrieval methods and have become a very popular research direction.
     However, most MIR systems use only melody to match music. Since most users are non-professional singers, input queries are likely to contain melody errors, and MIR systems based on melody features alone may then fail. Lyrics, which such systems do not take into account, provide complementary information for song identification. This thesis aims to improve MIR systems by adding lyrics. We focus on two key problems: extracting lyrics from spoken or sung queries, and the search method itself. The main contributions and innovations of this thesis are as follows:
     1. A word activation force-based class language model
     Data sparseness is a prominent issue when constructing n-gram language models for lyric recognition of spoken queries. To improve lyric recognition accuracy, this thesis addresses this problem.
     Class-based language models offer an appealing way to relieve data sparseness, but their performance depends on the word classes. The word activation force (WAF)-based affinity measure has proven effective at measuring the similarity between two words. In this thesis, we first apply the affinity measure to score word similarity, then employ normalized spectral clustering to group words into word classes. From the word classes we obtain a class-based language model, which we finally interpolate with a classic word-based n-gram model. Experimental results show the effectiveness of the interpolated model.
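     As an illustration of the clustering step, here is a minimal sketch of normalized spectral clustering over a precomputed word-affinity matrix. The WAF-based affinity computation itself is not reproduced; `A` is assumed to be a symmetric matrix of pairwise word affinities, and the symmetric-Laplacian (Ng-Jordan-Weiss) normalization is one common variant, since the thesis abstract does not pin down which normalized variant is used:

```python
import numpy as np
from sklearn.cluster import KMeans

def waf_word_classes(A, k):
    """Group n words into k classes from a symmetric (n x n) affinity
    matrix A (assumed precomputed from the WAF-based affinity measure)."""
    d = A.sum(axis=1)                                  # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)               # ascending eigenvalues
    U = eigvecs[:, :k]                                 # k smallest eigenvectors
    # Row-normalize the spectral embedding, then k-means on the rows
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

The returned labels define the word classes on which the class-based n-gram counts are then collected.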
     2. A multilayer filter-based search method for an MIR system using spoken queries
     This thesis proposes a multilayer filter-based method for finding the target lyric quickly and accurately in the lyric database. The method matches against multiple hypotheses from the recognizer output. For each hypothesis, if it is recognized correctly, the level-1 filter quickly finds the target songs using indexes; if the level-1 filter cannot find any matched songs, level-2 filtering pre-selects the probable lyric candidates, and level-3 filtering then computes the acoustic similarity between each lyric candidate and its corresponding hypothesis. Experimental results show the effectiveness of the proposed method.
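     A minimal sketch of the three-level control flow, assuming hypothetical helpers that do not come from the thesis: `exact_index` (a dict-like exact lyric index, level 1), `candidate_filter` (level-2 pre-selection) and `acoustic_score` (level-3 fuzzy matching by acoustic similarity):

```python
def multilayer_search(hypotheses, exact_index, candidate_filter, acoustic_score):
    """Search the lyric database with multiple recognition hypotheses."""
    # Level 1: exact match through the lyric index.
    for hyp in hypotheses:
        songs = exact_index.get(hyp)
        if songs:
            return songs
    # Levels 2-3: pre-select candidates, then rank by acoustic similarity.
    scored = []
    for hyp in hypotheses:
        for cand in candidate_filter(hyp):                    # level-2 pre-selection
            scored.append((acoustic_score(hyp, cand), cand))  # level-3 fuzzy match
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored]
```

The design point is that the cheap exact index answers error-free hypotheses immediately, so the expensive acoustic matching only runs on a small pre-selected candidate set.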
     3. A lyric recognition-assisted Query-by-Singing/Humming (QBSH) method
     Adding lyrics to help QBSH systems is intuitive but challenging. Existing methods use a large-vocabulary continuous speech recognizer (LVCSR) for lyric recognition of sung queries, but the extracted lyrics are inaccurate. This thesis proposes a lyric recognition-assisted QBSH method. Before lyric recognition, we first pre-select candidates using melody matching; we then build a recognition network from the candidates' lyrics, use isolated-word recognition for lyric scoring, and finally rank the candidates by their combined melody-matching and lyric-scoring results. In our experiments, the proposed method achieves a significant improvement.
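     A minimal sketch of this pipeline, assuming hypothetical helpers not named in the thesis: `melody_search` returns (segment, melody_score) pairs, `build_network` constructs an isolated-word recognition network from the candidates' lyrics, `lyric_score` scores the query against that network, and `alpha` is a hypothetical linear fusion weight (the thesis's actual fusion rule is not restated here):

```python
def lyric_assisted_qbsh(query, melody_search, build_network, lyric_score, alpha=0.5):
    """Rank songs by fusing melody matching with lyric recognition scores."""
    # Step 1: pre-select candidate segments by melody matching.
    candidates = melody_search(query)            # [(segment, melody_score), ...]
    # Step 2: build an isolated-word recognition network from candidate lyrics,
    # which shrinks the recognition vocabulary to the pre-selected candidates.
    network = build_network([seg for seg, _ in candidates])
    # Steps 3-4: score lyrics per candidate, then rank by the fused score.
    fused = []
    for seg, m_score in candidates:
        l_score = lyric_score(query, network, seg)
        fused.append((alpha * m_score + (1.0 - alpha) * l_score, seg))
    fused.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in fused]
```

Restricting recognition to the candidates' lyrics is what makes isolated-word recognition viable here, in contrast to the open-vocabulary LVCSR approach criticized above.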
