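A Study of Content Based Massive Music Retrieval Technology
(Original title: 基于内容的海量音乐检索技术研究)

Author: Wang Qiang (王镪)
Degree level: Doctoral
Discipline: Signal and Information Processing
Keywords: music information retrieval; query by humming; locality sensitive hashing; query by example
Degree year: 2013
Supervisor: Guo Jun (郭军)
Discipline code: 081002
Degree-granting institution: Beijing University of Posts and Telecommunications (北京邮电大学)
Submission date: 2013-06-12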

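Abstract

As data-storage capacity grows and transmission technology advances, the amount of digital music is growing at an unprecedented rate. This explosive growth, however, makes it increasingly difficult to find music fragments of interest in such huge music databases. In recent years this predicament has led many researchers to focus on how to retrieve the desired songs quickly and accurately from massive music databases. This dissertation studies two techniques for content-based retrieval from massive music collections, query by humming and query by example, and investigates in depth a key technology of music retrieval: fast index lookup. The main work and contributions of this dissertation are as follows: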
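     1) A local-alignment algorithm for query by humming
     In query by humming, the hummed query is usually regarded as a sub-fragment of some song, so query by humming can be treated as a subsequence-matching problem: finding the sub-fragment in the music database that is most similar to the query. Because humming errors are frequent, usually only part of the query is sung accurately, so only some sub-fragments of the query match a sub-fragment in the database well. To find the most similar sub-fragment, this dissertation proposes a local alignment framework whose goal is to find the most similar common sub-fragment of the query and a song in the database. With this algorithm, severe humming errors in the query are discarded, which avoids their negative impact and improves retrieval accuracy.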
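     2) A two-layer locality sensitive hashing algorithm for query by humming based on notes and pitch
     Earlier query-by-humming techniques used pitch-based locality sensitive hashing (LSH) to speed up retrieval. This dissertation proposes a note-based LSH retrieval algorithm, which raises the recall of candidate fragments, followed by the more accurate key transposition recursive alignment (KTRA) algorithm, which raises retrieval accuracy. To keep query by humming efficient on massive data, the dissertation also proposes a confidence-based two-layer LSH filtering strategy: the pitch-based LSH search is invoked to produce more accurate results only when the results of the first, note-based LSH layer are unreliable. This strategy significantly reduces the average retrieval time of the system.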
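     3) Tempo-based multi-layer filtering and progressive filtering algorithms for query by humming
     In a query-by-humming system, most users hum at a tempo close to the normal tempo of the original song, so humming tempo is also an important measure of how well a song matches. Exploiting differences in humming tempo, the dissertation proposes a tempo-based multi-layer filtering algorithm: it first searches with the original query and then searches again after adjusting the humming tempo by different amounts. This algorithm effectively speeds up retrieval. Since the amount of tempo adjustment also reflects how well a song matches, the dissertation further proposes a progressive filtering algorithm that fuses humming tempo: a fast but imprecise algorithm first narrows the set of candidate songs, a slow but precise algorithm then computes the similarity of each candidate, and finally the tempo score is fused with the scores of the other precise matching algorithms and the candidates are ranked by the fused score. Because humming tempo provides new information about how well a song matches, the fusion strategy improves retrieval accuracy.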
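     4) Entropy-based locality sensitive hashing and boundary-expanding locality sensitive hashing
     A key problem in content-based music retrieval is fast lookup in massive data. This dissertation studies locality sensitive hashing, one of the most popular fast retrieval algorithms, and proposes two improvements: entropy-based LSH and boundary-expanding LSH. In the original LSH algorithm, hash functions are generated without regard to the actual data distribution. Real data are usually distributed non-uniformly, so some hash functions map the data points densely and others sparsely, and their collision probabilities differ greatly. Based on the idea of uniform mapping, the dissertation proposes an entropy-based method of generating hash functions, so that the mapped data points are roughly uniformly distributed and different buckets contain roughly the same number of points. In LSH, near neighbors also have a high probability of being mapped into adjacent buckets, so points in adjacent buckets may also be near neighbors. Based on this observation, the dissertation proposes boundary-expanding LSH, which expands the boundary of each bucket so that adjacent buckets share a common region and each point may be mapped into several buckets; this significantly increases the collision probability of near neighbors.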
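     5) A two-layer filtering algorithm for query by example based on structural music fingerprints
     A good query-by-example music retrieval system must be not only accurate but also fast. Building on the Shazam algorithm, the dissertation proposes a method of constructing structural music fingerprints that builds each fingerprint from multiple peak feature points; this increases the information content and discriminability of the fingerprints and significantly speeds up retrieval. To improve retrieval accuracy, a selective two-layer filtering algorithm screens more candidate fragments, and the original peak feature points are used to compute the similarity of candidate songs. With the proposed algorithm, both the accuracy and the speed of the query-by-example system improve.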
With the increasing capabilities of data storage and transmission, the amount of digital music has grown at an unprecedented rate in recent years. However, this proliferation of music content has made it increasingly difficult to find musical pieces in such vast music collections. This dilemma has recently motivated considerable research on how to find the required music accurately and quickly in a large-scale music database. This dissertation studies content-based music information retrieval, especially query by singing/humming (QBSH) and query by example (QBE), as well as the key technology of fast search in a massive database. The main contributions and innovations are as follows:
     1) Local alignment for QBSH
     In previous work on QBSH, the query has been regarded as a fragment of a song, so the task of QBSH has been to find the sub-fragment in the database that is most similar to the whole query. Taking humming errors into account, we assume that only part of the query is a sub-fragment of the song. Based on this assumption, the dissertation proposes a local alignment framework that searches for the best-matching local sub-fragment shared by the query and a song in the database. In this framework, QBSH is regarded as the identification of a common local sub-fragment, which is robust to humming errors: serious humming errors are discarded, which reduces their negative impact and improves retrieval accuracy.
     2) Note- and pitch-based locality sensitive hashing filtering for QBSH
     Previously, researchers adopted pitch-based locality sensitive hashing (LSH) to improve the retrieval speed of QBSH systems. The dissertation puts forward a note-based LSH algorithm to increase the recall of candidate songs, and then employs the key transposition recursive alignment (KTRA) algorithm to improve retrieval accuracy. For efficiency, the dissertation proposes a confidence-based two-layer filter: if the results of the note-based LSH filter are unreliable, the query is passed to the pitch-based LSH filter to search for more accurate candidates. This two-layer strategy greatly decreases the average retrieval time.
     3) Tempo-based multi-layer filtering and progressive filtering for QBSH
     In a QBSH system, most humming tempos are close to the original tempo of the song, so the humming tempo is an important factor in measuring how well a song matches. The dissertation proposes a multi-layer filtering method based on tempo variation to improve retrieval speed: we first use the original humming clip to search for candidates and then adjust its tempo to search for more candidates. Since the humming tempo reflects how well a song matches, the dissertation also presents a progressive filtering algorithm that fuses tempo and KTRA scores. We first adopt fast but inaccurate algorithms to reduce the number of candidate songs, and then use slow but accurate algorithms, e.g. KTRA, to calculate the similarity of the candidates. Finally, we fuse the tempo and KTRA scores and rank all candidate songs. Because the tempo provides extra information for melody matching, the fusion strategy improves retrieval accuracy.
     4) Entropy-based locality sensitive hashing and boundary-expanding locality sensitive hashing
     One of the key problems in content-based music information retrieval is fast search in a large-scale database. The dissertation studies one of the most popular fast search algorithms, Locality Sensitive Hashing (LSH), and proposes two improved algorithms: entropy-based LSH and boundary-expanding LSH. When choosing hash functions, the original LSH algorithm does not consider the actual data distribution. In practice data are not uniformly distributed, so some hash functions map points to densely concentrated values and others to sparsely scattered values, leading to very different collision probabilities. Aiming at a uniform projection, the dissertation proposes entropy-based LSH, which makes the projected points roughly uniform so that different buckets contain almost the same number of points. In LSH, neighboring points are likely to be mapped to the two sides of the boundary between adjacent buckets, so points in adjacent buckets are likely to be neighbors as well. Based on this observation, the dissertation proposes boundary-expanding LSH, which expands the boundary of each bucket so that adjacent buckets share a common region and each point may be mapped to multiple adjacent buckets, significantly increasing the collision probability of neighboring points.
     5) Structural-fingerprint-based two-layer filtering for QBE
     A QBE system must return accurate results and do so quickly. Building on the Shazam algorithm, the dissertation proposes a method of constructing structural fingerprints from multiple peaks, which increases the information content and discriminability of the fingerprints and improves retrieval speed. To improve retrieval accuracy, the dissertation presents a two-layer filter that obtains more candidates and uses the original peaks to calculate the similarity of the candidates. The proposed algorithms improve retrieval accuracy and speed simultaneously.

