Research and Implementation of Content-Based Audio Retrieval Technology
Abstract
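Analyzing, storing, and retrieving massive amounts of data efficiently, especially multimedia data such as audio, is a pressing problem. Because raw audio data is unstructured, audio retrieval is severely limited, and it lags behind the increasingly mature fields of image and video retrieval. Content-based audio retrieval has therefore become a research focus in multimedia retrieval. This thesis analyzes the key techniques of content-based audio retrieval, with work in the following areas: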
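     1. Audio feature extraction and representation. Audio retrieval relies on a combination of features. The thesis analyzes perceptual features of the audio signal, such as loudness, brightness, and pitch, and physical features, such as zero-crossing rate, Mel-frequency cepstral coefficients, and linear prediction coefficients. Different feature combinations are applied to different types of audio retrieval.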
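     2. Audio segmentation and recognition. The hierarchical audio segmentation algorithm is improved into a template-based segmentation algorithm. By exploiting the hidden Markov model's ability to model stochastic temporal sequences and its independence from application-specific thresholds, this considerably improves segmentation and recognition accuracy. Since MPEG compression has become the mainstream multimedia coding format, the thesis also studies extracting features directly from MP3 audio and segmenting audio using MPEG compressed-domain features.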
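     3. Content-based audio retrieval. From the perspective of query by audio example, and for different ways of representing the audio example, the thesis studies one example-retrieval algorithm based on HMM classification templates and another based on fuzzy clustering. To address the distinctive properties of music (songs), it also studies humming-based music retrieval; experiments show that this algorithm achieves reasonable accuracy.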
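As a rough illustration of item 1, the sketch below computes three frame-level features on a mono signal: zero-crossing rate, RMS energy (used here as a simple stand-in for loudness), and spectral centroid (a common proxy for brightness). The frame length, hop size, and function names are hypothetical choices, the DFT is written out naively so that only the standard library is needed, and pitch, MFCC, and LPC extraction are omitted. It is not the thesis's own feature pipeline.

```python
import math
from typing import List, Tuple


def frame_signal(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split x into frames of frame_len samples, hop samples apart; a short tail is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame: List[float]) -> float:
    """Root-mean-square amplitude; a crude loudness proxy (no perceptual weighting)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def spectral_centroid(frame: List[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency in Hz over DFT bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        mag = math.hypot(re, im)
        weighted += (k * sample_rate / n) * mag
        total += mag
    return weighted / total if total > 0 else 0.0


def extract_features(x: List[float], sample_rate: float,
                     frame_len: int = 256, hop: int = 128) -> List[Tuple[float, float, float]]:
    """Per-frame (ZCR, RMS energy, spectral centroid) triples; frame_len/hop are hypothetical defaults."""
    return [(zero_crossing_rate(f), rms_energy(f), spectral_centroid(f, sample_rate))
            for f in frame_signal(x, frame_len, hop)]
```

Which features to combine depends on the task; zero-crossing rate and energy, for instance, are often used to separate speech from music, while the centroid helps characterize timbre.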
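The HMM-based segmentation in item 2 can be illustrated with a minimal sketch under stated assumptions: each hidden state is one audio class, each class emits a single scalar feature per frame (for example, frame energy) from a Gaussian, and strong self-transition probabilities discourage fragmented segments without a hand-tuned threshold. Viterbi decoding picks the most likely class for every frame, and runs of identical labels are merged into segments. The class names, emission parameters, and transition values are hypothetical; the thesis's actual templates and features are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(obs: Sequence[float], states: List[str],
            log_start: Dict[str, float],
            log_trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Tuple[float, float]]) -> List[str]:
    """Most likely state sequence for obs; emit maps state -> (mean, variance)."""
    if not obs:
        return []
    score = [{s: log_start[s] + log_gauss(obs[0], *emit[s]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at time t.
            prev = max(states, key=lambda p: score[t - 1][p] + log_trans[p][s])
            score[t][s] = score[t - 1][prev] + log_trans[prev][s] + log_gauss(obs[t], *emit[s])
            back[t][s] = prev
    state = max(states, key=lambda s: score[-1][s])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):  # follow back-pointers
        state = back[t][state]
        path.append(state)
    return path[::-1]


def segments(path: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start_frame, end_frame_exclusive) runs."""
    out = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            out.append((path[start], start, t))
            start = t
    return out
```

With two hypothetical classes, say "silence" (low energy) and "speech" (higher energy), and a self-transition probability of 0.95, a frame-energy sequence of five low, five high, and five low values decodes into three segments.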
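For the fuzzy-clustering variant of query by example in item 3, one plausible reading is that the frame feature vectors of an audio example are summarized by fuzzy c-means centroids, and clips are then compared through their centroid sets. The sketch below implements standard fuzzy c-means (fuzzifier m, memberships updated from relative distances, centroids as membership-weighted means) together with an assumed symmetric nearest-centroid distance for ranking. Initializing from the first c points and the distance measure are illustrative choices, not the thesis's algorithm.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def fuzzy_c_means(points: List[Vector], c: int, m: float = 2.0,
                  iters: int = 100, tol: float = 1e-6) -> List[List[float]]:
    """Return c fuzzy c-means centroids; assumes len(points) >= c and m > 1."""
    dims = len(points[0])
    centroids = [list(map(float, p)) for p in points[:c]]  # naive deterministic init
    for _ in range(iters):
        memberships = []
        for x in points:
            d = [math.dist(x, v) for v in centroids]
            if min(d) == 0.0:
                # Point coincides with a centroid: give it crisp membership there.
                hits = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(hits)
                memberships.append([h / total for h in hits])
            else:
                memberships.append([1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                                    for i in range(c)])
        new_centroids = []
        for i in range(c):
            w = [row[i] ** m for row in memberships]
            total = sum(w)
            if total == 0.0:  # empty cluster: keep previous centroid
                new_centroids.append(centroids[i])
                continue
            new_centroids.append([sum(wj * x[dim] for wj, x in zip(w, points)) / total
                                  for dim in range(dims)])
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids


def set_distance(a: List[Vector], b: List[Vector]) -> float:
    """Symmetric mean nearest-centroid distance (an assumed similarity measure; lower = closer)."""
    def directed(x: List[Vector], y: List[Vector]) -> float:
        return sum(min(math.dist(p, q) for q in y) for p in x) / len(x)
    return 0.5 * (directed(a, b) + directed(b, a))
```

A query would be clustered once, and database clips (clustered offline) would be sorted by `set_distance` to the query's centroids.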
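Humming-based retrieval (also item 3) is often approached by reducing a melody to a pitch contour of up (U), down (D), and same (S) steps, which tolerates transposition, and then searching each song for the hummed contour with a bounded number of mismatches. The sketch below assumes note pitches in semitones (for example, MIDI note numbers) have already been extracted from both the humming and the songs; pitch tracking is omitted, the function names and tolerance are hypothetical, and the naive scan stands in for faster k-mismatch string-matching algorithms.

```python
from typing import Dict, List, Optional, Sequence


def contour(pitches: Sequence[float], same_tol: float = 0.5) -> str:
    """Encode successive pitch steps as U (up), D (down) or S (same within same_tol semitones)."""
    steps = []
    for a, b in zip(pitches, pitches[1:]):
        if abs(b - a) <= same_tol:
            steps.append("S")
        elif b > a:
            steps.append("U")
        else:
            steps.append("D")
    return "".join(steps)


def min_mismatches(query: str, text: str) -> Optional[int]:
    """Smallest Hamming distance between query and any equal-length window of text; None if query is longer."""
    m = len(query)
    if m > len(text):
        return None
    return min(sum(q != t for q, t in zip(query, text[i:i + m]))
               for i in range(len(text) - m + 1))


def rank_songs(hummed: Sequence[float], songs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Titles whose contour contains the hummed contour with at most k mismatches, best first."""
    q = contour(hummed)
    hits = []
    for title, notes in songs.items():
        mm = min_mismatches(q, contour(notes))
        if mm is not None and mm <= k:
            hits.append((mm, title))
    return [title for _, title in sorted(hits)]
```

Because only step directions are compared, a query hummed in a different key still matches; the cost is that melodies with the same up/down shape cannot be distinguished.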
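     The thesis designs and implements a prototype content-based audio retrieval system that is readily extensible and performs fast, effective audio retrieval. Finally, it outlines development trends and research hotspots for content-based audio retrieval systems.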