Multi-Pitch Extraction from Polyphonic Musical Tones
Abstract
With the development of computer networks and multimedia technology, multi-pitch extraction from polyphonic musical tones has become an indispensable technique in music signal processing.
     This thesis first introduces the fundamentals of multi-pitch extraction: auditory models, auditory filters, the wavelet transform, the STFT and the MDCT, and the computation of autocorrelation and the average magnitude difference function. It then describes the multi-pitch extraction techniques studied here: extraction based on an auditory model, on multiplying the approximation coefficients of the dyadic wavelet transform, and on the Specmurt algorithm. Finally, these typical techniques are improved, and the extraction results before and after improvement are compared.
     Multi-pitch extraction based on an auditory model is a relatively traditional approach. It can be divided into the multi-channel model proposed by Anssi Klapuri, the multi-channel high/low-frequency model proposed by M.Y. Wu, and the two-channel extraction proposed by Tero Tolonen; of the three, the two-channel method yields the best pitch distribution. To correct the shortcomings of plain autocorrelation for estimating the period distribution, the auditory-model approach uses normalized autocorrelation, autocorrelation computed via the Fourier transform, and enhanced autocorrelation: normalization reduces the effects introduced by framing and windowing, the Fourier-transform computation increases peak saliency, and enhancement removes peaks at multiples of the period.
     Multiplying the approximation coefficients of the dyadic wavelet transform increases peak saliency, making the positions of the fundamentals easier to locate. This thesis first compares choices of scale on synthesized signals; the experiments show that multiplying three approximation coefficients works best. Exploiting the strengths of the auditory-model approach, the two-channel model is combined with the coefficient product, which improves both the precision and the recall of pitch extraction.
     Compared with the auditory-model methods, the Specmurt algorithm improves the resolution of the extracted pitches and makes more reasonable use of the harmonic structure of music. Compared with the dyadic wavelet transform, it produces fewer interfering frequencies, and spurious peaks ("outliers") are easier to eliminate. Specmurt can be realized via the MDCT, the STFT, or the complex wavelet transform; the MDCT variant mainly targets the MP3 storage format.
     Using synthesized signals, polyphonic tones formed by superimposing single-pitch tones, and MIDI polyphonic tones, this thesis runs simulation experiments on the three typical techniques above and compares their multi-pitch extraction performance through both theoretical analysis and experimental validation.
With the development of computer networks and multimedia technology, multi-pitch detection from polyphonic music has become a key problem in music signal processing.
     Firstly, this thesis briefly introduces the fundamental concepts of multi-pitch detection, such as the auditory system, auditory filters, the wavelet transform, autocorrelation and the average magnitude difference function, and the STFT and MDCT. Secondly, it describes the typical multi-pitch detection techniques studied: detection using an auditory model, detection by multiplying the approximation coefficients of the dyadic wavelet transform, and detection by the Specmurt algorithm. Finally, the typical techniques are improved, and performance before and after the modifications is compared.
     Multi-pitch detection using an auditory model is a traditional approach that exploits auditory masking effects. It can be divided into three methods: the multi-channel detection put forward by Klapuri, the high- and low-frequency multi-channel detection by M.Y. Wu, and the two-channel detection by Tolonen; two-channel detection obtains a better multi-pitch distribution than the others. To correct the shortcomings of plain autocorrelation for estimating the period distribution, the auditory-model approach uses normalized autocorrelation, autocorrelation computed via the Fourier transform, and enhanced autocorrelation: normalized autocorrelation reduces the influence of the window function, the Fourier-transform computation enhances the saliency of the peaks, and enhanced autocorrelation eliminates peaks at multiples of the period.
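The Fourier-transform route to autocorrelation follows from the Wiener-Khinchin theorem: the autocorrelation of a frame is the inverse transform of its power spectrum. A minimal sketch, assuming an illustrative frame length, sampling rate, and lag search range rather than the thesis's actual settings:

```python
import numpy as np

def fft_autocorrelation(frame):
    """Autocorrelation via the Wiener-Khinchin theorem: the inverse FFT
    of the power spectrum. Zero-padding to 2n avoids circular wrap-around,
    and dividing by the lag-0 value normalizes the result."""
    n = len(frame)
    spectrum = np.fft.rfft(frame, 2 * n)
    acf = np.fft.irfft(np.abs(spectrum) ** 2)[:n]
    return acf / acf[0]

# Toy usage: a 220 Hz sine sampled at 8 kHz in a Hann-windowed frame;
# the first strong peak after lag 0 should sit near fs/f0, about 36 samples.
fs, f0 = 8000, 220.0
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * f0 * t) * np.hanning(1024)
acf = fft_autocorrelation(frame)
lag = np.argmax(acf[20:200]) + 20
```

Dividing by the lag-0 value only compensates for frame energy; the normalized autocorrelation discussed above additionally corrects the lag-dependent taper that windowing introduces.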
     Detecting multiple pitches by multiplying the approximation coefficients of the dyadic wavelet transform enhances the saliency of the peaks, so the pitch positions can be found more easily. In this thesis, products over different sets of scales are compared on synthesized signals, and the results confirm that multiplying the approximation coefficients of the first three scales works best. Combining this coefficient product with the two-channel auditory model improves both precision and recall.
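The effect of multiplying coarse approximations can be illustrated with a multiscale product; the undecimated (a trous) two-tap smoothing and the synthetic spectrum below are assumptions for illustration, not the thesis's actual dyadic wavelet filters:

```python
import numpy as np

def atrous_approximations(x, levels=3):
    """Undecimated (a trous) dyadic approximations: level j smooths the
    previous level with a two-tap averaging kernel whose taps sit 2**j apart."""
    approx, current = [], np.asarray(x, dtype=float)
    for j in range(levels):
        current = 0.5 * (current + np.roll(current, 2 ** j))
        approx.append(current)
    return approx

def multiscale_product(x, levels=3):
    """Pointwise product of the first `levels` approximations: peaks that
    persist across scales reinforce each other; scale-local noise is damped."""
    prod = np.ones(len(x))
    for a in atrous_approximations(x, levels):
        prod = prod * a
    return prod

# Toy usage: two spectral peaks plus uniform noise; after the product the
# peak regions stand far above the noise floor.
rng = np.random.default_rng(0)
spec = np.zeros(256)
spec[[40, 90]] = 1.0
spec = np.convolve(spec, np.hanning(9), mode="same") + 0.05 * rng.random(256)
p = multiscale_product(spec, levels=3)
```

Peaks present at all three scales survive the product, while noise that varies from scale to scale is driven toward zero, which is why the product sharpens peak saliency.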
     Compared with detection using an auditory model, Specmurt-based detection makes proper use of the harmonic structure of music and improves the pitch resolution. Compared with the wavelet approximation-coefficient product, it suffers from fewer interfering frequencies, and outliers are easier to eliminate. Specmurt can be realized via the MDCT, the STFT, or the complex wavelet transform; the MDCT variant mainly targets the MP3 storage format.
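Specmurt rests on the observation that, on a log-frequency axis, every note contributes the same harmonic pattern, so the observed spectrum is the convolution of the fundamental-frequency distribution with that common pattern; dividing in the Fourier domain undoes the convolution. A minimal sketch with assumed bin counts and harmonic weights, not the thesis's settings:

```python
import numpy as np

# In log-frequency every note contributes the same harmonic pattern h,
# so the observed spectrum v is the convolution u * h of the fundamental
# distribution u with h; inverse filtering in the Fourier domain recovers u.
N = 512
bins_per_octave = 128           # assumed log-frequency resolution
weights = [1.0, 0.6, 0.4, 0.3]  # assumed relative harmonic amplitudes

# Common harmonic pattern: impulses at log2(k) octaves above the fundamental.
h = np.zeros(N)
for k, w in enumerate(weights, start=1):
    h[int(round(bins_per_octave * np.log2(k)))] = w

# Two concurrent notes with fundamentals at bins 30 and 80.
u_true = np.zeros(N)
u_true[[30, 80]] = 1.0
v = np.real(np.fft.ifft(np.fft.fft(u_true) * np.fft.fft(h)))  # observed spectrum

# Regularized deconvolution: eps guards bins where H is nearly zero.
eps = 1e-3
H = np.fft.fft(h)
u_est = np.real(np.fft.ifft(np.fft.fft(v) * np.conj(H) / (np.abs(H) ** 2 + eps)))
```

The small constant eps regularizes bins where the pattern's transform is nearly zero; the STFT, MDCT, and complex-wavelet realizations mentioned above differ mainly in how the log-frequency spectrum is obtained, not in this deconvolution step.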
     This thesis simulates the three categories of multi-pitch detection discussed above on synthesized signals, polyphonic tones built by superimposing single-pitch tones, and MIDI polyphonic tones, and assesses their objective performance with two indicators: precision and recall.
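The two indicators can be computed by matching detected pitches against reference pitches; the 3% tolerance below (roughly half a semitone) is an illustrative assumption, not the thesis's stated threshold:

```python
def precision_recall(detected_hz, reference_hz, tol=0.03):
    """Precision and recall for multi-pitch detection. A detected pitch
    matches a reference pitch when it lies within a relative tolerance
    `tol` of it (3% assumed here); each reference matches at most once."""
    unmatched = list(reference_hz)
    true_positives = 0
    for f in detected_hz:
        for r in unmatched:
            if abs(f - r) <= tol * r:
                unmatched.remove(r)
                true_positives += 1
                break
    precision = true_positives / len(detected_hz) if detected_hz else 0.0
    recall = true_positives / len(reference_hz) if reference_hz else 0.0
    return precision, recall

# Toy usage: three true pitches; the detector finds two of them plus one
# spurious pitch, so both indicators come out to 2/3.
prec, rec = precision_recall([221.0, 333.0, 500.0], [220.0, 330.0, 440.0])
```

Precision is the fraction of detected pitches that are correct; recall is the fraction of reference pitches that were found.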
References
[1] A. Klapuri. Multipitch Analysis of Polyphonic Music and Speech Signals Using an Auditory Model [J]. IEEE Trans. Audio, Speech, and Language Processing, vol. 16, pp. 255-266, Feb. 2008.
    [2] R. P. Paiva. An Approach for Melody Extraction from Polyphonic Audio: Using Perceptual Principles and Melodic Smoothness [J]. The Journal of the Acoustical Society of America, vol. 122, p. 2962.
    [3] M. Y. Wu, D. L. Wang. Pitch Tracking Based on Statistical Anticipation [C]. in Proc. IJCNN, vol. 2, 2001, pp. 866-871.
    [4] X. L. Zhang, W. J. Liu and P. L. B. Xu. Multipitch Detection Based on Weighted Summary Correlogram [C]. in Proc. IEEE Int. Conf. Chinese Spoken Language Processing, 2008, pp. 1-4.
    [5] M. Y. Wu and D. L. Wang. A Multi-Pitch Tracking for Noisy Speech [C]. in Proc. IEEE Int. Conf. Speech and Noise Processing, 2002, pp. 1.369-1.372.
    [6] T. Tolonen. A Computationally Efficient Multipitch Analysis Model [J]. IEEE Trans. Speech and Audio Processing, 2000, pp. 708-716.
    [7] S. Saito, H. Kameoka and K. Takahashi. Specmurt Analysis of Polyphonic Music Signals [J]. IEEE Trans. Audio, Speech, and Language Processing, 2008, pp. 639-650.
    [8] S. Sagayama, H. Kameoka, S. Saito and T. Nishimoto. "Specmurt Analysis" of Multi-Pitch Signals [C]. Proc. IEEE-EURASIP International Workshop on Nonlinear Signal and Image Processing, Sapporo, Japan.
    [9] S. Sagayama, H. Kameoka, S. Saito and T. Nishimoto. Specmurt Analysis of Multi-pitch Music Signals with Adaptive Estimation of Common Harmonic Structure [C]. Proc. International Conference on Music Information Retrieval (ISMIR 2005), pp. 84-91, 2005, London, England.
    [10] S. Sagayama, H. Kameoka, T. Nishimoto. Specmurt Analysis: A Piano-Roll Visualization of Polyphonic Music Signal by Deconvolution of Log-Frequency Spectrum [C]. Proc. 2004 ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing (SAPA 2004), Oct. 2004.
    [11] M. A. Ben Messaoud, A. Bouzid and N. Ellouze. Spectral Multi-scale Analysis for Multipitch Tracking [C]. in Proc. IEEE Digital Signal Processing Workshop and 5th IEEE Signal Processing Education Workshop, 2009, pp. 26-31.
    [12] Klapuri A. Automatic Music Transcription as We Know it Today [J]. Journal of New Music Research, 2004, 33(3): 269-282.
    [13] Fletcher N H. The Physics of Musical Instruments [M]. New York: Springer, 1998.
    [14] Ellis D P W. Prediction-driven Computational Auditory Scene Analysis [D]. Boston, MA, USA: MIT Media Laboratory, 1996.
    [15] Rosenthal D F. Machine Rhythm: Computer Emulation of Human Rhythm Perception [D]. Boston, MA, USA: Massachusetts Institute of Technology, 1992.
    [16] Cooke M, Ellis D P W. The Auditory Organization of Speech and Other Sources in Listeners and Computational Models [J]. Speech Communication, 2001, 35: 141-177.
    [17] Slaney M. An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank [R]. Apple Computer Technical Report 35, 1993.
    [18] Goto M. A Real-Time Music Scene-Description System: Predominant-F0 Estimation for Detecting Melody and Bass Lines in Real-world Audio Signals [J]. Speech Communication, 2004, 43(4): 311-329.
    [19] Plumbley M D, Abdallah S A, Bello J P. Automatic Music Transcription and Audio-source Separation [J]. Cybernetics and Systems, 2002, 33(6): 603-627.
    [20] 俞一彪, 袁冬梅, 薛峰. A Nonlinear Frequency Scale Transform Suitable for Speaker Recognition [J]. 声学学报 (Acta Acustica), 2008, 33(5): 450-455.
    [21] 付强, 易克初. The Bark Wavelet Transform of Speech Signals and Its Application in Speech Recognition [J]. 电子学报 (Acta Electronica Sinica), 2000, 28(10): 1-4.
    [22] 袁冬梅. Nonlinear Spectral Transformation for Speaker Recognition [D]. Suzhou: Soochow University, 2007.
    [23] B. C. J. Moore, B. R. Glasberg. A Revision of Zwicker's Loudness Model [J]. Acta Acustica, 1996, 82: 335-345.
    [24] Julius O. Smith, Jonathan S. Abel. Bark and ERB Bilinear Transforms [J]. IEEE Transactions on Speech and Audio Processing, 1999, 7(6): 697-708.
    [25] X. Mei, J. Pan and S. Sun. Efficient Algorithms for Speech Pitch Estimation [C]. Proc. of ISIMVSP-2001, pp. 421-424, 2001.
    [26] C. K. Un and S. C. Yang. A Pitch Extraction Algorithm Based on LPC Inverse Filtering and AMDF [J]. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-25(6): 565-572, 1977.
    [27] Ryynänen M, Klapuri A. Polyphonic Music Transcription Using Note Event Modeling [C]. Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Mohonk, NY, USA, 2005.
    [28] H. Indefrey, W. Hess, and G. Seeser. Design and Evaluation of Double Transform Pitch Determination - Preliminary Results [C]. in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 1985, pp. 11.11.1-11.11.4.
    [29] M. Y. Wu, D. L. Wang. Pitch Tracking Based on Statistical Anticipation [C]. in Proc. IJCNN, vol. 2, 2001, pp. 866-871.
    [30] J. F. Kaiser. On a Simple Algorithm to Calculate the 'Energy' of a Signal [C]. in Proc. IEEE ICASSP, 1990, pp. 381-384.
    [31] J. Heckroth. Tutorial on MIDI and Wavetable Music Synthesis [R]. The MIDI Manufacturers Association, 1995.
    [32] 吕伯平. A Brief Discussion of Pitch, Intervals, and Temperament [J]. 天津音乐学院学报 (Journal of the Tianjin Conservatory of Music), 1994(3): 59-61.
    [33] 林胜, 纪涌, 全子一. The MPEG-III Audio Coding Algorithm [J]. 声电技术, 1999(7): 3-6.
