Research on Front-End Filters for Robust Speaker-Independent Speech Recognition
Abstract
Speaker-independent speech recognition performs well under clean conditions, but in noisy environments the recognition rate drops dramatically. Recognition accuracy is further degraded by speech variability, which adds to the difficulty of the task. Aiming to improve robustness against both noise and speech variability, this thesis focuses on the front-end filter, which plays a key role in feature extraction. Filters are designed from two complementary perspectives: auditory perception, so that the filter better matches the characteristics of human hearing, and the speech signal itself, so that the signal to be recognized is analyzed more precisely. Noise-robustness experiments show that as the filter design improves, the noise robustness of the extracted features improves correspondingly; variability experiments further show that improvements in filter performance and in variability robustness are consistent. The main contributions of this thesis are as follows.
     (1) Building on FIR filter design, the design procedure for the Laguerre filter is described in detail, and the Laguerre filter replaces the FIR filter in extracting the Zero Crossing Peak Amplitude (ZCPA) feature; a frequency-domain implementation of Laguerre filtering for ZCPA extraction is also presented. The Laguerre filter combines the linear phase of an FIR filter with the long memory of an IIR filter, compensating for the poor pass-band and stop-band characteristics of the FIR filter. Experiments show that a Laguerre filter bank whose per-channel center frequencies and bandwidths are designed precisely is clearly more noise-robust than the FIR filter bank.
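The ZCPA feature described above can be sketched for a single band-pass channel as follows. This is a minimal illustration of the zero-crossing/peak-amplitude idea (the interval between successive upward zero crossings gives a frequency estimate, and the log-compressed peak amplitude within that interval is accumulated into the matching frequency bin), not the thesis's exact implementation; the bin edges below are placeholders:

```python
import numpy as np

def zcpa_histogram(band_signal, fs, bin_edges):
    """Accumulate a ZCPA histogram for one band-pass-filtered frame.

    bin_edges : increasing frequency-bin boundaries in Hz (illustrative).
    Returns a histogram of log-compressed peak amplitudes per bin.
    """
    x = np.asarray(band_signal, dtype=float)
    # indices of upward zero crossings (sign change from <= 0 to > 0)
    up = np.where((x[:-1] <= 0) & (x[1:] > 0))[0] + 1
    hist = np.zeros(len(bin_edges) - 1)
    for a, b in zip(up[:-1], up[1:]):
        freq = fs / (b - a)                # inverse of the crossing interval
        peak = np.max(np.abs(x[a:b]))      # peak amplitude inside the interval
        k = np.searchsorted(bin_edges, freq) - 1
        if 0 <= k < len(hist):
            hist[k] += np.log(1.0 + peak)  # logarithmic amplitude compression
    return hist
```

For a pure 200 Hz tone, essentially all of the histogram mass lands in the bin containing 200 Hz, which is the behavior the full multi-channel feature exploits.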
     (2) The FIR and Laguerre filters have symmetrically distributed bandwidths, which does not match human auditory characteristics. To address this, Warped Filter Banks (WFBs) were designed and applied to ZCPA extraction. The warping factor ρ of a first-order all-pass function controls the distribution of center frequencies and bandwidths, yielding non-uniform band spacing and asymmetric bandwidths; the typical values ρ = 0.48 and ρ = 0.63 correspond to the Bark and ERB scales, respectively. Unlike the FIR and Laguerre designs, WFBs do not require each channel's center frequency and bandwidth to be controlled individually; the frequency responses of all 16 channels are obtained at once. Experiments show that non-uniform band spacing with asymmetric bandwidths improves recognition rates significantly over uniform spacing with symmetric bandwidths. Moreover, although the WFB design is simpler than the FIR and Laguerre designs, it satisfies the asymmetric-bandwidth property, so the ERB-scale WFBs achieve higher recognition rates and better noise robustness.
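One common form of the frequency warping induced by a first-order all-pass section A(z) = (z⁻¹ − ρ)/(1 − ρ z⁻¹) maps a normalized frequency ω (rad/sample) to ω + 2·arctan(ρ·sin ω / (1 − ρ·cos ω)). The sketch below illustrates this mapping; the sign convention for ρ varies with the all-pass definition, and this is not the thesis's exact design procedure:

```python
import numpy as np

def warp_frequency(omega, rho):
    """Warped frequency through a first-order all-pass section.

    For rho > 0 the low-frequency region is stretched and the
    high-frequency region is compressed, approximating auditory
    scales (e.g. Bark-like for rho near 0.48 at common speech
    sampling rates).
    """
    return omega + 2.0 * np.arctan(
        rho * np.sin(omega) / (1.0 - rho * np.cos(omega)))
```

The mapping fixes ω = 0 and ω = π, and is monotonic in between, so uniformly spaced channels in the warped domain become non-uniformly spaced (and asymmetric-bandwidth) channels on the linear frequency axis.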
     (3) Starting from the speech signal itself and based on digital signal processing theory, an Optimized Filter Bank (OFB) model was designed and then simplified into an Adaptive Bands Filter Bank (ABFB) model. Whereas the FIR, Laguerre, and WFB models are built on auditory-perception criteria, the OFB design innovatively takes recognition performance as its criterion: for the first time, a genetic algorithm couples the front-end filter and the back-end recognition system into a single closed loop for optimization. Experiments show that the OFB model clearly outperforms the Bark-scale filter bank, but its large number of filters makes it inconvenient to apply; the simplified ABFB model still achieves recognition rates clearly above the Bark-scale filter bank, and even above the ERB-scale filter bank. Among the FIR, Laguerre, WFB, and ABFB filters, ABFB therefore has the best noise robustness, which underlines the importance of designing filters from an analysis of the speech signal itself.
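The closed-loop idea can be sketched as a genetic algorithm over filter-bank parameters. In the thesis the fitness would be the recognition rate of the full front end plus recognizer; here the fitness is a toy placeholder, and the population size, generation count, and parameter ranges are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(bandwidths):
    """Placeholder for the recognition rate of the closed loop.

    A toy score that peaks when the bandwidths follow a
    hypothetical target profile (stands in for running the
    recognizer on held-out speech).
    """
    target = np.linspace(100.0, 400.0, bandwidths.size)
    return -np.mean((bandwidths - target) ** 2)

def genetic_optimize(n_bands=8, pop_size=30, generations=60,
                     low=50.0, high=500.0):
    pop = rng.uniform(low, high, size=(pop_size, n_bands))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]     # truncation selection
        kids = parents.copy()
        cut = rng.integers(1, n_bands, size=len(kids))
        for i, c in enumerate(cut):               # one-point crossover
            kids[i, c:] = parents[(i + 1) % len(parents), c:]
        kids += rng.normal(0.0, 4.5, kids.shape)  # Gaussian mutation
        pop = np.vstack([parents, kids])          # elitism: parents survive
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]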
     (4) The number of filter channels also affects how precisely the signal is analyzed. The FIR, Laguerre, WFB, and ABFB filters all use 16 band-pass channels and 16 frequency bins for ZCPA extraction. When a Gammatone (GT) filter bank is used for ZCPA extraction, K channels are employed together with a matching number of frequency bins to collect the amplitude information. Experiments show that an 18-channel GT filter bank yields better recognition results than other channel counts.
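A K-channel gammatone filter bank can be sketched as below, using the standard 4th-order gammatone impulse response with Glasberg-Moore ERB bandwidths and center frequencies spaced uniformly on the ERB-rate scale. The sampling rate, frequency range, and impulse-response length are illustrative assumptions, not the thesis's exact settings:

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth (Glasberg & Moore), in Hz."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.032, order=4, b=1.019):
    """Impulse response of a 4th-order gammatone filter centred at fc."""
    t = np.arange(int(duration * fs)) / fs
    g = (t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
         * np.cos(2 * np.pi * fc * t))
    return g / np.max(np.abs(g))  # crude peak normalization

def gammatone_bank(fs, n_channels=18, f_lo=100.0, f_hi=3500.0):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    # ERB-rate scale: 21.4 * log10(1 + 0.00437 * f)
    e_lo, e_hi = (21.4 * np.log10(1 + 0.00437 * f) for f in (f_lo, f_hi))
    rates = np.linspace(e_lo, e_hi, n_channels)
    fcs = (10 ** (rates / 21.4) - 1) / 0.00437
    return fcs, [gammatone_ir(fc, fs) for fc in fcs]
```

Convolving the input speech with each impulse response gives the K band signals from which the per-channel ZCPA histograms are then computed.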
     (5) The FIR, GT, Laguerre, and WFB filters were applied to speaker-independent recognition on a variability corpus. Experiments show that variability robustness improves as the filter design improves. Moreover, compared with MFCC features, ZCPA features show better variability robustness under a Support Vector Machine (SVM) back end than under a Hidden Markov Model (HMM).
