Research on Robust Speech Recognition Technology
Abstract
Robust speech recognition is one of the key technologies for moving speech recognition systems from laboratory theory to practical application. Its main goal is to solve the drop in recognition accuracy caused by the mismatch between the training environment and the deployment environment. Building on a survey and analysis of existing robust recognition algorithms, this thesis focuses on the effects of additive noise and investigates speech enhancement, pitch extraction, endpoint detection, and the selection of robust feature parameters in depth.
     A conjugate-gradient recursion is used to solve the modified Yule-Walker equations built from the third-order cumulants of the noisy speech, thereby estimating the generating-model parameters and excitation gain of the clean speech; on this basis, a Kalman-filter speech enhancement algorithm based on higher-order cumulants is proposed. The enhanced speech has low distortion and is well suited as a front-end preprocessor for recognition systems. Exploiting the propagation of signal discontinuities across the resolution levels of the wavelet transform, combined with the circular average magnitude difference function, a wavelet-based circular AMDF pitch extraction algorithm (WCAMDF) is proposed. The application of wavelet multi-threshold estimation to speech enhancement is also studied. Drawing on the strong noise suppression of wavelets, the intermediate results of wavelet pitch extraction and wavelet enhancement are combined with short-time energy and spectral entropy and applied to endpoint detection, yielding two robust features for endpoint detection in noisy environments. Finally, feature extraction for robust speech recognition is studied in the feature space, and three MFCC-based robust-feature improvement strategies are proposed: TEMFCC, LDA-TEMFCC, and HOC-LPC-MFCC. Simulation experiments on the various robust recognition algorithms under different noise environments successfully suppress additive noise and verify the strong robustness of the new algorithms.
Speech is one of the most convenient and effective modes of human communication. With the rapid development and wide application of computer technology, people increasingly hope to communicate with machines naturally by speech. Automatic speech recognition (ASR) has emerged to meet this need and has made remarkable progress in recent years. It is now moving from laboratory theory into real-world applications and may become a leading user interface for future operating systems and application programs.
     Most speech recognition systems are designed for clean speech and can accomplish fairly complex recognition tasks with high accuracy in controlled, quiet laboratory environments. When an ASR system is used in a real-life situation, however, background noise inevitably creates a mismatch between training and testing conditions. System performance deteriorates severely, which is the major obstacle to the commercial use of speech recognition technology. Increasing the robustness of ASR is therefore both significant and necessary. The aim of robust speech recognition is to alleviate the effect of this mismatch and to achieve good recognition performance in noisy conditions. The many methods studied in this area can be broadly classified into three categories: speech enhancement in the signal space, robust feature extraction in the feature space, and model compensation in the model space. This thesis focuses on the first two, i.e., improving recognition accuracy in the signal space and the feature space with new approaches under additive background noise. The main contributions are as follows.
     1、Speech enhancement aims at extracting clean speech from a noisy signal while suppressing noise, minimizing speech distortion, and improving intelligibility. For robust speech recognition, enhancement usually acts as a preprocessor that supplies an almost clean signal to the ASR system, so no changes to the recognizer itself are needed to make it robust. Most current enhancement algorithms have important limitations: each targets one given type of noise, and as noise types diversify the techniques grow more and more complex. Moreover, many algorithms are designed with intelligibility in mind, so the enhanced speech may lose information useful for recognition, which can degrade ASR performance. To cope with these problems, a Kalman-filter speech enhancement algorithm based on higher-order cumulants is proposed in this thesis.
     The performance of the Kalman-filter algorithm depends mainly on the accuracy of the clean-speech LPC parameters and the excitation gain. Exploiting the robustness of higher-order cumulants to Gaussian noise, the LPC parameters of the clean signal are estimated by solving the modified Yule-Walker (MYW) equations built from the third-order cumulants of the noisy signal. The required excitation gain is then approximated from the estimated model parameters and the noise variance.
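     Below is a minimal sketch of the Kalman recursion for an AR(p) clean-speech model in additive noise. It assumes the AR coefficients a and excitation variance q are supplied by the HOC-based MYW estimation above; the companion-form state-space construction and function names are illustrative, not the thesis' code.

```python
import numpy as np

def kalman_enhance(y, a, q, r):
    """Enhance noisy speech y with a Kalman filter.

    y : noisy samples; a : AR(p) coefficients of the clean speech,
    s(n) = sum_k a[k] * s(n-1-k) + u(n); q : excitation (gain) variance;
    r : additive-noise variance. a and q are assumed to be estimated
    elsewhere (e.g., from third-order cumulants as described above).
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    p = len(a)
    # Companion-form state transition: state = [s(n-p+1), ..., s(n)]^T
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)
    F[-1, :] = a[::-1]
    g = np.zeros(p); g[-1] = 1.0      # excitation drives the newest sample
    h = np.zeros(p); h[-1] = 1.0      # we observe the newest sample + noise
    x = np.zeros(p)                   # state estimate
    P = np.eye(p)                     # error covariance
    s_hat = np.empty_like(y)
    for n, yn in enumerate(y):
        # Predict
        x = F @ x
        P = F @ P @ F.T + q * np.outer(g, g)
        # Update
        k = P @ h / (h @ P @ h + r)   # Kalman gain
        x = x + k * (yn - h @ x)
        P = P - np.outer(k, h @ P)
        s_hat[n] = x[-1]              # filtered estimate of the clean sample
    return s_hat
```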
     Using three objective measures (power spectrogram, time-domain waveform, and SNR), the enhancement performance is evaluated under nine types of noise at different SNR levels. Simulation results show that the algorithm is simple, effective, and robust even in very complicated noise. It improves both SNR and perceptual quality significantly, and the distortion of the enhanced speech is small, so the algorithm is especially well suited as a preprocessor for robust speech recognition. Experiments on an isolated-word recognition system show that this cascade improves recognition accuracy at low SNR levels.
     2、We propose an adaptive recursive estimation algorithm for AR model parameters based on the conjugate gradient method for solving the third-order-cumulant MYW equations. Compared with the RIV, direct-inversion, and LMS estimates of noisy AR sequences, this algorithm converges fastest and is most accurate, and it avoids large matrix-inversion operations. Furthermore, when the power spectra of noisy sinusoidal sequences and speech signals are reconstructed by parametric spectral estimation, the model parameters estimated by conjugate gradient give good envelope fitting, formant sharpness, and resolution even at very low SNR.
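     The thesis' adaptive recursive solver is not reproduced here; the following batch sketch solves a (possibly overdetermined) MYW system by conjugate gradient on its normal equations, together with a simple sample estimator for the third-order cumulant slices that populate the system. Both routines are illustrative assumptions.

```python
import numpy as np

def third_order_cumulant(x, t1, t2):
    """Biased sample estimate of C3(t1, t2) = E[x(n) x(n+t1) x(n+t2)]
    for a zero-mean sequence x and nonnegative lags t1, t2."""
    x = np.asarray(x, dtype=float)
    m = max(t1, t2)
    n = np.arange(len(x) - m)
    return np.mean(x[n] * x[n + t1] * x[n + t2])

def cg_normal_equations(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b (A possibly overdetermined, e.g. an MYW system whose
    entries are cumulant estimates) by conjugate gradient applied to the
    symmetric positive-definite normal equations A^T A x = A^T b."""
    A = np.asarray(A, dtype=float)
    AtA, Atb = A.T @ A, A.T @ np.asarray(b, dtype=float)
    x = np.zeros(AtA.shape[0])
    r = Atb - AtA @ x            # residual
    d = r.copy()                 # search direction
    rs = r @ r
    for _ in range(max_iter or 10 * len(x)):
        Ad = AtA @ d
        alpha = rs / (d @ Ad)    # step size along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d    # conjugate direction update
        rs = rs_new
    return x
```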
     3、Pitch detection is one of the most difficult problems in speech signal processing under noisy conditions. Exploiting the propagation of signal discontinuities across the resolution levels of the wavelet transform, a new pitch detection method based on the wavelet transform and the circular AMDF (WCAMDF) is presented in this thesis. The method overcomes the low accuracy, high complexity, and lack of robustness of many existing pitch detection algorithms. Simulation results indicate that it achieves high pitch detection accuracy for speech under strong background noise, with low computational complexity, high resolution, and real-time capability.
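     As an illustration of the AMDF half of the method, the sketch below estimates pitch for one voiced frame with the circular AMDF; in the full WCAMDF algorithm the frame would first be taken from an appropriate wavelet level, which is omitted here.

```python
import numpy as np

def camdf_pitch(frame, fs, f_lo=60.0, f_hi=400.0):
    """Pitch (Hz) of one voiced frame via the circular AMDF:
        D(tau) = sum_n |x((n + tau) mod N) - x(n)|,
    searched over lags corresponding to [f_lo, f_hi]."""
    x = np.asarray(frame, dtype=float)
    x = x - np.mean(x)
    N = len(x)
    lo = int(fs / f_hi)                # smallest candidate lag
    hi = min(int(fs / f_lo), N - 1)    # largest candidate lag
    d = np.array([np.sum(np.abs(np.roll(x, -tau) - x))
                  for tau in range(lo, hi + 1)])
    tau = lo + int(np.argmin(d))       # deepest valley of the CAMDF
    return fs / tau
```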
     4、The wavelet transform adapts to the signal. This thesis studies wavelet-based multi-threshold estimation of regular signals and its application to speech enhancement: the noisy speech is denoised by thresholding its wavelet coefficients. Theoretical analysis and experiments indicate that the SURE-based soft threshold is best suited to speech signals and gives very good enhancement. Evaluations on the power spectrogram, time-domain waveform, and SNR show that the method is effective in noisy conditions.
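     The following sketch applies level-dependent SURE soft thresholding with PyWavelets. The wavelet (db4), decomposition depth, and MAD noise-scale estimate are illustrative assumptions, in the spirit of, but not necessarily identical to, the thesis' multi-threshold scheme.

```python
import numpy as np
import pywt

def sure_threshold(c):
    """SURE threshold for soft thresholding of coefficients c with unit
    noise variance: pick t among |c_i| minimizing Stein's unbiased risk
        SURE(t) = N - 2*#{|c_i| <= t} + sum_i min(c_i^2, t^2)."""
    c2 = np.sort(np.asarray(c, dtype=float) ** 2)
    N = len(c2)
    ks = np.arange(1, N + 1)
    risk = N - 2 * ks + np.cumsum(c2) + (N - ks) * c2
    return np.sqrt(c2[np.argmin(risk)])

def wavelet_denoise(noisy, wavelet="db4", level=5):
    """Soft-threshold each detail subband with its own SURE threshold."""
    coeffs = pywt.wavedec(noisy, wavelet, level=level)
    # Noise scale from the finest detail subband (median absolute deviation)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    out = [coeffs[0]]                              # keep approximation
    for d in coeffs[1:]:
        t = sigma * sure_threshold(d / sigma)      # level-dependent threshold
        out.append(pywt.threshold(d, t, mode="soft"))
    return pywt.waverec(out, wavelet)[: len(noisy)]
```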
     5、VAD technology plays a very important role in ASR systems. Correct endpoint detection reduces computational cost and shortens run time, while incorrect detection of the beginning and ending boundaries of the test utterance is a major cause of recognition errors. Reliable, accurate, real-time, adaptive, and robust VAD is therefore needed in every recognition system. Based on the wavelet transform, two novel strategies are proposed in this thesis for accurate and robust endpoint detection in noisy environments.
     1) Endpoint detection based on WCAMDF pitch extraction. WCAMDF extracts accurate pitch information despite variations in the noise environment. Using the magnitude envelope of the CAMDF obtained during pitch extraction, the proposed algorithm is verified to improve robustness in both detection accuracy and recognition performance at low SNR levels, reducing the average recognition error rate by more than 21%.
     2) Endpoint detection based on the wavelet energy-entropy feature. Detection using basic energy and spectral entropy becomes difficult and inaccurate when speech is contaminated by colored noise, while a key property of wavelet enhancement is that the residual noise in the enhanced speech is almost white. We therefore couple the two: instead of computing the energy-entropy feature on the original noisy signal, we compute it after the wavelet transform. This modification outperforms the basic energy-entropy feature and improves the discriminability between speech and noise, making the threshold easier to set. A sketch of such a feature follows.
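     A minimal version is given below, assuming the input frames come from wavelet-denoised speech. The fusion formula (an energy-to-entropy ratio) and the noise-frame threshold rule are common choices, not necessarily the thesis' own.

```python
import numpy as np

def energy_entropy_feature(frame, nfft=512, eps=1e-10):
    """Per-frame energy-entropy value: short-time energy combined with
    spectral entropy. The frame is assumed to be wavelet-denoised, so the
    residual noise is close to white."""
    x = np.asarray(frame, dtype=float)
    e = np.sum(x ** 2)                             # short-time energy
    spec = np.abs(np.fft.rfft(x, nfft)) ** 2
    p = spec / (np.sum(spec) + eps)                # normalized spectrum
    h = -np.sum(p * np.log(p + eps))               # spectral entropy
    return np.sqrt(1.0 + np.abs(e / (h + eps)))    # energy-to-entropy ratio

def detect_endpoints(signal, fs, frame_ms=25, hop_ms=10, k=3.0):
    """Frame-level speech/non-speech mask by thresholding the
    energy-entropy track; the threshold is set from the first ten frames,
    assumed to be speech-free."""
    flen, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    feats = np.array([energy_entropy_feature(signal[i:i + flen])
                      for i in range(0, len(signal) - flen, hop)])
    noise = feats[:10]
    thresh = noise.mean() + k * noise.std()
    return feats > thresh                          # True where speech
```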
     The two endpoint detection approaches can run alongside pitch extraction or speech enhancement. They are real-time, simple, easy to realize, and of low complexity, which matters especially in large-vocabulary ASR systems where processing power and memory are limited.
     6、In the real world, robust feature extraction is one of the most crucial issues in ASR applications. It aims at finding succinct, salient, and representative characteristics in a noisy utterance for discrimination. Selecting robust features is essential for acceptable recognition performance across noisy environments. Mel-frequency cepstral coefficients (MFCC) are well accepted as speech features with reasonable robustness, and many advanced techniques have been built on them. Three improved MFCC-based methods are proposed in this thesis.
     1) Teager energy-entropy MFCC (TEMFCC). Teager energy-entropy features are commonly used for locating the endpoints of an utterance. Integrated with MFCC (a sketch follows this list), they offer an average accuracy increase of 10% over MFCC in the baseline system.
     2) LDA-TEMFCC. Appending Teager energy-entropy increases the dimension of the feature vectors. To overcome this, the feature vectors are classified and reduced in dimension by Linear Discriminant Analysis (LDA). The 20-dimensional LDA-TEMFCC features yield a 6% increase in recognition performance over the 24-dimensional MFCC baseline.
     3) HOC-LPC-MFCC. MFCC derived directly from the power spectrum of noisy speech is overly sensitive to additive colored noise and generally degrades recognition performance in noisy conditions. Exploiting the strong Gaussian-noise suppression of higher-order cumulants (HOC), HOC-LPC-MFCC features are developed: the speech power spectrum is reconstructed from the model parameters estimated from the third-order cumulants of the noisy signal, and MFCC is derived from this reconstruction. Experimental results show that the proposed features achieve significant noise robustness in all conditions compared with plain MFCC.
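     A sketch of the Teager side of TEMFCC follows: the discrete Teager energy operator, a frame-level Teager entropy, and plain concatenation onto an existing MFCC vector. The histogram-based entropy and the concatenation step are assumptions; the abstract does not specify the exact fusion.

```python
import numpy as np

def teager_energy(frame):
    """Discrete Teager energy operator:
        psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(frame, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def teager_entropy(frame, nbins=32, eps=1e-10):
    """Entropy of the normalized Teager-energy distribution of one frame
    (histogram-based; an illustrative choice)."""
    te = np.abs(teager_energy(frame))
    hist, _ = np.histogram(te, bins=nbins)
    p = hist / (hist.sum() + eps)
    return -np.sum(p * np.log(p + eps))

def temfcc(mfcc_frame, frame, eps=1e-10):
    """Append log mean Teager energy and Teager entropy to an existing
    per-frame MFCC vector, giving a TEMFCC-style feature (dimension +2).
    Plain concatenation is assumed here."""
    log_te = np.log(np.mean(np.abs(teager_energy(frame))) + eps)
    return np.concatenate([mfcc_frame, [log_te, teager_entropy(frame)]])
```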
