Keyword Spotting in Natural Speech over the Telephone Channel
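Abstract
Keyword spotting is a specialized speech recognition technique that aims to detect specific, application-determined words in continuous speech, and it has promising applications in many fields. This paper briefly reviews the history of keyword spotting and current developments in China and abroad, and discusses in detail three basic problems of keyword spotting: feature extraction and selection, pattern classification methods, and time alignment. The hidden Markov model (HMM) is currently the most popular pattern classification method. This paper focuses on the basic principles of the HMM, gives simple and practical training methods and recognition strategies, and builds a recognition system with a two-stage recognition-verification structure that performs keyword spotting without grammar constraints. In addition, it makes new attempts at improving system robustness and recognition speed: the FastICA algorithm is applied for feature transformation and dimensionality reduction; basic algorithms for speaker classification and speaker adaptation are implemented, with speaker classification realized by Gaussian mixture models, which broadens the population of users the system can serve and raises recognition accuracy, and speaker adaptation realized by the maximum likelihood linear regression (MLLR) algorithm; and Gaussian selection is used to increase recognition speed. In the utterance verification stage, the paper also proposes new confidence measures based on information from the recognition result itself, which effectively reduce the system's false alarm rate. The paper closes with directions for further research on keyword spotting.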
As a special field of speech recognition research, keyword spotting aims to determine occurrences of one or more keywords embedded in unconstrained extraneous speech and/or noise. It has a bright future in many application areas. In this paper, we give a brief history of keyword spotting research and discuss its fundamental principles, pointing out the three most important problems in the field: how to extract and select features, how to characterize keywords and garbage, and how to detect keywords in continuous speech. The paper describes the basic theory of the hidden Markov model (HMM) and presents simple and practical methods for building HMMs and for time-aligning patterns with models, namely the segmental k-means training algorithm and the frame-synchronous Viterbi algorithm. We build a recognition-verification system that can detect keywords in continuous speech without grammar restrictions. Furthermore, we work on feature transformation, speaker clustering, speaker adaptation, and Gaussian selection to improve system robustness and efficiency. Feature transformation is achieved with the FastICA algorithm. Speaker clustering is implemented with Gaussian mixture models, which lets the system serve a wider group of people, and speaker adaptation is achieved with the maximum likelihood linear regression (MLLR) algorithm. We use Gaussian selection to reduce computation. In the utterance verification phase, new confidence measures based on information from the recognition results are used to reduce the false alarm rate. Finally, the paper outlines directions for further research in this field.
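The frame-synchronous Viterbi algorithm named in the abstract can be illustrated with a minimal, generic sketch. This is not the thesis's decoder: the function name `viterbi`, the helper `safe_log`, and the argument layout are assumptions made for illustration, and the keyword/garbage model network of a real spotter is left out. The sketch only shows the core recursion, in which the best partial-path score of every state is updated once per frame and back-pointers are kept for the final traceback.

```python
import math


def safe_log(p):
    """Hypothetical helper: log of a probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(log_init, log_trans, log_emit):
    """Frame-synchronous Viterbi decoding in the log domain.

    log_init[j]     : log P(state j at frame 0)
    log_trans[i][j] : log P(state j at frame t+1 | state i at frame t)
    log_emit[t][j]  : log-likelihood of frame t under state j
    Returns (best_log_score, best_state_path).
    """
    n_states = len(log_init)
    # Best partial-path score ending in each state at the current frame.
    score = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr = []  # backptr[t-1][j] = best predecessor of state j at frame t

    # Process one frame at a time, updating all states together.
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + log_emit[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final state to recover the state sequence.
    last = max(range(n_states), key=lambda j: score[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path
```

In a keyword spotter, the states would typically belong to keyword models and garbage (filler) models joined in a loop, and the traceback would yield candidate keyword boundaries for the verification stage; how the thesis wires this network is not stated in the abstract.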
