Research on a Time-Accumulation Neural Network for Keyword Spotting in Chinese Speech
Abstract
Keyword spotting (KWS) is one of the core research topics and application hotspots in speech recognition, and it has advanced considerably in recent years. Although the hidden Markov model (HMM) dominates speech recognition, the artificial neural network (ANN), with its strong discriminative ability, low computational cost, and high flexibility, has become one of the main research directions for KWS.
     This paper studies a new type of time-delay neural network, the Time-Accumulation Neural Network (TANN), and applies it to keyword spotting in Chinese speech. TANN processes its input in two delay-and-accumulation steps, time accumulation and frame accumulation. These steps capture the temporal characteristics of speech well, address the difficulty that neural networks face in classifying temporal sequences, and sidestep the time-alignment (time-warping) problem in speech recognition. The paper also introduces a network training algorithm based on the entropy error function (EEF), supplemented by convergence-acceleration techniques such as an added momentum term and variable step-size learning. By learning from several points of each sample, the algorithm lets the network adapt to the dynamic process of keyword spotting. It also speeds up the convergence of TANN and helps the network escape some local minima of the error surface during learning.
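The training ingredients named above (an entropy error function, a momentum term, and a variable step size) can be illustrated with a minimal sketch. The abstract does not give the paper's update rules, so everything below is an assumption: it trains a single sigmoid unit rather than TANN, uses the standard cross-entropy form of the entropy error, and varies the step with a simple accept-and-grow / reject-and-shrink rule. The names `entropy_error` and `train` and the constants `momentum=0.9`, `grow=1.05`, `shrink=0.7` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def entropy_error(targets, outputs, eps=1e-12):
    """Cross-entropy (entropy) error summed over all samples."""
    return -sum(
        t * math.log(y + eps) + (1.0 - t) * math.log(1.0 - y + eps)
        for t, y in zip(targets, outputs)
    )


def forward(weights, bias, inputs):
    """Outputs of one sigmoid unit for every input row."""
    return [
        sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
        for row in inputs
    ]


def train(inputs, targets, epochs=200, step=0.1,
          momentum=0.9, grow=1.05, shrink=0.7):
    """Batch gradient descent with momentum and a variable step size.

    Assumed rule (not taken from the paper): a trial update is kept and
    the step grows if the error does not increase; otherwise the update
    is discarded, the step shrinks, and the momentum is reset.
    """
    n = len(inputs[0])
    weights, bias = [0.0] * n, 0.0
    vel_w, vel_b = [0.0] * n, 0.0
    best = entropy_error(targets, forward(weights, bias, inputs))
    history = [best]
    for _ in range(epochs):
        outputs = forward(weights, bias, inputs)
        # For a sigmoid output with the entropy error, the output-layer
        # error signal reduces to (output - target).
        deltas = [y - t for y, t in zip(outputs, targets)]
        grad_w = [sum(d * row[i] for d, row in zip(deltas, inputs))
                  for i in range(n)]
        grad_b = sum(deltas)
        trial_vel_w = [momentum * v - step * g
                       for v, g in zip(vel_w, grad_w)]
        trial_vel_b = momentum * vel_b - step * grad_b
        trial_w = [w + v for w, v in zip(weights, trial_vel_w)]
        trial_b = bias + trial_vel_b
        error = entropy_error(targets, forward(trial_w, trial_b, inputs))
        if error <= best:
            weights, bias = trial_w, trial_b
            vel_w, vel_b = trial_vel_w, trial_vel_b
            best = error
            step *= grow
        else:
            step *= shrink
            vel_w, vel_b = [0.0] * n, 0.0
        history.append(best)
    return weights, bias, history


if __name__ == "__main__":
    # Toy data (logical OR), purely illustrative.
    xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    ts = [0.0, 1.0, 1.0, 1.0]
    _, _, errors = train(xs, ts)
    print("initial error:", errors[0], "final error:", errors[-1])
```

Because a trial update is only kept when it does not raise the error, the recorded error never increases. A full TANN would apply the same idea to every layer through backpropagation.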
     For keyword spotting, the paper uses TANN as the classifier. It also proposes a multi-template joint decision algorithm that borrows from multiple-classifier fusion. The algorithm compensates for the weakness of a single classifier in time accumulation and raises the detection accuracy. TANN-based keyword spotting reaches a best-case detection accuracy above 80% and meets real-time application requirements, so it merits further research and exploration.
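The idea of a joint decision over several templates can be sketched as follows. The abstract does not describe the paper's fusion rule, so this is only an illustration under stated assumptions: each template of a keyword is taken to yield a score in [0, 1] for the current speech segment, the scores are fused with the mean (sum) rule, and a keyword is reported only when its fused score reaches a threshold. The function names, the keyword labels, and the threshold of 0.5 are hypothetical.

```python
from statistics import mean


def fuse_scores(template_scores, rule="mean"):
    """Combine the scores of all templates of one keyword."""
    if not template_scores:
        raise ValueError("at least one template score is required")
    if rule == "mean":
        return mean(template_scores)
    if rule == "max":
        return max(template_scores)
    raise ValueError(f"unknown fusion rule: {rule}")


def spot_keyword(scores_by_keyword, threshold=0.5, rule="mean"):
    """Return (keyword, fused score) for the best-scoring keyword.

    The keyword is None when even the best fused score stays below
    the threshold, meaning no keyword is reported for this segment.
    """
    fused = {kw: fuse_scores(scores, rule)
             for kw, scores in scores_by_keyword.items()}
    best = max(fused, key=fused.get)
    if fused[best] >= threshold:
        return best, fused[best]
    return None, fused[best]


if __name__ == "__main__":
    # Hypothetical scores from three templates per keyword.
    scores = {
        "beijing": [0.9, 0.4, 0.8],
        "shanghai": [0.95, 0.1, 0.2],
    }
    print(spot_keyword(scores))              # mean rule picks "beijing"
    print(spot_keyword(scores, rule="max"))  # max rule picks "shanghai"
```

In this toy input a single over-confident template decides the outcome under the max rule, while averaging across templates lets the other templates outvote it. That is the kind of single-classifier weakness a joint decision is meant to reduce.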
