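Research on Speech Recognition Methods Based on Spiking Neural Networks

English title: Research on Speech Recognition Based on Spiking Neural Networks
Author: Zhang Wenbin (章文彬)
Degree level: Master's
Discipline: Computer Application Technology
Chinese keywords: 脉冲神经网络 ; 语音识别 ; H-H方程 ; 圆映射
English keywords: spiking neural networks ; speech recognition ; H-H equation ; circle mapping
Degree year: 2007
Advisor: Fang Luping (方路平)
Discipline code: 081203
Degree-granting institution: Zhejiang University of Technology (浙江工业大学)
Submission date: 2007-04-01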

Abstract

In recent years, artificial neural networks built on spiking neuron models (Spiking Neural Networks, SNNs) have attracted considerable attention and are often described as the next generation of neural networks. A spiking neuron encodes and processes information in the timing of the spikes it fires rather than in its average firing rate. This coding scheme is inherently temporal, which makes SNNs well suited to analyzing time-structured data. Studies have shown that SNNs have greater computational power than conventional neural networks and can solve the same problem with fewer neurons or layers. To date, however, research on SNNs has concentrated on theory and algorithms, with few applications to real-world data. Drawing on the strengths of SNNs for temporal problems and on current research interest in the field, this thesis investigates speech recognition based on spiking neural networks.
The thesis first reviews the theory of spiking neural networks: spiking neuron models, spike coding schemes, neuronal dynamics and their representation, the Hodgkin-Huxley (H-H) equations, network architectures, and applications. It then surveys speech recognition technology and analyzes the problems it faces, along with its prospects and applications. Next, it sets out the method and steps for speech recognition with spiking neural networks: the classical H-H equations simulate the spiking neurons, a circle map converts the resulting spike trains into symbol sequences, and the symbol sequences are mapped from symbol space into a distance space for computation and matching. Finally, an isolated-word speech recognition experiment based on this approach was implemented in Matlab, and tuning some of the H-H equation parameters improved the system's recognition rate.
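
The first two steps named in the abstract, simulating a spiking neuron with the H-H equations and turning its spike train into symbols, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the constants are the standard textbook H-H values rather than the tuned values from the thesis, the integrator is plain forward Euler, and `spikes_to_symbols` is a hypothetical stand-in that labels each spike by its phase on a reference period. The thesis's actual circle-map partition and distance measure are not given in the abstract and are not reproduced here.

```python
import math

# Standard textbook Hodgkin-Huxley constants (assumed; not taken from the thesis).
C_M = 1.0                              # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3      # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387  # reversal potentials, mV


def _ratio(x, y):
    """Evaluate x / (exp(x / y) - 1), using a series expansion near x = 0."""
    if abs(x / y) < 1e-6:
        return y * (1.0 - x / (2.0 * y))
    return x / (math.exp(x / y) - 1.0)


def _rates(v):
    """Return (alpha, beta) pairs for the gating variables n, m and h."""
    a_n = 0.01 * _ratio(-(v + 55.0), 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    a_m = 0.1 * _ratio(-(v + 40.0), 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    return (a_n, b_n), (a_m, b_m), (a_h, b_h)


def simulate_hh(i_ext, t_max=100.0, dt=0.01, threshold=0.0):
    """Integrate the H-H equations with forward Euler; return spike times in ms.

    i_ext is a constant input current in uA/cm^2.
    """
    v = -65.0
    # Start each gate at its steady state for the initial voltage.
    n, m, h = (a / (a + b) for a, b in _rates(v))
    spikes = []
    for step in range(int(t_max / dt)):
        (a_n, b_n), (a_m, b_m), (a_h, b_h) = _rates(v)
        i_ion = (G_NA * m**3 * h * (v - E_NA)
                 + G_K * n**4 * (v - E_K)
                 + G_L * (v - E_L))
        v_new = v + dt * (i_ext - i_ion) / C_M
        n += dt * (a_n * (1.0 - n) - b_n * n)
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        # Record a spike when the voltage crosses the threshold upward.
        if v < threshold <= v_new:
            spikes.append((step + 1) * dt)
        v = v_new
    return spikes


def spikes_to_symbols(spike_times, period):
    """Hypothetical symbolization: 'L' or 'R' by spike phase on a circle."""
    return "".join(
        "L" if (t % period) / period < 0.5 else "R" for t in spike_times
    )


if __name__ == "__main__":
    times = simulate_hh(i_ext=10.0)
    print(len(times), spikes_to_symbols(times, period=10.0))
```

Changing `i_ext` or the conductance constants changes the spike timing and therefore the symbol sequence, which gives a rough sense of why adjusting H-H parameters could affect recognition results.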

