Research on Key Techniques for Dynamic Feature Analysis and Visualization of Speech Signals
Abstract
Speech is the most convenient and natural means of human communication. Some deaf-mute people cannot speak because their auditory organs are damaged and speech information never reaches the brain, although their articulatory organs are intact. With the help of a visual training system and a period of dedicated practice, such people can learn to speak and converse with hearing people. Systems that convert speech into visually recognizable images to assist speech training for the deaf have been studied extensively at home and abroad since the mid-1960s, but to date most of them rely on a single speech-feature representation: their recognition rates are low, and the information they display is too technical for deaf users to understand and accept.
    This dissertation focuses on the mechanisms of speech production and perception, in particular on how speech information is transmitted and processed in the brain. Exploiting the strengths that existing techniques (wavelet transforms, auditory models, neural networks, and manifold learning) offer for speech analysis, it proposes a parametric description of speech within the brain's perceptual system and a new recognition approach that displays this description graphically. Compared with conventional speech recognition, the principle is easy to understand and the computation is light; the work also attempts to confirm that the perception of speech (at least of vowels) is a simple topological mapping. The resulting graphics are easy to recognize: after only brief training, deaf users can identify speech by drawing on their brain's own feedback and strong visual compensation. The contributions of this dissertation are as follows:
    (1) The research status of conventional speech recognition and of speech-training technology for the hearing-impaired is reviewed in detail, and a systematic study of speech production and perception demonstrates the feasibility and applicability of converting human speech signals into visual information. The speech spectrograms and visualization methods currently used in speech analysis are examined in depth, covering the principle, scope of application, advantages, and shortcomings of each. Finally, building on a brief account of traditional hand-crafted feature extraction (LPCC, MFCC, PLP, etc.), the concept of automatic feature extraction for speech signals is proposed on the basis of neural networks and manifold learning; a sketch of the conventional MFCC baseline follows below.
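    As a point of reference for the hand-crafted features named above, the following is a minimal MFCC sketch in Python. The frame length, filter count, and coefficient count are common illustrative defaults, not the settings used in the dissertation.

import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Pre-emphasis boosts the high-frequency part of the spectrum.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Overlapping Hamming-windowed frames.
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filter bank from 0 Hz to Nyquist.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filter-bank energies, decorrelated by a type-II DCT.
    feats = np.log(power @ fbank.T + 1e-10)
    return dct(feats, type=2, axis=1, norm="ortho")[:, :n_ceps]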
    (2) A new speech visualization method is proposed. Using the multiresolution idea of wavelet theory (WT), an auditory-model filter bank is built to simulate the auditory system, overcoming the drawback of conventional short-time analysis (the STFT) that high and low frequency bands share the same time and frequency resolution; the resulting behaviour closely matches the ear's perception of sound. The wavelet-filtered speech is feature-encoded into a combined feature that represents and reflects the regularities of the signal, and this feature is rendered as a simple graphic so that deaf users can recognize speech with their own brains, realizing to some extent the idea of turning speech into images. A filter-bank sketch is given below.
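    A hedged sketch of the multiresolution idea: a bank of complex Morlet wavelets at logarithmically spaced centre frequencies, so low bands get long analysis windows (fine frequency resolution) and high bands get short ones (fine time resolution), loosely mimicking the basilar membrane. The band count, frequency range, and bandwidth constant are illustrative assumptions, not the dissertation's design.

import numpy as np
from scipy.signal import fftconvolve

def wavelet_scalogram(signal, sr=16000, f_lo=80.0, f_hi=6000.0,
                      n_bands=32, cycles=8.0):
    freqs = np.geomspace(f_lo, f_hi, n_bands)
    rows = []
    for fc in freqs:
        # Wavelet support scales inversely with centre frequency.
        dur = cycles / fc
        t = np.arange(-dur, dur, 1.0 / sr)
        sigma = cycles / (2 * np.pi * fc)
        wav = np.exp(2j * np.pi * fc * t) * np.exp(-t**2 / (2 * sigma**2))
        wav /= np.sum(np.abs(wav))              # unit-gain normalisation
        rows.append(np.abs(fftconvolve(signal, wav, mode="same")))
    return freqs, np.vstack(rows)               # (n_bands, n_samples)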
    (3) A readable speech pattern based on a temporal self-organizing map (TSOM) is created and described. A time-enhancement mechanism is introduced on top of the self-organizing map (SOM) to improve system performance, remedying the SOM's fixed spatial topology and its neglect of the temporal factor, which is crucial for speech. The TSOM is particularly effective for visualizing time-varying speech spectra: consecutive short-time spectra form a trajectory on the two-dimensional map plane, so the dynamic behaviour of the speech signal can be observed over time. A minimal sketch follows.
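    The following sketch shows one plausible reading of the time-enhancement idea (an assumption, not the dissertation's exact algorithm): a standard SOM trained as usual, with a leaky activation trace at query time so that the winner for each short-time spectrum also reflects recent history, making consecutive spectra trace a smooth path on the map.

import numpy as np

rng = np.random.default_rng(0)

def train_som(data, rows=12, cols=12, epochs=20, lr0=0.5, sigma0=4.0):
    w = rng.standard_normal((rows * cols, data.shape[1]))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.5
            bmu = np.argmin(((w - x) ** 2).sum(1))
            # Gaussian neighbourhood pulls units near the winner toward x.
            d2 = ((grid - grid[bmu]) ** 2).sum(1)
            h = np.exp(-d2 / (2 * sigma ** 2))
            w += lr * h[:, None] * (x - w)
            step += 1
    return w, grid

def tsom_trajectory(w, grid, spectra, alpha=0.6):
    # Leaky activation trace: the hypothesised temporal part of the TSOM.
    act = np.zeros(len(w))
    path = []
    for x in spectra:                       # consecutive short-time spectra
        act = alpha * act - (1 - alpha) * ((w - x) ** 2).sum(1)
        path.append(grid[np.argmax(act)])   # winner on the 2-D map plane
    return np.array(path)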
    (4) A speech visualization method based on temporal linear embedding (TLE) is proposed. Locally linear embedding (LLE) is an unsupervised learning algorithm for feature extraction, whose goal is to reduce the dimensionality of the speech signal while retaining most of its key information. If speech variability can be described by a small number of continuous features, the data can be viewed as a low-dimensional manifold embedded in the high-dimensional space of all possible waveforms. Manifold learning is applied to speech data: the basic LLE algorithm and its limitations are analyzed in detail, and on that basis an improved TLE algorithm is proposed to extract as much useful low-dimensional structure as possible from high-dimensional speech. The algorithm's ability to separate vowels in the low-dimensional space is evaluated against classical linear dimensionality reduction (PCA); the results show that the manifold learning algorithm outperforms the classical method in low-dimensional space and discovers useful manifold structure in speech data. The sketch below illustrates the comparison.
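    A sketch of the PCA-versus-manifold comparison using scikit-learn's stock LLE. Stacking each frame with its neighbours to give the embedding temporal context is a crude, hypothetical stand-in for the dissertation's TLE modification; the function name and parameter values are assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

def embed_frames(frames, n_components=2, n_neighbors=12, context=2):
    # Stack each frame with its neighbours so the embedding sees local
    # temporal structure (an assumption, not the thesis's exact TLE).
    pad = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    stacked = np.hstack([pad[i:i + len(frames)]
                         for i in range(2 * context + 1)])
    pca = PCA(n_components=n_components)
    lle = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                 n_components=n_components)
    # Return both projections so vowel separation can be compared.
    return pca.fit_transform(stacked), lle.fit_transform(stacked)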
    (5) A speech visualization method based on an auditory model is proposed. A gammatone auditory filter bank and the Meddis inner-hair-cell firing model are used to obtain an auditory correlogram characterizing auditory-nerve activity, and the frequency-component amplitudes within each band of the correlogram are feature-encoded into a vector describing that band. Compared with conventional speech processing (such as the spectrogram), this method reveals more of the signal's frequency characteristics. A front-end sketch is given below.
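    A hedged sketch of the auditory front end: a gammatone filter bank followed by half-wave rectification and low-pass smoothing. The rectify-and-smooth stage is only a crude stand-in for the Meddis inner-hair-cell model, which is a considerably more detailed transmitter-reservoir model; the centre-frequency spacing and all constants here are illustrative.

import numpy as np
from scipy.signal import fftconvolve, lfilter

def erb(f):
    # Equivalent rectangular bandwidth (Glasberg & Moore formula).
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_bank(signal, sr=16000, f_lo=100.0, f_hi=6000.0, n_bands=32):
    # Log-spaced centre frequencies as a simple stand-in for ERB spacing.
    freqs = np.geomspace(f_lo, f_hi, n_bands)
    t = np.arange(0, 0.05, 1.0 / sr)           # 50 ms impulse responses
    p = np.exp(-2 * np.pi * 1000.0 / sr)       # ~1 kHz smoothing pole
    out = []
    for fc in freqs:
        # 4th-order gammatone impulse response: t^3 e^{-2*pi*b*t} cos(2*pi*fc*t).
        g = t ** 3 * np.exp(-2 * np.pi * 1.019 * erb(fc) * t) \
            * np.cos(2 * np.pi * fc * t)
        g /= np.max(np.abs(np.fft.rfft(g)))    # roughly unit peak gain
        band = fftconvolve(signal, g, mode="same")
        # Crude hair-cell stage: half-wave rectify, then one-pole low-pass.
        rect = np.maximum(band, 0.0)
        out.append(lfilter([1 - p], [1, -p], rect))
    return freqs, np.vstack(out)                # (n_bands, n_samples)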