Research on Robust Feature Extraction and Visualization of Speech Signals
Abstract
Speech is the acoustic realization of language: it is the most natural, effective, and convenient means by which people exchange information, and a vehicle for human thought. For the hearing-impaired, however, spoken communication is extremely difficult. Some deaf people cannot speak not because their articulatory organs are damaged, but because their auditory organs are, so speech information never reaches the brain. With the aid of a visual training system and a period of dedicated practice, such people can learn to speak and to communicate with the hearing. Speech visualization technology, which provides lossless hearing compensation for the disabled, arose to meet this need. This dissertation is grounded in that idea: it extracts feature parameters from the speech signal and maps them to images, producing pictures that carry acoustic meaning, which hearing-impaired learners can study and recognize as an aid to "hearing" speech. Feature extraction is a key determinant of the performance of speech recognition and visualization systems. The features in current use are robust in quiet conditions, but their performance degrades sharply in noise. This dissertation therefore concentrates on feature extraction at low signal-to-noise ratio (SNR) and on the application of such features to speech visualization.
The main research contents and innovations of this dissertation are as follows:
(1) To improve the accuracy of speech endpoint detection at low SNR, an endpoint detection algorithm is proposed. Its core idea is to exploit the complementary strengths of the short-time energy-zero product and discriminative information: a first decision is made with the short-time energy-zero product, and frames at the transition between noise and speech are re-checked with a method based on sub-band energy discriminative information, avoiding the false detections caused by abrupt changes in noise amplitude. A method for dynamically updating the noise energy threshold is also proposed, so that changes in noise energy can be tracked more accurately. Simulation results show that the proposed method still detects the start and end points of speech quickly and accurately when the SNR varies sharply, laying a solid foundation for subsequent processing of the speech signal.
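The first-stage decision can be pictured with the minimal sketch below, which assumes the energy-zero product is defined as frame energy multiplied by zero-crossing rate; the frame size, smoothing constant `alpha`, and threshold multiple `k` are illustrative choices rather than the dissertation's settings, and the sub-band re-check stage is omitted.

```python
import numpy as np

def energy_zero_product(x, frame_len=256, hop=128):
    """Per-frame short-time energy multiplied by zero-crossing rate."""
    n_frames = 1 + (len(x) - frame_len) // hop
    ezp = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len]
        energy = np.sum(frame ** 2)
        zcr = np.sum(np.abs(np.diff(np.sign(frame)))) / (2 * frame_len)
        ezp[i] = energy * zcr
    return ezp

def detect_speech_frames(x, noise_frames=10, alpha=0.95, k=3.0):
    """First-pass speech/noise decision with a dynamically updated noise
    threshold (exponential smoothing over frames labelled as noise)."""
    ezp = energy_zero_product(x)
    noise_level = ezp[:noise_frames].mean()  # assume the leading frames are noise
    speech = np.zeros(len(ezp), dtype=bool)
    for i, v in enumerate(ezp):
        if v > k * noise_level:
            speech[i] = True                 # candidate speech frame
        else:
            # update the noise floor only on frames judged to be noise
            noise_level = alpha * noise_level + (1 - alpha) * v
    return speech
```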
(2) The learning behaviour of a wavelet neural network depends heavily on the number of hidden nodes, the initial weights (including thresholds), the dilation and translation factors, and the learning rate and momentum factor; as a result its global search ability is weak, it falls easily into local minima, and convergence is slow or fails altogether. A genetic algorithm (GA), by contrast, searches in a highly parallel, stochastic, and adaptive way, giving it a clear advantage on complex, nonlinear problems that defeat traditional search methods. We therefore combine the two: the GA selects the initial values, and the wavelet neural network then completes training to the required accuracy. Simulation results show that this model raises the speech recognition rate while shortening recognition time, a gain in both efficiency and speed that lays the groundwork for practical use of the algorithm.
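A minimal sketch of the hybrid scheme follows, under loose assumptions: the fitness is the mean squared error of a small one-hidden-layer Morlet wavelet network, and a truncation-selection GA with Gaussian mutation stands in for whatever genetic operators the dissertation actually uses; the gradient training stage that would follow is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def morlet(t):
    """Morlet mother wavelet used as the hidden activation."""
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2)

def forward(params, X, n_hidden):
    d = X.shape[1]
    # flat parameter vector -> input weights W, scales a, shifts b, output v
    W, a, b, v = np.split(params, [n_hidden * d, n_hidden * (d + 1),
                                   n_hidden * (d + 2)])
    z = X @ W.reshape(n_hidden, d).T           # hidden pre-activations
    h = morlet((z - b) / (np.abs(a) + 0.1))    # dilated, translated wavelets
    return h @ v                               # linear output layer

def mse(params, X, y, n_hidden):
    return np.mean((forward(params, X, n_hidden) - y) ** 2)

def ga_initialise(X, y, n_hidden=8, pop=30, gens=50):
    """Evolve full parameter vectors; return the fittest as the
    starting point for subsequent gradient training."""
    dim = n_hidden * (X.shape[1] + 3)
    P = rng.normal(size=(pop, dim))
    for _ in range(gens):
        f = np.array([mse(p, X, y, n_hidden) for p in P])
        parents = P[np.argsort(f)[: pop // 2]]  # truncation selection
        children = parents + rng.normal(scale=0.1, size=parents.shape)
        P = np.vstack([parents, children])      # elitist replacement
    f = np.array([mse(p, X, y, n_hidden) for p in P])
    return P[np.argmin(f)]
```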
(3) To improve the robustness of speech recognition and visualization systems in noisy conditions, the spectrum estimation technique of the MUltiple SIgnal Classification (MUSIC) method is introduced into feature extraction and combined with the perceptual characteristics of speech, yielding a new feature, PMUSIC-MFCC. Compared with the baseline Mel-frequency cepstral coefficients (MFCC), it improves both robustness and computational efficiency.
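The MUSIC stage could be sketched as below, assuming p dominant components per frame and an m-by-m sample autocorrelation matrix; the perceptual (mel) warping and cepstral conversion that would turn this pseudospectrum into PMUSIC-MFCC are omitted for brevity.

```python
import numpy as np
from scipy.linalg import toeplitz, eigh

def music_spectrum(frame, p=10, m=40, n_freq=256):
    """MUSIC pseudospectrum 1 / ||E_n^H a(w)||^2 for one speech frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = toeplitz(r[:m])                     # sample autocorrelation matrix
    vals, vecs = eigh(R)                    # eigenvalues in ascending order
    En = vecs[:, : m - p]                   # noise subspace (smallest m - p)
    w = np.linspace(0, np.pi, n_freq)
    A = np.exp(-1j * np.outer(np.arange(m), w))   # steering vectors a(w)
    denom = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return 1.0 / denom                      # peaks at the signal frequencies
```

In a full PMUSIC-MFCC front end, this pseudospectrum would replace the FFT power spectrum before the mel filterbank and DCT steps of the usual MFCC pipeline.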
(4) Dynamic behaviour is part of the diversity of speech. Unlike a stationary random process, speech is temporally correlated, with close dependencies between preceding, following, and neighbouring segments. Difference (delta) and acceleration parameters do not mine this dynamic information fully, so they reflect the dynamics of speech only poorly. The modulation spectrum, by contrast, concentrates energy jointly in time and frequency: it reflects the dynamics of speech well and is comparatively insensitive to the acoustic environment. Exploiting the fact that interference and speech show up differently in the modulation domain, the effective speech components of the modulation information are extracted, and cepstral features are then derived from them in a manner analogous to MFCC extraction. The resulting features are more robust.
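A sketch of the modulation-domain filtering is given below, assuming the useful speech modulations lie roughly in the 2-16 Hz band (a common rule of thumb, not necessarily the dissertation's cut-offs); cepstra are then taken over frequency, in the spirit of MFCC.

```python
import numpy as np
from scipy.signal import stft
from scipy.fft import dct

def modulation_cepstrum(x, fs, n_cep=13, lo=2.0, hi=16.0):
    """Band-pass the per-band log envelopes in the modulation domain,
    then take a DCT over frequency to obtain cepstrum-like features."""
    f, t, Z = stft(x, fs=fs, nperseg=256, noverlap=128)
    logspec = np.log(np.abs(Z) + 1e-10)        # log envelope of each band
    frame_rate = fs / 128.0                    # envelope sampling rate (hop = 128)
    M = np.fft.rfft(logspec, axis=1)           # modulation spectrum per band
    mf = np.fft.rfftfreq(logspec.shape[1], d=1.0 / frame_rate)
    M[:, (mf < lo) | (mf > hi)] = 0            # keep speech modulations only
    filtered = np.fft.irfft(M, n=logspec.shape[1], axis=1)
    return dct(filtered, axis=0, norm="ortho")[:n_cep].T  # (frames, n_cep)
```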
(5) Signals at different frequencies, within their critical bands, excite vibration at different positions along the basilar membrane of the human ear, and the constant-Q behaviour of the wavelet transform across its analysis bands matches the way human hearing processes sound. Building on an analysis of the MFCC extraction procedure, this dissertation combines the multi-level band partitioning of the wavelet packet transform with the characteristics of the ear's perceptual bands, adaptively selecting the corresponding sub-bands, and proposes a wavelet-packet-transform-based feature (WPTC). Experiments confirm its strong robustness.
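A minimal sketch with PyWavelets follows, assuming a depth-4 'db4' decomposition whose leaves, taken in frequency order, stand in for the perceptually selected bands; the adaptive band selection described above is not reproduced.

```python
import numpy as np
import pywt
from scipy.fft import dct

def wptc(frame, wavelet="db4", level=4, n_cep=12):
    """Wavelet-packet subband log-energies, decorrelated by a DCT
    (the same final step MFCC uses on mel filterbank energies)."""
    wp = pywt.WaveletPacket(frame, wavelet=wavelet, maxlevel=level)
    leaves = wp.get_level(level, order="freq")   # leaves in frequency order
    energies = np.array([np.sum(node.data ** 2) for node in leaves])
    log_e = np.log(energies + 1e-10)             # log subband energies
    return dct(log_e, norm="ortho")[:n_cep]
```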
(6) To choose, from a large pool of feature parameters, a small set that complement one another, a systematic and practical feature-optimization method is proposed: variance-based orthogonal experimental design. Factors (the speech features) and their levels are chosen first; then, following the principles of mathematical statistics and orthogonality, a modest number of representative points are selected from the large space of possible trials to build an orthogonal table, and the orthogonal experiment is run; finally the results are computed and analysed to find the optimal feature combination. Compared with the simple feature combinations in current use, the new method reduces both the error rate and the response time considerably.
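As an illustration, a two-level L8(2^7) orthogonal array over seven candidate feature groups might be analysed as below; `evaluate` is a hypothetical callback returning the recogniser's error rate for one combination, and simple range analysis stands in here for the variance-based analysis.

```python
import numpy as np

# Standard L8 orthogonal array: 8 runs x 7 two-level factors
# (0 = omit the feature group, 1 = include it)
L8 = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
])

def analyse(evaluate, factor_names):
    """Run the 8 trials and report each factor's mean effect."""
    scores = np.array([evaluate(row) for row in L8])  # e.g. word error rates
    for j, name in enumerate(factor_names):
        m0 = scores[L8[:, j] == 0].mean()
        m1 = scores[L8[:, j] == 1].mean()
        print(f"{name}: level-0 mean {m0:.3f}, level-1 mean {m1:.3f}, "
              f"range {abs(m0 - m1):.3f}")
```

The orthogonality of the array guarantees that each level of every factor appears equally often against every level of every other factor, so eight trials suffice to estimate seven main effects.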
(7) Exploiting the fact that deaf people tend to have strong visual discrimination and strong visual memory for colour stimuli, two visualization methods are proposed. The first combines locally linear embedding (LLE) with fuzzy kernel clustering: the improved LLE proposed in this dissertation first reduces the dimensionality of the features nonlinearly, and a fuzzy kernel clustering algorithm then performs the cluster analysis; that is, a Mercer kernel maps the original space nonlinearly into a high-dimensional feature space, where the speech features are clustered fuzzily. The kernel mapping brings out structure that was previously hidden, which better supports position-based speech visualization; experiments confirm that it works well. The second is a position-and-pattern visualization method that creates readable patterns for the deaf by integrating different speech features into a single image. The speech signal is first preprocessed and its features extracted: three formant features map to the image's principal colours, tone (intonation) features map to its pattern, and the 23 features selected by orthogonal experimental design are fed into a second neural network that maps out the position information; finally the visualization image is synthesized. A preliminary test of the system, compared against the earlier spectrogram approach, shows that the method works well as a learning aid for the deaf and is highly robust.
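A compact sketch of the first method is given below, assuming an RBF Mercer kernel, scikit-learn's standard LLE in place of the improved variant, and a textbook kernel fuzzy c-means update.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

def kernel_fcm(X, c=4, m=2.0, gamma=1.0, iters=50, seed=0):
    """Fuzzy c-means carried out implicitly in RBF-kernel feature space."""
    n = len(X)
    sq = np.sum((X[:, None] - X[None]) ** 2, axis=-1)
    K = np.exp(-gamma * sq)                       # Mercer (RBF) kernel matrix
    U = np.random.default_rng(seed).dirichlet(np.ones(c), n).T  # c x n memberships
    for _ in range(iters):
        Um = U ** m
        s = Um.sum(axis=1, keepdims=True)
        # squared distance of each sample to each feature-space centre
        d2 = (np.diag(K)[None] - 2 * (Um @ K) / s
              + ((Um @ K) * Um).sum(axis=1, keepdims=True) / s ** 2)
        d2 = np.maximum(d2, 1e-12)
        U = (1.0 / d2) ** (1.0 / (m - 1))
        U /= U.sum(axis=0, keepdims=True)         # renormalise memberships
    return U

def speech_positions(features, n_neighbors=10):
    """Embed robust speech features to 2-D, then fuzzy-cluster them."""
    Y = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                               n_components=2).fit_transform(features)
    return Y, kernel_fcm(Y)
```

In the full system, the resulting embedding and cluster memberships would drive the position channel of the visualization image, alongside the colour and pattern channels described above.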
