Endpoint Detection and Initial/Final Segmentation of Chinese Speech Based on SVM

Original title: 基于支持向量机的汉语语音端点检测和声韵分离
Author: Cai Kuijie (蔡魁杰)
Thesis level: Master's
Discipline: Applied Mathematics
Keywords: Support Vector Machine; Endpoint Detection; Initial/Final Segmentation
Degree year: 2007
Advisor: Shen Jihong (沈继红)
Discipline code: 070104
Degree-granting institution: Harbin Engineering University (哈尔滨工程大学)
Submission date: 2007-06-01

Abstract

Speech recognition has broad application prospects in industry, the military, commerce, banking, services, healthcare, and everyday life. This thesis applies support vector machine theory to two key techniques in speech recognition, endpoint detection and initial/final segmentation, and obtains good results.
The support vector machine (SVM) is a relatively new technique in data mining and a tool for solving machine learning problems by means of optimization methods. It was first proposed by Vapnik in the 1990s. In recent years both its theory and its algorithmic implementation have seen breakthrough progress, and it has become an effective means of overcoming traditional difficulties such as the "curse of dimensionality" and "overfitting".
For endpoint detection, this thesis proposes a technique based on the C-support vector machine (C-SVM). It removes the tedium and inaccuracy of manually setting thresholds in traditional endpoint detection methods, and it can also be used to extract any speech segment of interest in speech recognition research.
Traditional initial/final segmentation methods generally require thresholds to be set manually in advance, which demands extensive experimental analysis and data statistics. This thesis proposes an initial/final segmentation technique based on the C-SVM that requires no manually preset thresholds.
The thesis also studies the classification ability of the SVM with different input features and different penalty parameters. For the case in which training samples are extremely scarce, it applies an "erosion" post-processing step to the output decision vector, which improves classification accuracy.
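
The abstract does not spell out how the "erosion" post-processing works. The following is a minimal sketch of one plausible reading, assuming it resembles 1-D binary morphological erosion applied to the per-frame decision vector of the classifier. The function names (`erode`, `dilate`, `segments`), the 0/1 label encoding, the window radius, and the follow-up dilation that restores segment boundaries are all illustrative assumptions, not details taken from the thesis.

```python
from typing import List, Tuple


def erode(labels: List[int], radius: int = 1) -> List[int]:
    """Keep a frame as speech (1) only if every frame within `radius` is speech.

    Frames beyond either end of the sequence are treated as non-speech (0).
    """
    n = len(labels)
    out = []
    for i in range(n):
        lo, hi = i - radius, i + radius
        inside = lo >= 0 and hi < n
        out.append(1 if inside and all(labels[lo:hi + 1]) else 0)
    return out


def dilate(labels: List[int], radius: int = 1) -> List[int]:
    """Mark a frame as speech (1) if any frame within `radius` is speech."""
    n = len(labels)
    return [1 if any(labels[max(0, i - radius):i + radius + 1]) else 0
            for i in range(n)]


def segments(labels: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end inclusive, for each run of 1s."""
    runs = []
    start = None
    for i, value in enumerate(labels):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(labels) - 1))
    return runs


# Hypothetical classifier output per frame: 1 = speech, 0 = non-speech.
# Frames 1 and 10 stand in for isolated misclassifications.
raw = [0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0]

# Erosion removes the isolated frames; dilation (an added assumption)
# then restores the boundaries of the surviving segment.
cleaned = dilate(erode(raw))

print(segments(raw))      # [(1, 1), (4, 8), (10, 10)]
print(segments(cleaned))  # [(4, 8)]
```

In this toy input the raw decision vector yields three candidate segments, two of them a single frame long. After post-processing only the segment spanning frames 4 to 8 remains, and its first and last frames serve as the detected endpoints. An SVM that outputs +1/-1 labels would need them mapped to 1/0 first.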
