Pathological Voice Classification Based on Acoustic Parameters and Support Vector Machines
Abstract
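As social communication increases and lifestyles change, the incidence of voice disorders keeps rising. Clinical assessment of speech and voice disorders is therefore receiving growing attention, and research on the detection and analysis of pathological voice is expanding. Accurate diagnosis of pathological voice is fundamental to resolving any voice problem. Drawing on the state of research at home and abroad and on current trends, this study centers on the acoustic measurement of pathological voice and on the support vector machine (SVM), a pattern recognition method. It carries out a prospective study of objective, automatic classification and grading of pathological voice and ultimately builds automatic pathological voice classifiers.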
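     Because a pathological voice classifier takes feature vectors as input, the study first had to select voice feature parameters. Six categories of acoustic parameters related to pathological voice were chosen for the feature set: fundamental frequency parameters, amplitude parameters, formant parameters, glottal parameters, harmonics-to-noise ratio (HNR) parameters, and cepstral parameters. To limit the effect of noise on fundamental frequency extraction, the fundamental frequency was extracted with homomorphic processing and cepstral analysis, a method that is relatively insensitive to noise. Further new fundamental frequency, HNR, and cepstral parameters were then derived on that basis.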
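As an illustration of cepstrum-based fundamental frequency extraction, the following is a minimal sketch rather than the procedure used in the study. It assumes NumPy, a single voiced frame, a Hamming window, and a search range of 60 to 500 Hz; the function name `cepstral_f0` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def cepstral_f0(frame, sample_rate, f0_min=60.0, f0_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame
    from the peak of its real cepstrum (illustrative sketch)."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed)
    # Homomorphic step: the logarithm turns the source-filter product
    # into a sum, so the periodic excitation shows up as a cepstral peak.
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_magnitude)
    # Search only quefrencies (in samples) that match plausible pitch periods.
    # The frame must be longer than twice the longest period searched.
    q_min = int(sample_rate / f0_max)
    q_max = int(sample_rate / f0_min)
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max]))
    return sample_rate / peak

# Synthetic test frame (assumption): an impulse train with an 80-sample period,
# which corresponds to 200 Hz at a 16 kHz sampling rate.
sample_rate = 16000
frame = np.zeros(1024)
frame[::80] = 1.0
f0 = cepstral_f0(frame, sample_rate)
print(f"estimated F0: {f0:.1f} Hz")
```

Real recordings would need voiced/unvoiced detection and frame-by-frame tracking before jitter-type parameters could be computed from the resulting pitch contour.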
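     To improve the efficiency of the classifier, the selected acoustic parameters were optimized. Two rounds of optimization were carried out: a correlation analysis among the acoustic feature parameters themselves, and a correlation analysis between the perceptual features and the acoustic features of pathological voice. The parameters that carry the most information and best reflect the characteristics of pathological voice were kept as the optimized feature vector and fed to the classifier.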
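One way to picture the two correlation passes is a greedy filter that keeps a parameter only if it correlates strongly with the perceptual grade and weakly with the parameters already kept. The sketch below assumes NumPy and Pearson correlation; the thresholds `min_relevance` and `max_redundancy`, the function name, and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np

def select_features(X, grade, min_relevance=0.3, max_redundancy=0.8):
    """Return column indices that correlate with the perceptual grade
    but not too strongly with each other (illustrative sketch)."""
    n_features = X.shape[1]
    # Relevance: absolute Pearson correlation of each parameter with the grade.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], grade)[0, 1]) for j in range(n_features)]
    )
    # Redundancy: absolute correlation between every pair of parameters.
    inter = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in np.argsort(-relevance):  # most relevant first
        if relevance[j] < min_relevance:
            break
        if all(inter[j, k] <= max_redundancy for k in kept):
            kept.append(int(j))
    return kept

# Synthetic stand-in data (assumption): 200 samples, grades 0 to 3.
rng = np.random.default_rng(0)
grade = np.tile([0, 1, 2, 3], 50).astype(float)
f0 = grade + 0.1 * rng.normal(size=200)   # strongly related to the grade
f1 = f0 + 0.5 * rng.normal(size=200)      # near-duplicate of f0
f2 = rng.normal(size=200)                 # unrelated to the grade
f3 = grade + 1.5 * rng.normal(size=200)   # moderately related to the grade
X = np.column_stack([f0, f1, f2, f3])
kept = select_features(X, grade)
print("kept columns:", kept)
```

In this toy data the near-duplicate column is rejected for redundancy and the unrelated column for low relevance, which mirrors the intent of keeping informative, non-overlapping parameters.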
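     To limit the effect of a small sample size and to allow nonlinear classification, the study adopted the SVM, a pattern recognition method well suited to small-sample and nonlinear classification problems. SVM training models were built from the full set of extracted acoustic parameters and from the optimized set, yielding two binary classifiers that automatically distinguish normal from pathological voice. The binary classifiers were then evaluated with cross-validation combined with ROC curves. This evaluation approach limits the effect of the small sample on the validation results, generalizes well, and is simple and intuitive. Recognition accuracy and the ROC curves show that the binary classifiers separate normal and pathological voice very well, with a recognition rate of 96% to 98%, which is largely sufficient to distinguish the two.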
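The evaluation scheme described here (an SVM scored by cross-validation and an ROC curve) can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic Gaussian clusters rather than real acoustic parameters, and the RBF kernel, `C=1.0`, five folds, and feature scaling are choices made for the example, not settings reported by the study.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for acoustic feature vectors (assumption):
# 60 "normal" and 60 "pathological" samples with 6 parameters each.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(60, 6))
pathological = rng.normal(loc=2.0, scale=1.0, size=(60, 6))
X = np.vstack([normal, pathological])
y = np.array([0] * 60 + [1] * 60)

# Scale the parameters, then fit a nonlinear (RBF-kernel) SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Every sample is scored by a model that never saw it during training.
scores = cross_val_predict(model, X, y, cv=cv, method="decision_function")
predictions = (scores > 0).astype(int)

accuracy = accuracy_score(y, predictions)
auc = roc_auc_score(y, scores)  # area under the ROC curve
print(f"cross-validated accuracy: {accuracy:.3f}, AUC: {auc:.3f}")
```

Because the scores come from held-out folds, the accuracy and the ROC curve describe performance on unseen samples, which is the property that matters when the data set is small.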
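     To cope with unbalanced sample counts across classes and to avoid overlapping or unclassifiable regions, the study extended the binary classifier with the one-versus-one method. Using the full set of extracted acoustic parameters and the optimized set, it built two four-class classifiers that automatically grade pathological voice into four severity levels. These were again evaluated with cross-validation combined with ROC curves. Recognition accuracy and the ROC curves show that the four-class classifiers for grading pathological voice are reasonably effective, with a recognition rate of 73% to 84%, though less effective than the binary classifiers. Classifiers based on the optimized parameters perform slightly worse than those based on the original parameters, so the optimized-parameter classifiers can be used when classification efficiency is the priority.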
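The one-versus-one scheme trains one binary classifier per pair of classes (six pairs for four severity levels) and lets them vote. The sketch below shows only the voting step in plain Python. The pairwise classifiers are hypothetical threshold rules on a single made-up severity score, standing in for trained binary SVMs; the grade labels 0 to 3 and the tie-breaking rule are also assumptions.

```python
from collections import Counter
from itertools import combinations

def make_threshold_classifier(low, high):
    """Hypothetical stand-in for a trained binary SVM that separates
    grade `low` from grade `high` using one severity score."""
    midpoint = (low + high) / 2.0
    return lambda score: low if score < midpoint else high

def one_vs_one_predict(pairwise_classifiers, sample):
    """Each pairwise classifier casts one vote; the grade with the most
    votes wins. Ties go to the lowest grade (an arbitrary choice)."""
    votes = Counter(clf(sample) for clf in pairwise_classifiers.values())
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)

grades = [0, 1, 2, 3]  # four severity levels (labels assumed)
classifiers = {
    (a, b): make_threshold_classifier(a, b) for a, b in combinations(grades, 2)
}

print(len(classifiers), "pairwise classifiers")
print("score 2.2 ->", one_vs_one_predict(classifiers, 2.2))
print("score 0.1 ->", one_vs_one_predict(classifiers, 0.1))
```

Since each pairwise classifier sees only the samples of its two classes, no single class dominates a training set the way it can in a one-versus-rest setup, which is one reason the approach suits unbalanced data.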
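     The automatic classifiers not only distinguish pathological from normal voice objectively but also grade pathological voice objectively into four severity levels. Research on automatic classification and grading of pathological voice makes the assessment of voice disorders more objective and less affected by subjective differences, language environment, and similar factors.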
With the growth of social communication and changes in habits and customs, the incidence of voice disorders is rising. Clinical assessment of speech and voice disorders is therefore receiving more emphasis, and research on pathological voice detection and analysis is increasing. Accurate assessment of voice disorders is fundamental to solving any voice problem. Based on the current research status and trends, this prospective study addresses objective automatic classification and grading of pathological voice, centered on acoustic measurement of pathological voice and the Support Vector Machine (SVM), a pattern recognition method. Automatic classifiers of pathological voice were ultimately built.
     The acoustic characteristic parameters of pathological voice were extracted. All six types of parameters were obtained with voice signal processing techniques: the formant and amplitude parameters by Linear Prediction, the glottal parameters by Inverse Filtering, and the fundamental frequency, HNR, and cepstral parameters by Homomorphic Analysis and cepstral techniques. These acoustic parameters make up the feature vectors that the pathological voice classifiers take as input.
     The acoustic characteristic parameters of pathological voice were analyzed and optimized. Perceptual evaluation of pathological voice was performed with the GRBAS scale, yielding grade ratings for G, R, and B. The 11 optimized parameters were obtained by correlation analysis among all parameters and between the parameters and the grade ratings. Parameters that had low correlation with each other and high correlation with the grade ratings were input to the SVM.
     The pathological voice classifiers were implemented. SVM was selected as the classification method. SVM training models were built from two types of acoustic feature vectors, yielding classifiers that distinguish normal from pathological voice. On this basis, classifiers for pathological voice severity level were built with the one-versus-one approach.
     The classifiers were validated with cross-validation and ROC curves. Classification of pathological versus normal voice performed best, with a recognition rate of 96% to 98%, so the two types of voice can be distinguished. Classification of pathological voice severity level was moderately effective, with a recognition rate of 73% to 84%. Classifiers based on the optimized parameters performed slightly worse than those based on the initial parameters, so the former can be used for objective automatic classification of pathological voice when greater efficiency is needed.
     The pathological voice classifiers can objectively differentiate not only normal from pathological voice but also pathological voice severity levels. Through objective classification and grading of pathological voice, the assessment of voice disorders becomes more objective and less affected by subjective differences, language environment, and other factors.
