Repairing Pathological Voice Formants with an Improved Artificial Neural Network
  • English title: Pathological Voice Formants Repaired by Improved Artificial Neural Network
  • Authors: 薛隆基; 孙宝印; 张晓俊; 邵雅婷; 陶智
  • Authors (English): XUE Longji; SUN Baoyin; ZHANG Xiaojun; SHAO Yating; TAO Zhi
  • Keywords: pathological voice; formant repair; line spectral frequencies; improved BP neural network
  • English keywords: pathological voice; repairing formant; line spectral frequencies; improved Back-Propagation neural networks
  • Chinese journal name: DZQJ
  • English journal name: Chinese Journal of Electron Devices
  • Institutions: School of Optoelectronic Science and Engineering, Soochow University; School of Physical Science and Technology, Soochow University
  • Publication date: 2019-02-20
  • Publisher: 电子器件 (Chinese Journal of Electron Devices)
  • Year: 2019
  • Issue: v.42
  • Funding: Key Program of the National Natural Science Foundation of China (61271359); Youth Program of the Natural Science Foundation of Jiangsu Province (BK20140354)
  • Language: Chinese
  • Page: DZQJ201901048
  • Page count: 6
  • CN: 01
  • ISSN: 32-1416/TN
  • Classification number: 253-258
Abstract
This paper proposes a method for repairing pathological voice formants using an improved artificial neural network. Line spectral frequencies (LSF) are extracted from normal speech and from pathological voice, aligned by dynamic time warping, and used to train an improved BP neural network. The network uses an adaptive learning rate and an additional momentum term to reduce training time. The pathological voice to be repaired is then mapped through the trained network to obtain repaired line spectral frequencies, from which the reconstructed formants are computed, thereby repairing the pathological voice formants. Experiments show that the method repairs pathological voice formants effectively: the average MOS score of the repaired voice is 55.8% higher than before repair. On the objective line spectral pair distortion measure, the distortion of speech repaired by this method is 23.4% lower than that of the combined segmented fixed-value offset and extended bilinear transformation method, indicating that the repaired voice is substantially improved in intelligibility and sound quality.
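The abstract names two modifications to standard back-propagation, an adaptive learning rate and a momentum term, but does not state the update rule. The following is a minimal sketch of one common way to combine them, not the authors' implementation: a trial step is accepted and the learning rate grown when the error does not rise, otherwise the step is rejected, the learning rate shrunk, and the momentum reset. The function names (`combine`, `forward`, `loss_and_grads`, `train`), the network size, and the constants (`momentum=0.9`, `lr_inc=1.05`, `lr_dec=0.7`) are all illustrative assumptions, and synthetic vectors stand in for DTW-aligned pathological and normal LSF frames.

```python
import math
import random

def combine(f, *trees):
    """Apply f elementwise to nested lists of identical shape."""
    if isinstance(trees[0], list):
        return [combine(f, *parts) for parts in zip(*trees)]
    return f(*trees)

def forward(params, x):
    """One hidden tanh layer followed by a linear output layer."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]
    return h, y

def loss_and_grads(params, X, Y):
    """Batch mean squared error and its gradients via back-propagation."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = combine(lambda p: 0.0, params)
    n, total = len(X), 0.0
    for x, t in zip(X, Y):
        h, y = forward(params, x)
        d_out = [2.0 * (yk - tk) / n for yk, tk in zip(y, t)]
        total += sum((yk - tk) ** 2 for yk, tk in zip(y, t)) / n
        for k, g in enumerate(d_out):          # output layer
            gb2[k] += g
            for j, hj in enumerate(h):
                gW2[k][j] += g * hj
        for j, hj in enumerate(h):             # hidden layer, tanh' = 1 - h^2
            g = sum(dk * W2[k][j] for k, dk in enumerate(d_out)) * (1.0 - hj * hj)
            gb1[j] += g
            for i, xi in enumerate(x):
                gW1[j][i] += g * xi
    return total, [gW1, gb1, gW2, gb2]

def train(X, Y, hidden=4, epochs=300, lr=0.05, momentum=0.9,
          lr_inc=1.05, lr_dec=0.7, seed=0):
    """Full-batch gradient descent with momentum and an adaptive learning rate."""
    rng = random.Random(seed)
    d_in, d_out = len(X[0]), len(Y[0])
    params = [
        [[rng.uniform(-0.5, 0.5) for _ in range(d_in)] for _ in range(hidden)],
        [0.0] * hidden,
        [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d_out)],
        [0.0] * d_out,
    ]
    velocity = combine(lambda p: 0.0, params)
    loss, grads = loss_and_grads(params, X, Y)
    history = [loss]
    for _ in range(epochs):
        # Momentum step: blend the previous update with the new gradient step.
        trial_v = combine(lambda v, g: momentum * v - lr * g, velocity, grads)
        trial_p = combine(lambda p, v: p + v, params, trial_v)
        trial_loss, trial_grads = loss_and_grads(trial_p, X, Y)
        if trial_loss > loss:
            # Error rose: reject the step, shrink the rate, reset the momentum.
            lr *= lr_dec
            velocity = combine(lambda v: 0.0, velocity)
        else:
            # Error did not rise: accept the step and grow the rate.
            params, velocity = trial_p, trial_v
            loss, grads = trial_loss, trial_grads
            lr *= lr_inc
        history.append(loss)
    return params, history, lr

# Synthetic stand-ins for aligned LSF frames (radians, ascending order).
data_rng = random.Random(1)
X = [sorted(data_rng.uniform(0.1, 3.0) for _ in range(4)) for _ in range(20)]
Y = [[0.9 * v + 0.1 for v in x] for x in X]   # hypothetical "normal" targets

params, history, final_lr = train(X, Y)
print(f"initial MSE {history[0]:.4f}, final MSE {history[-1]:.4f}")
```

Because rejected steps leave the weights unchanged, the recorded error never increases in this variant. Other adaptive schemes tolerate small increases before rejecting a step, and the abstract does not say which rule the paper uses.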
