Speaker-independent Training and Adaptation of Neural Vocoders
  • Authors: WU Hong-chuan; LING Zhen-hua
  • Affiliation: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China
  • Keywords: neural network; WaveNet; vocoder; speaker-independent model; model adaptation
  • Journal: Journal of Chinese Computer Systems (小型微型计算机系统), journal code XXWX
  • Publication date: 2019-02-15
  • Year: 2019
  • Volume: v.40
  • Issue: 02
  • Pages: 207-212 (6 pages)
  • Article ID: XXWX201902039
  • CN: 21-1106/TP
  • Funding: Supported by the Science and Technology Major Project of Anhui Province (17030901005)
  • Language: Chinese
Abstract
In recent years, WaveNet-based neural vocoders have achieved high reconstructed speech quality, but their speaker-dependent training requires a large amount of speech data from the target speaker. This paper therefore studies how to train neural vocoders when target-speaker data is limited. In the proposed method, a speaker-independent WaveNet vocoder is first trained on a multi-speaker speech corpus; the parameters of this speaker-independent model are then adaptively updated with a small amount of target-speaker data to obtain a neural vocoder for the target speaker. The experiments compare local and global updating strategies for adaptation, and compare adaptive training with speaker-dependent training on the same training data. Results show that the neural vocoder built with the proposed method achieves better reconstructed speech quality than the STRAIGHT vocoder, and that, when target-speaker data is limited, the method also outperforms speaker-dependent training in both objective and subjective evaluations.
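To make the adaptation procedure described above concrete, the following is a minimal PyTorch-style sketch under stated assumptions: the toy model, layer sizes, and the particular subset of parameters updated in the "local" strategy are illustrative choices, not the authors' implementation. It contrasts the two strategies compared in the paper, updating all parameters (global) versus only a chosen subset (local), when fine-tuning a speaker-independent vocoder on limited target-speaker data.

# Hypothetical sketch -- module names, sizes, and the choice of locally updated
# layers are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class TinyWaveNet(nn.Module):
    """Toy stand-in for a WaveNet vocoder: dilated causal convolutions
    conditioned on acoustic features, predicting quantized waveform samples."""

    def __init__(self, channels=64, cond_dim=80, n_layers=4, n_classes=256):
        super().__init__()
        self.input_proj = nn.Conv1d(1, channels, kernel_size=1)
        self.cond_proj = nn.Conv1d(cond_dim, channels, kernel_size=1)
        self.layers = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=2,
                      dilation=2 ** i, padding=2 ** i)
            for i in range(n_layers)
        ])
        self.output = nn.Conv1d(channels, n_classes, kernel_size=1)

    def forward(self, wav, cond):
        # wav: (B, 1, T) past samples; cond: (B, cond_dim, T) acoustic features
        h = self.input_proj(wav) + self.cond_proj(cond)
        for layer in self.layers:
            t = h.size(-1)
            h = torch.tanh(layer(h))[..., :t]  # keep first t frames -> causal
        return self.output(h)                  # (B, n_classes, T) logits

def adapt(model, loader, strategy="global", lr=1e-4, steps=1000):
    """Fine-tune a speaker-independent vocoder on limited target-speaker data.
    strategy="global": update all parameters.
    strategy="local":  update only a subset (here, the output layer) and keep
                       the rest of the speaker-independent model frozen."""
    if strategy == "local":
        for p in model.parameters():
            p.requires_grad = False
        for p in model.output.parameters():
            p.requires_grad = True
    optimizer = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    step = 0
    while step < steps:
        for wav, cond, target in loader:  # target: (B, T) mu-law class indices
            loss = loss_fn(model(wav, cond), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= steps:
                break
    return model

A call such as adapt(pretrained_model, target_loader, strategy="local") would then fine-tune only the chosen subset, which is one plausible way to limit overfitting when target-speaker data is scarce.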
