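Speech Feature Extraction and Its Application in Voice Conversion Systems

English title: The Application of Acoustic Features in the Voice Conversion System
Author: Yu Guoqiao (虞国桥)
Thesis level: Master's
Discipline: Signal and Information Processing
Chinese keywords: voice conversion; two-dimensional fine spectrum; dynamic feature parameters
English keywords: voice conversion; smoothed time-frequency representation; dynamic feature
Degree year: 2006
Advisor: Liang Mangui (梁满贵)
Discipline code: 081002
Degree-granting institution: Beijing Jiaotong University
Thesis submission date: 2006-12-01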

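Abstract

With the rapid development of information and computer technology, computer applications have reached every aspect of our lives and work, and people's ties to computers are growing ever closer. Research on human-computer interaction, especially speech-based interaction, is attracting increasingly broad attention. People are no longer satisfied with keyboard-and-mouse interaction and want to use more convenient image and speech modalities with personalized characteristics. Voice conversion is one such technology: it personalizes speech-based human-computer interaction, has significant theoretical and practical value, and is the main subject of this thesis.
     Voice conversion is a technique for changing a speaker's voice characteristics, that is, converting the speaker's individual timbre while keeping the spoken content unchanged. It has many applications: instant voice chat; editing and dubbing for film, radio, and television; corpus collection for speech synthesis; personalization of back-end speech in speech synthesis; and intelligence work. Voice conversion chiefly converts speaker characteristics, and this thesis focuses on extracting speech feature parameters under high-quality STRAIGHT speech analysis and synthesis.
     This thesis accomplishes the following: (1) after surveying the state of voice conversion research and becoming familiar with the various conversion methods, it compares their strengths and weaknesses; (2) building on the STRAIGHT analysis algorithm, it adopts dynamic feature parameters to optimize feature extraction, which improves the quality of the converted voice.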
With the development of information technology and computer science, computer applications pervade every aspect of our lives. In modern life, computers and human beings are growing ever closer, and research on human-computer interaction techniques has become a hot topic. People are no longer satisfied with typical input devices such as the keyboard and mouse; they prefer a more convenient and effective way to interact with computers. Voice conversion meets this need. This thesis focuses on voice conversion, which makes human-computer interaction more individualized and effective and brings convenience to everyday life. The technology is meaningful in both theory and application and should gain wide acceptance on its own merits.
    Voice conversion is a technique for modifying a source speaker's speech so that it sounds as if it were spoken by a target speaker. It can serve as an invaluable tool for many applications in speech technology, such as instant voice chat, dubbing for movies and TV broadcasts, and providing a variety of distinctive voices for speech synthesis. It can also be used by intelligence departments. Voice conversion mainly converts a source speaker's acoustic features to those of a target speaker. The emphasis of this thesis is on acoustic features in voice conversion systems.
    This thesis achieves two things. First, we carried out a comparative study of the various kinds of acoustic features. Second, following STRAIGHT speech analysis, we used dynamic features to optimize feature extraction and improve the quality of the converted speech.
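
The abstract does not say how the dynamic features are computed. As an illustration only, the sketch below uses the common delta-regression formula to derive dynamic (delta) parameters from a sequence of per-frame static parameters and appends them to each frame. The function names `delta_features` and `add_dynamic_features`, the window size, and the random array standing in for STRAIGHT-derived spectral parameters are all assumptions, not details taken from the thesis.

```python
import numpy as np


def delta_features(static, window=2):
    """Delta (dynamic) features via the standard regression formula.

    static: array of shape (num_frames, num_dims), one row per frame.
    window: number of frames on each side used in the regression (assumed).
    """
    static = np.asarray(static, dtype=float)
    num_frames = static.shape[0]
    # Repeat the first and last frames so every frame has full context.
    padded = np.pad(static, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    delta = np.zeros_like(static)
    for n in range(1, window + 1):
        ahead = padded[window + n: window + n + num_frames]
        behind = padded[window - n: window - n + num_frames]
        delta += n * (ahead - behind)
    return delta / denom


def add_dynamic_features(static, window=2):
    """Concatenate static parameters with their delta features per frame."""
    static = np.asarray(static, dtype=float)
    return np.hstack([static, delta_features(static, window)])


if __name__ == "__main__":
    # Hypothetical stand-in for per-frame spectral parameters.
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(100, 24))
    features = add_dynamic_features(frames)
    print(features.shape)
```

Appending delta parameters lets a frame-wise conversion function see how the spectrum is changing over time, not only its instantaneous shape. Whether the thesis uses this exact formulation is not stated in the abstract.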

