Research and Implementation of Key Technologies in the Chinese Text-to-Speech System HJ-TTS
Abstract
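A text-to-speech (TTS) system is an important component of the human-computer interface and also a difficult problem in Chinese information processing. Taking the Huajian speech translation system as its research background, this thesis studies two aspects of text-to-speech conversion in depth: linguistic processing and prosodic description.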
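     The goal of text-to-speech conversion is to turn text stored in a computer into spoken output automatically. As speech synthesis has matured, the technology has gradually become practical, with broad application prospects in information publishing systems, automatic voice response systems, voice e-mail systems, and speech services for people with disabilities. Research on it has considerable theoretical significance and practical value for human-computer speech communication, natural-language human-computer interfaces, and intelligent computer systems.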
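     With the development of computational linguistics and phonetics, text-to-speech technology has made great progress. It has moved from shallow analysis of the speech signal alone to the combined use of linguistic and phonetic knowledge, and from mechanical speech synthesis to computer-based speech output. Even so, current text-to-speech technology still falls short in several respects: the linguistic processing component cannot yet perform deep semantic analysis of the text and so cannot supply later synthesis stages with the information that text understanding would provide; the segmental features of speech are hard to extract and quantify; and the criteria and methods for evaluating synthesized speech remain incomplete.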
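     To improve the performance of the text-to-speech system, the author describes and analyzes its linguistic processing component in detail; enriches and improves the rules for sentence boundary determination and special-symbol processing; implements, in light of the characteristics of word segmentation in text-to-speech systems, a feature-set-based speech word segmentation algorithm and a segmentation disambiguation strategy; and, to address the description of polyphonic-character pronunciation, improves the lexicon structure and the storage of polyphonic characters and designs a polyphonic-character selection algorithm on that basis.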
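The abstract does not spell out the feature-set-based segmentation algorithm or the polyphonic-character selection algorithm, so the following is only a minimal sketch of the general idea that word context can fix the reading of a polyphonic character. It swaps in plain forward maximum matching for segmentation, and the lexicon entries, default readings, and function names (`LEXICON`, `DEFAULT_READING`, `segment`, `to_pinyin`) are all hypothetical.

```python
# Hypothetical toy lexicon: each word maps to one pinyin syllable per character.
# Storing the reading with the word lets the word context pick the reading of
# a polyphonic character such as 行 (hang2 in 银行, xing2 in 行走).
LEXICON = {
    "银行": ["yin2", "hang2"],
    "行走": ["xing2", "zou3"],
    "我": ["wo3"],
    "去": ["qu4"],
}

# Hypothetical fallback readings for characters not covered by any lexicon word.
DEFAULT_READING = {"行": "xing2", "银": "yin2", "走": "zou3"}

MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text):
    """Forward maximum matching: take the longest lexicon word at each position."""
    words = []
    i = 0
    while i < len(text):
        for n in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            # A single character is always accepted so the scan cannot stall.
            if n == 1 or candidate in LEXICON:
                words.append(candidate)
                i += n
                break
    return words


def to_pinyin(text):
    """Look up each segmented word; fall back to a default single-character reading."""
    syllables = []
    for word in segment(text):
        if word in LEXICON:
            syllables.extend(LEXICON[word])
        else:
            syllables.append(DEFAULT_READING.get(word, "?"))
    return syllables


print(segment("我去银行"))
print(to_pinyin("我去银行"))
```

In this sketch the character 行 is read `hang2` inside 银行 and `xing2` inside 行走 or on its own; a real system would need a far larger lexicon and a disambiguation step for overlapping segmentations.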
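     Breath-group boundary division is one manifestation of Chinese prosody and plays an important role in speech perception and in the performance of text-to-speech systems. Building on a study of the relationship among Chinese syntactic structure, sentence length, and breath-group division, the author designs and implements a breath-group boundary division algorithm based on sentence length and syntactic structure. To better describe the text-to-speech process, the author proposes SSML, a prosody markup language suited to Chinese speech synthesis, which describes prosody at the acoustic level and also annotates syntactic structure at the linguistic level. Finally, the thesis expresses prosody rules based on language understanding in SSML and verifies experimentally how linguistic-level annotation affects the naturalness of the system.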
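The abstract names sentence length and syntactic structure as the two inputs to breath-group division but gives no further detail. The sketch below is one possible greedy reading of that idea, not the thesis's algorithm: words accumulate until a length limit is exceeded, and the cut is then placed at the strongest syntactic boundary seen so far. The boundary-strength scale, the limit of 7 syllables, and the name `split_breath_groups` are assumptions made for illustration.

```python
def split_breath_groups(words, max_syllables=7):
    """Split a sentence into breath groups.

    words: list of (word, strength) pairs, where strength rates the syntactic
    boundary after the word on a hypothetical scale (0 = none, higher = stronger).
    Each Chinese character is counted as one syllable.
    """
    groups = []
    current = []
    for item in words:
        current.append(item)
        # When the group grows too long, cut at the strongest boundary inside it
        # (the latest one on ties), never after the final word just added.
        while len(current) > 1 and sum(len(w) for w, _ in current) > max_syllables:
            cut = max(range(len(current) - 1), key=lambda k: (current[k][1], k))
            groups.append([w for w, _ in current[:cut + 1]])
            current = current[cut + 1:]
    if current:
        groups.append([w for w, _ in current])
    return groups


sentence = [("我们", 0), ("今天", 1), ("下午", 0), ("去", 0),
            ("图书馆", 2), ("看", 0), ("书", 0)]
for group in split_breath_groups(sentence):
    print("".join(group))
```

With the hypothetical strengths above, the sentence is divided after 今天 and after 图书馆, the two positions marked as syntactic boundaries, rather than at an arbitrary point where the length limit happens to be reached.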
Text-to-Speech (TTS) is an important part of the human-computer interface and also a difficult problem in Chinese information processing. This dissertation presents our research on the language processing and prosodic expression of a Text-to-Speech system that serves as a component of the Huajian Speech-to-Speech translation system.
    The Text-to-Speech system is designed to convert text stored in the computer into speech. With the improvement of speech synthesis techniques, it has come into wide use in applications such as information systems, voice response devices, voice e-mail services, and reading machines for the blind. Moreover, it has great theoretical value for human-computer communication systems, natural-language human-computer interfaces, and intelligent computer systems.
    Text-to-Speech has made great progress with the development of computational linguistics and phonetics. It has evolved from analyzing speech signals alone to drawing on linguistic and phonetic knowledge. In spite of this, Text-to-Speech technology still has many weaknesses: first, without deep semantic analysis it cannot provide enough information to the synthesis stage; second, segmental characteristics are difficult to extract and analyze; finally, the evaluation criteria are far from perfect.
    This dissertation first discusses and analyzes the language processing procedure; the author then enriches the rule sets for sentence boundary assignment, special symbol processing, and so on. Taking into account the characteristics of Chinese Text-to-Speech systems, a new speech word segmentation algorithm and a disambiguation method are also put forward. The word-to-sound rule and the structure of a speech database are modified during the
