Text Design for Non-native Chinese Speech Corpora

Original title: 汉语中介语语音库的文本设计
Authors: Wang Wei (王玮); Zhang Jinsong (张劲松)
Keywords: interlanguage phonology of Chinese; speech corpora; text design
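Journal code: SJHY
Journal: Chinese Teaching in the World (世界汉语教学)
Affiliations: Advanced Innovation Center for Language Resources, Beijing Language and Culture University; Humanities and Social Sciences Base, Xinjiang University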
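Publication date: 2019-01-05
Publisher: Chinese Teaching in the World (世界汉语教学)
Year: 2019
Volume: v.33
Funding: Advanced Innovation Center for Language Resources project "Research on a Multimodal Corpus of Chinese Interlanguage Speech for Intelligent Speech Teaching" (KYR17005); Beijing Language and Culture University Major Basic Research Program (16ZDJ03); Beijing Language and Culture University Graduate Innovation Fund project "Construction of a Chinese Interlanguage Speech Corpus of Russian Students" (16YCX221)
Language: Chinese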
Record ID: SJHY201901011
Page count: 13
Issue: 01
CN: 11-1473/H
Pages: 106-118

Abstract

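Building an interlanguage speech corpus poses a text design problem: the recording text must cover the basic phonetic elements of the target language while its size is kept under tight control, because an oversized text makes recording each speaker too costly and hinders recruiting more speakers. This paper presents a text design we propose for building Chinese interlanguage speech corpora: it covers as many phonetic elements as possible while keeping the text set small. In addition to monosyllables and phonetically balanced disyllables, the design includes a minimum sentence set of moderate difficulty that covers segments, tones, tri-tones, and focus intonation; this set is selected from a large text corpus by a computer search algorithm. We believe that this recording text makes it possible both to collect the various speech phenomena of interest to second language acquisition research and to record a large number of speakers with relative ease, thereby better serving research on speech acquisition and computer-assisted speech teaching.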
Text design is challenging when building a non-native speech corpus, because the text should cover as many essential phonetic elements as possible while remaining small. This paper proposes a text design that covers rich phonetic elements of Chinese Putonghua within a small text size, for the purpose of developing non-native speech corpora. The designed text consists of monosyllables, phonetically balanced disyllables, a minimum sentence set, a short passage, and a few other sentences. The minimum sentence set was selected by a greedy search algorithm from texts of primary- to intermediate-level Chinese textbooks, and it has rich coverage of Chinese segments, lexical tones, tri-tones, prosodic phrases, and different intonation patterns. The text makes it possible to collect various kinds of speech phenomena while keeping it easy to collect data from more speakers. These two advantages benefit studies of second language acquisition and computer-assisted speech teaching technology.
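
The greedy selection of a minimum sentence set mentioned in the abstract can be illustrated with a minimal sketch. This is a plain greedy set-cover heuristic, not necessarily the algorithm used in the paper, which may add weighting or difficulty constraints. The function name `greedy_min_sentence_set`, the sentence IDs, and the toy phonetic units (pinyin syllables with tone numbers) are all hypothetical and serve only as illustration.

```python
from typing import Dict, List, Set


def greedy_min_sentence_set(units_by_sentence: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick sentences until every unit found in the candidates is covered.

    units_by_sentence maps a sentence ID to the set of phonetic units
    (e.g. toned syllables or tri-tone patterns) that the sentence contains.
    """
    # Everything that appears in at least one candidate must end up covered.
    uncovered: Set[str] = set()
    for units in units_by_sentence.values():
        uncovered |= units

    selected: List[str] = []
    while uncovered:
        # Pick the sentence that adds the most still-uncovered units.
        # Iterating over sorted IDs makes tie-breaking deterministic:
        # max() returns the first maximal item it encounters.
        best = max(
            sorted(units_by_sentence),
            key=lambda s: len(units_by_sentence[s] & uncovered),
        )
        selected.append(best)
        uncovered -= units_by_sentence[best]
    return selected


# Hypothetical toy candidates; a real run would use units extracted from textbooks.
candidates = {
    "s1": {"ma1", "ma2", "ni3"},
    "s2": {"ma2", "hao3"},
    "s3": {"ni3", "hao3", "ma5"},
    "s4": {"ma1"},
}
print(greedy_min_sentence_set(candidates))  # ['s1', 's3']
```

A greedy heuristic does not guarantee the smallest possible set, but it is simple and usually keeps the selected text compact, which matches the stated goal of covering many phonetic elements with a small recording text.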

