A Method for Converting Numeric Data into Kazakh Pronunciation Text Based on Rules and N-gram Models

English title: The Rules and N-gram Models for Conversion of Digital Numbers into Kazakh Language Texts
Authors: Rehmutulla Memet; Gvlnigar Mehmud; Nurbolat Huan; Askar Hamdulla
Keywords: Kazakh; digit pronunciation; rule base; N-gram
Journal code: DNZS
Journal: Computer Knowledge and Technology (电脑知识与技术)
Affiliation: School of Information Science and Engineering, Xinjiang University
Publication date: 2017-05-15
Publisher: Computer Knowledge and Technology (电脑知识与技术)
Year: 2017
Volume: 13
Language: Chinese
File ID: DNZS201714070
Page count: 3
Issue: 14
CN: 34-1205/TP
Pages: 164-165+174

Abstract

Speech synthesis is an important research area in Kazakh information processing. Converting the Arabic numerals in Kazakh text into their spoken-form text is an important preprocessing step for speech synthesis. This paper uses a rule base and an N-gram model to convert the various kinds of numbers in text into their correct readings, providing high-quality number-pronunciation text for Kazakh speech synthesis research. The authors hope that the method will improve the quality of speech synthesis for Kazakh and for other languages with similar characteristics.
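
The abstract names two components, a rule base and an N-gram model, without giving details. The following is a minimal sketch of how such a pipeline could be arranged, not the authors' implementation. It assumes Cyrillic-script Kazakh numerals (the record does not say which script the paper targets), only two hypothetical reading categories (`cardinal` and `digits`), and invented toy bigram counts. The function names `cardinal`, `digit_by_digit`, `choose_reading`, and `verbalize` are illustrative only.

```python
from collections import Counter

# Kazakh numeral words in Cyrillic script (an assumption of this sketch).
ONES = ["нөл", "бір", "екі", "үш", "төрт", "бес", "алты", "жеті", "сегіз", "тоғыз"]
TENS = ["", "он", "жиырма", "отыз", "қырық", "елу", "алпыс", "жетпіс", "сексен", "тоқсан"]


def cardinal(n):
    """Rule-based cardinal reading for 0 <= n <= 999999.

    Simplification: a leading "бір" is omitted before "жүз" and "мың".
    """
    if not 0 <= n <= 999_999:
        raise ValueError("this sketch only covers 0..999999")
    if n == 0:
        return ONES[0]
    words = []
    thousands, rest = divmod(n, 1000)
    if thousands:
        if thousands > 1:
            words.append(cardinal(thousands))
        words.append("мың")
    hundreds, rest = divmod(rest, 100)
    if hundreds:
        if hundreds > 1:
            words.append(ONES[hundreds])
        words.append("жүз")
    tens, ones = divmod(rest, 10)
    if tens:
        words.append(TENS[tens])
    if ones:
        words.append(ONES[ones])
    return " ".join(words)


def digit_by_digit(token):
    """Read a digit string one digit at a time (e.g. for codes or phone numbers)."""
    return " ".join(ONES[int(ch)] for ch in token)


def choose_reading(prev_word, bigram_counts, categories=("cardinal", "digits")):
    """Pick the category with the highest add-one-smoothed (prev_word, category) count.

    Ties fall back to the first category, i.e. "cardinal".
    """
    return max(categories, key=lambda c: bigram_counts[(prev_word, c)] + 1)


def verbalize(prev_word, token, bigram_counts):
    """Combine the bigram choice with the rule-based readers."""
    if choose_reading(prev_word, bigram_counts) == "digits":
        return digit_by_digit(token)
    return cardinal(int(token))


# Toy counts, invented for illustration only.
counts = Counter({("телефон", "digits"): 5, ("телефон", "cardinal"): 1})
print(verbalize("телефон", "105", counts))  # digit-by-digit reading
print(verbalize("бағасы", "105", counts))   # unseen context, falls back to cardinal
```

In this arrangement the rules decide how each category is read, and the N-gram statistics only decide which category applies in a given context. A real system would need more categories (ordinals, dates, decimals, fractions) and counts estimated from a corpus.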

