A hierarchical language identification system for Indian languages
详细信息查看全文 | 推荐本文 |
摘要
Automatic spoken Language IDentification (LID) is the task of identifying the language from a short duration of speech signal uttered by an unknown speaker. In this work, an attempt has been made to develop a two level language identification system for Indian languages using acoustic features. In the first level, the system identifies the family of the spoken language, and then it is fed to the second level which aims at identifying the particular language in the corresponding family. The performance of the system is analyzed for various acoustic features and different classifiers. The suitable acoustic feature and the pattern classification model are suggested for effective identification of Indian languages. The system has been modeled using hidden Markov model (HMM), Gaussian mixture model (GMM) and artificial neural networks (ANN). We studied the discriminative power of the system for the features mel frequency cepstral coefficients (MFCC), MFCC with delta and acceleration coefficients and shifted delta cepstral (SDC) coefficients. Then the LID performance as a function of the different training and testing set sizes has been studied. To carry out the experiments, a new database has been created for 9 Indian languages. It is shown that GMM based LID system using MFCC with delta and acceleration coefficients is performing well with 80.56%accuracy. The performance of GMM based LID system with SDC is also considerable.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700