Spoken Language Identification with Phonotactics Methods on Minangkabau, Sundanese, and Javanese Languages
Abstract
Research on spoken language identification (spoken LID) for local languages helps extend the reach of technology to speakers of those languages, and it also contributes to their preservation. In this paper, we report our work on identifying spoken data in three local Indonesian languages: Minangkabau, Sundanese, and Javanese. We build statistical phonotactic models that map a speech signal to the language used by the speaker, using two phonotactic methods: Phone Recognition followed by Language Modelling (PRLM) and Parallel Phone Recognition followed by Language Modelling (PPRLM). The PRLM method achieves the highest accuracy when using phone recognizers trained on English and Russian, with average accuracies of 77.42% and 75.94%, respectively.
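To make the two method names concrete, here is a minimal sketch of the language-modelling back end they share. It rests on several assumptions. A phone recognizer has already turned each utterance into a phone sequence. Each language gets an add-one smoothed phone bigram model. PRLM picks the language whose model gives the decoded sequence the highest log-likelihood. PPRLM sums those log-likelihoods across several parallel recognizers with equal weights. The n-gram order, smoothing, fusion rule, and every name (`PhoneBigramLM`, `prlm_identify`, `pprlm_identify`, `rec_1`, `rec_2`) and toy phone string are hypothetical. None of them come from the paper.

```python
import math
from collections import Counter


class PhoneBigramLM:
    """Add-one smoothed phone bigram model over a fixed phone inventory.

    Hypothetical stand-in for a per-language phonotactic model; the paper's
    actual n-gram order and smoothing may differ.
    """

    def __init__(self, phone_set, sequences):
        # Possible next symbols: every phone in the recognizer's inventory plus end-of-utterance.
        self.num_outcomes = len(phone_set) + 1
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def log_prob(self, seq):
        """Log-likelihood of a decoded phone sequence under this language's model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + self.num_outcomes
            total += math.log(numerator / denominator)
        return total


def prlm_identify(phones, models):
    """PRLM: score one recognizer's phone output with each language's LM and pick the best."""
    return max(models, key=lambda lang: models[lang].log_prob(phones))


def pprlm_identify(decodings, models_per_recognizer):
    """PPRLM: sum per-language log-likelihoods across parallel recognizers
    (equal-weight fusion is an assumption of this sketch)."""
    languages = next(iter(models_per_recognizer.values())).keys()

    def fused_score(lang):
        return sum(models_per_recognizer[rec][lang].log_prob(phones)
                   for rec, phones in decodings.items())

    return max(languages, key=fused_score)


# Hypothetical toy data: a tiny phone inventory and made-up decoded training utterances.
PHONES = {"a", "n", "g", "e", "u", "o"}
TRAIN = {
    "lang_A": [["a", "n", "g", "a"], ["a", "n", "a"]],
    "lang_B": [["e", "u", "o"], ["u", "o", "e"]],
}
models = {lang: PhoneBigramLM(PHONES, seqs) for lang, seqs in TRAIN.items()}
print(prlm_identify(["a", "n", "g"], models))  # lang_A

# PPRLM with two hypothetical recognizers; in practice each would have its own
# LMs trained on its own decodings, here they share the toy models for brevity.
parallel_models = {"rec_1": models, "rec_2": models}
print(pprlm_identify({"rec_1": ["a", "n", "g"], "rec_2": ["a", "n", "a"]}, parallel_models))  # lang_A
```

The design point this illustrates is that PRLM needs only one phone recognizer, even one trained on an unrelated language such as English or Russian. The language models then pick up the phonotactic patterns in whatever phone strings that recognizer produces. PPRLM repeats this with several recognizers and combines their scores.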