Publisher: Речевые технологии (Speech Technology), Issue No. 1/2016
Sazhok M., Robeiko V. Modeling of language distinctive features for Ukrainian real-time speech recognition system
The presented research focuses on features that are specific to most Slavonic languages and to Ukrainian in particular. The arguments given confirm the need to distinguish stressed and unstressed vowels in the phoneme alphabet. Because lexical stress is irregular, stress assignment requires expert involvement. To automate this procedure, we propose a data-driven stress prediction algorithm that represents words as sequences of substrings (morphemes). The formulated criterion for validating a substring sequence is based on a set of words with manually marked stress and a large text corpus. The described search algorithm finds the N-best symbol sequences with a hypothesized stress. As a Slavonic language, Ukrainian is highly inflected and tolerates relatively free word order. These features motivate the transition from a word-based to a class-based statistical language model. Spontaneous speech recognition experiments confirmed the efficiency of introducing stressed phonemes and showed comparable performance for class and word n-gram language models. We also describe several tools developed to visualize HMMs, to predict word stress, and to manage equivalence class-based language modeling.

Keywords: spontaneous speech recognition, real-time, stress prediction, word equivalence classes, language models
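The move from a word-based to a class-based language model mentioned in the abstract can be illustrated with the standard class bigram factorization, P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i). The sketch below is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy transliterated sentences, and the part-of-speech-like class labels are all hypothetical, and the abstract does not specify how the equivalence classes are actually defined or how the model is smoothed.

```python
from collections import Counter


def train_class_bigram(sentences, word2class):
    """Collect counts for P(w_i | w_{i-1}) ~ P(c_i | c_{i-1}) * P(w_i | c_i)."""
    class_bigrams = Counter()  # count(c_prev, c)
    class_context = Counter()  # how often c_prev occurs as a bigram history
    word_counts = Counter()    # count(w)
    class_counts = Counter()   # count(c)
    for sentence in sentences:
        classes = [word2class[w] for w in sentence]
        for w, c in zip(sentence, classes):
            word_counts[w] += 1
            class_counts[c] += 1
        for c_prev, c in zip(classes, classes[1:]):
            class_bigrams[(c_prev, c)] += 1
            class_context[c_prev] += 1
    return class_bigrams, class_context, word_counts, class_counts


def class_bigram_prob(w_prev, w, word2class, model):
    """Unsmoothed maximum-likelihood class bigram probability of w after w_prev."""
    class_bigrams, class_context, word_counts, class_counts = model
    c_prev, c = word2class[w_prev], word2class[w]
    if class_context[c_prev] == 0 or class_counts[c] == 0:
        return 0.0
    p_class = class_bigrams[(c_prev, c)] / class_context[c_prev]
    p_word = word_counts[w] / class_counts[c]
    return p_class * p_word


# Hypothetical toy corpus (transliterated) and hypothetical class labels.
sentences = [
    ["ya", "chytayu", "knyhu"],
    ["ya", "bachu", "knyhu"],
    ["ya", "chytayu", "hazetu"],
]
word2class = {
    "ya": "PRON",
    "chytayu": "VERB",
    "bachu": "VERB",
    "knyhu": "NOUN_ACC",
    "hazetu": "NOUN_ACC",
}

model = train_class_bigram(sentences, word2class)
# The word pair ("bachu", "hazetu") never occurs in the toy corpus,
# yet the class model still assigns it a non-zero probability.
p_unseen = class_bigram_prob("bachu", "hazetu", word2class, model)
```

The word pair in the last line is unseen in the toy data, so a plain word bigram estimate would be zero, whereas the class model shares statistics across all members of a class. This is the usual argument for class-based models in highly inflected languages with relatively free word order, where many valid word pairs are rare in any training corpus.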