J Shanghai Jiaotong Univ Sci, 2026, Vol. 31, Issue (2): 265-272. doi: 10.1007/s12204-024-2730-3

Special Issue: Human-Machine Speech Communication

Automation & Computer Technologies

Exploring Generation of Pronunciation Lexicon for Low-Resource Language Automatic Speech Recognition Based on Generic Phone Recognizer


Li Jinpeng (李金朋)1, Chen Xie (陈谐)2, Zhang Weiqiang (张卫强)1

  1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  2. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2023-12-19 Accepted: 2024-01-05 Online: 2026-04-01 Published: 2024-04-22

Abstract: The lexicon is an essential component of a hybrid automatic speech recognition (ASR) system. However, a high-quality lexicon requires significant effort from linguistic experts and is difficult to obtain, especially for low-resource languages. This paper addresses the problem of using a well-trained universal phone recognizer, trained on multilingual speech data and pronunciation lexicons, to generate pronunciation lexicons for low-resource languages in a speech-data-driven manner. We propose a simple pipeline that uses this approach to generate pronunciation lexicons and apply them in ASR systems. The steps to generate the lexicon are simple and generic: apply the International Phonetic Alphabet (IPA) phone recognizer to the speech, align its output with the reference word sequence, filter the result to obtain a set of AUTO-subwords, and use them to generate the AUTO-subword lexicon and the AUTO-IPA lexicon. We use the generated pronunciation lexicons both in the hybrid system and for fine-tuning a pre-trained model. The experimental results show that the lexicon can be constructed without resorting to linguistic experts. Furthermore, the generated lexicon outperforms the grapheme-based lexicon and is comparable to the expert lexicon.
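
The filter-and-generate stage of the pipeline described in the abstract can be illustrated with a minimal sketch. It assumes that the phone recognizer output has already been aligned to the reference words (the abstract does not specify the alignment method), and it uses a simple count-based filter, which is an assumption and not necessarily the filtering used in the paper. The names `build_lexicon`, `format_lexicon`, `min_count`, and `max_variants`, as well as the toy data, are hypothetical.

```python
from collections import Counter, defaultdict


def build_lexicon(aligned_pairs, min_count=2, max_variants=2):
    """Build a word -> pronunciations map from (word, phones) pairs.

    aligned_pairs: iterable of (word, phone_sequence) tuples, assumed to
    come from aligning recognizer output with the reference transcript.
    min_count / max_variants: hypothetical filtering thresholds.
    """
    # Count how often each phone sequence was observed for each word.
    counts = defaultdict(Counter)
    for word, phones in aligned_pairs:
        counts[word][tuple(phones)] += 1

    lexicon = {}
    for word, variants in counts.items():
        # Keep only the most frequent variants that were seen often enough.
        kept = [
            pron
            for pron, n in variants.most_common(max_variants)
            if n >= min_count
        ]
        if kept:
            lexicon[word] = kept
    return lexicon


def format_lexicon(lexicon):
    """Render entries as 'word phone phone ...' lines, one per variant."""
    return [
        f"{word} {' '.join(pron)}"
        for word in sorted(lexicon)
        for pron in lexicon[word]
    ]


# Toy aligned data (hypothetical): one noisy variant and one rare word.
aligned_pairs = (
    [("kitabu", ["k", "i", "t", "a", "b", "u"])] * 3
    + [("kitabu", ["k", "i", "t", "a", "p", "u"])]
    + [("na", ["n", "a"])] * 2
    + [("leo", ["l", "e", "o"])]
)

lexicon = build_lexicon(aligned_pairs)
lines = format_lexicon(lexicon)
print("\n".join(lines))
```

With these thresholds, the single-occurrence variant of "kitabu" and the word "leo" are filtered out. Words dropped this way would need another source of pronunciation in a real system; how the paper handles them is not stated in the abstract.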

Key words: International Phonetic Alphabet (IPA), lexicon learning, phone recognition, low-resource speech recognition

