J Shanghai Jiaotong Univ Sci, 2022, Vol. 27, Issue (1): 90-98. doi: 10.1007/s12204-021-2376-3


  

  • Received: 2021-01-18; Online: 2022-01-28; Published: 2022-01-14
  • Corresponding author: LI Yongfu (李永福), yongfu.li@sjtu.edu.cn

Enhancing Speech Recognition for Parkinson’s Disease Patient Using Transfer Learning Technique

YU Qing (余青), MA Yi (马祎), LI Yongfu∗ (李永福)   

  (a. Department of Micro-Nano Electronics; b. MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai 200240, China)

Abstract: Parkinson’s disease patients suffer from speech disorders. The most frequently reported speech problems are a weak, hoarse, nasal, or monotonous voice; imprecise articulation; slow or fast speech; difficulty initiating speech; impaired stress or rhythm; stuttering; and tremor. To improve speech quality and assist patients with speech rehabilitation therapy, we propose a speech recognition model for Parkinson’s disease patients using the transfer learning technique (PSTL). A long short-term memory (LSTM) neural network is first pre-trained on a publicly available dataset that we developed from speech of healthy people collected through a social media platform, and transfer learning is then applied to improve the performance of the PSTL framework. Frequency spectrogram masking is used for data augmentation to alleviate over-fitting, which further reduces the word error rate (WER). Even with a limited dataset, the proposed model reduces the WER from 58% to 44.5% on the original speech dataset and from 53.1% to 43% on the denoised speech dataset, demonstrating the feasibility of the framework.
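The abstract names frequency spectrogram masking as the augmentation step but does not give its parameters. The pure-Python sketch below assumes a SpecAugment-style frequency mask, in which random contiguous bands of frequency bins are overwritten across every time frame. The function name `frequency_mask`, the width limit `max_width`, the number of masks, and the fill value are all hypothetical choices, not the PSTL settings.

```python
import random
from typing import List, Optional

# A spectrogram stored as spec[frequency_bin][time_frame].
Spectrogram = List[List[float]]


def frequency_mask(spec: Spectrogram, max_width: int, num_masks: int = 1,
                   fill: float = 0.0,
                   rng: Optional[random.Random] = None) -> Spectrogram:
    """Hypothetical helper: SpecAugment-style frequency masking.

    The masking scheme and its parameters are assumptions, not PSTL's exact
    settings. Returns a copy of `spec` in which `num_masks` random contiguous
    bands of frequency bins are set to `fill` across every time frame.
    """
    if rng is None:
        rng = random.Random()
    num_bins = len(spec)
    out = [row[:] for row in spec]  # copy so the input spectrogram is left untouched
    for _ in range(num_masks):
        # Band width drawn uniformly from [0, max_width], capped at the number of bins.
        width = rng.randint(0, min(max_width, num_bins))
        # Start bin chosen so that the whole band fits inside the spectrogram.
        start = rng.randint(0, num_bins - width)
        for f in range(start, start + width):
            out[f] = [fill] * len(out[f])
    return out


if __name__ == "__main__":
    # Dummy spectrogram: 8 frequency bins x 5 time frames, all energy 1.0.
    spec = [[1.0] * 5 for _ in range(8)]
    masked = frequency_mask(spec, max_width=3, num_masks=2, rng=random.Random(0))
    for row in masked:
        print(row)
```

If fresh masks are drawn each time an utterance is fed to the network, the model rarely sees exactly the same spectrogram twice. That is how this kind of masking is generally used to counter over-fitting on a small dataset.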
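The results above are reported as word error rate (WER). WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below computes WER with a word-level edit-distance dynamic program. The function names are hypothetical, and the abstract does not state which scoring tool was used. The sketch also converts the reported absolute WERs into relative reductions: (58 - 44.5)/58 ≈ 23.3% on the original speech and (53.1 - 43)/53.1 ≈ 19.0% on the denoised speech.

```python
from typing import List

# Hypothetical helpers for illustration; not the paper's scoring script.


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit-distance) dynamic program."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete ref[i-1]
                          curr[j - 1] + 1,     # insert hyp[j-1]
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its starting value."""
    return (before - after) / before


if __name__ == "__main__":
    print(word_error_rate("turn on the light", "turn the lights"))  # 0.5
    # Relative improvements implied by the WERs reported in the abstract.
    print(f"original: {relative_reduction(58.0, 44.5):.1%}")  # 23.3%
    print(f"denoised: {relative_reduction(53.1, 43.0):.1%}")  # 19.0%
```

Because insertions are counted, WER can exceed 100% when the output is much longer than the reference.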

Key words: speech recognition, Parkinson’s disease, transfer learning technique, data augmentation, scarce data
