J Shanghai Jiaotong Univ Sci, 2022, Vol. 27, Issue (2): 160-167. doi: 10.1007/s12204-021-2384-3


  

  • Received: 2020-12-08 Online: 2022-03-28 Published: 2022-05-02
  • Corresponding author: YUAN Zhenming* (袁贞明), zmyuan@hznu.edu.cn

Spontaneous Language Analysis in Alzheimer’s Disease: Evaluation of Natural Language Processing Technique for Analyzing Lexical Performance

LIU Ning1,2 (刘宁), YUAN Zhenming1,3 * (袁贞明)   

  1. School of Public Health, Hangzhou Normal University, Hangzhou 311121, China; 2. Department of Mathematics and Computer Science, Fujian Provincial Key Laboratory of Data-Intensive Computing, Quanzhou Normal University, Quanzhou 362000, Fujian, China; 3. School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China

Abstract: Language disorder, a common manifestation of Alzheimer’s disease (AD), has attracted widespread attention in recent years. This paper proposes a novel natural language processing (NLP) method to detect AD and explore patients’ lexical performance, and compares it with recent deep learning techniques. The proposed approach has two stages. First, dialogue contents of the same category are pooled, yielding one aggregated text for each of the two categories. Second, the term frequency-inverse document frequency (TF-IDF) algorithm is used to extract keywords from the transcripts, and the keyword similarity between the groups is measured separately by cosine distance. Several deep learning methods are used for performance comparison. Meanwhile, the keywords that yield the best performance are used to analyze the lexical performance of AD patients. On the binary classification task of the Predictive Challenge of Alzheimer’s Disease held by iFlytek in 2019, the proposed AD diagnosis model achieves better performance when the number of keywords is adjusted. Its F1 score improves considerably over the baseline of 75.4%, and its training process is simple and efficient. Analyzing the model’s keywords, we find that AD patients use fewer nouns and verbs than normal controls. This paper thus proposes a computer-assisted AD diagnosis model for a small Chinese dataset, providing a potential way to assist AD diagnosis and to analyze lexical performance in clinical settings.
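The abstract does not specify the tokenizer, the exact TF-IDF variant, or the decision rule, so the following is only a minimal sketch of the two-stage idea under stated assumptions. It assumes that transcripts are already tokenized (Chinese text would first need word segmentation, which is not shown), that all transcripts of a category are concatenated into one document, that IDF uses a smoothed formula fitted on those category documents, that only the top-k weighted terms are kept as keywords, and that a new transcript is assigned to the category whose keyword vector has the highest cosine similarity. The function names, the toy data, and the default `k` are hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed IDF fitted on tokenized documents (an assumed variant):
    idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_vector(doc, idf):
    """Sparse TF-IDF vector of one tokenized document, with tf = count / length.
    Terms absent from the fitted vocabulary are dropped."""
    counts = Counter(term for term in doc if term in idf)
    return {term: (c / len(doc)) * idf[term] for term, c in counts.items()}


def top_keywords(vec, k):
    """Keep the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(train, labels, transcript, k=10):
    """Assign `transcript` to the category whose keywords it most resembles.

    train: tokenized transcripts; labels: their categories (parallel list);
    transcript: tokenized transcript to classify; k: keywords kept per text
    (hypothetical default).
    """
    categories = sorted(set(labels))
    # Stage 1: concatenate all transcripts of each category into one document.
    merged = [[tok for doc, lab in zip(train, labels) if lab == cat for tok in doc]
              for cat in categories]
    # Stage 2: TF-IDF keywords per category and for the new transcript.
    idf = fit_idf(merged)
    category_keys = [top_keywords(tfidf_vector(doc, idf), k) for doc in merged]
    query_keys = top_keywords(tfidf_vector(transcript, idf), k)
    # Pick the most similar category; ties (including all-zero scores)
    # go to the alphabetically first category.
    scores = [cosine(query_keys, keys) for keys in category_keys]
    return categories[scores.index(max(scores))]


if __name__ == "__main__":
    train = [["apple", "apple", "pear"], ["apple", "banana"],
             ["car", "bus", "car"], ["bus", "train"]]
    labels = ["fruit", "fruit", "vehicle", "vehicle"]
    print(classify(train, labels, ["banana", "apple"], k=3))  # fruit
```

In this sketch, `k` stands in for the number of keywords that the abstract reports tuning. Sweeping it on held-out data and keeping the value with the best F1 score would be one way to mirror that step. Fitting IDF only on the category documents keeps a transcript's unseen words from influencing the comparison.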

Key words: natural language processing (NLP), Alzheimer’s disease (AD), mild cognitive impairment, term frequency-inverse document frequency (TF-IDF), bag of words
