J Shanghai Jiaotong Univ Sci, 2022, Vol. 27, Issue (1): 90-98. doi: 10.1007/s12204-021-2376-3
Corresponding author:
LI Yongfu∗ (李永福), yongfu.li@sjtu.edu.cn
YU Qing (余青), MA Yi (马祎), LI Yongfu∗ (李永福)
Received:
2021-01-18
Online:
2022-01-28
Published:
2022-01-14
YU Qing (余青), MA Yi (马祎), LI Yongfu∗ (李永福). Enhancing Speech Recognition for Parkinson’s Disease Patient Using Transfer Learning Technique[J]. J Shanghai Jiaotong Univ Sci, 2022, 27(1): 90-98.