Journal of Shanghai Jiao Tong University (Science) ›› 2018, Vol. 23 ›› Issue (3): 392-.doi: 10.1007/s12204-018-1954-5

Research of Clinical Named Entity Recognition Based on Bi-LSTM-CRF

QIN Ying (秦颖), ZENG Yingfei (曾颖菲)   

  1. (Department of Computer Science, Beijing Foreign Studies University, Beijing 100089, China)
  2. (Department of Computer Science, Beijing Foreign Studies University, Beijing 100089, China)
  • Online: 2018-05-31 Published: 2018-06-17
  • Contact: QIN Ying (秦颖) E-mail: qinying@bfsu.edu.cn

Abstract: Electronic Medical Records (EMR), with their unstructured sentences and varied conceptual expressions, provide rich information for medical information extraction. However, common Named Entity Recognition (NER) methods in Natural Language Processing (NLP) are not well suited to clinical NER in EMR. This study applies neural networks to clinical concept extraction. We integrate Bidirectional Long Short-Term Memory networks (Bi-LSTM) with a Conditional Random Field (CRF) layer to detect three types of clinical named entities. The word representations fed into the neural network are formed by concatenating character-based word embeddings with Continuous Bag of Words (CBOW) embeddings trained on both domain and non-domain corpora. We test our NER system on the i2b2/VA open datasets and compare its performance with six related works, achieving the best NER result with an F1 value of 0.8537. We also point out a few specific problems in clinical concept extraction, which may offer hints for deeper studies.
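
To illustrate what a CRF layer on top of Bi-LSTM outputs contributes, the following is a minimal sketch of Viterbi decoding in plain Python. It is not the authors' implementation. The tag set (including the entity type name "problem"), the emission scores and the transition scores are all hypothetical values chosen for illustration; in a real system the emissions would come from the Bi-LSTM and the transitions would be learned.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at token t (e.g. produced by a Bi-LSTM)
    transitions[i][j] : score of moving from tag i to tag j (the CRF layer)
    """
    if not emissions:
        return []
    n_tags = len(transitions)
    score = list(emissions[0])          # best score of any path ending in each tag
    backpointers: List[List[int]] = []  # best previous tag for each (token, tag)
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical tag set and scores, for illustration only.
TAGS = ["O", "B-problem", "I-problem"]
emissions = [
    [2.0, 0.5, 0.1],  # token 1: emission favours O
    [0.5, 1.0, 1.2],  # token 2: emission alone favours I-problem
    [0.3, 0.2, 1.5],  # token 3: emission favours I-problem
]
transitions = [
    [0.0, 0.0, -10.0],  # from O: O -> I-problem is strongly penalised
    [0.0, 0.0, 0.5],    # from B-problem
    [0.0, 0.0, 0.5],    # from I-problem
]
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

With these made-up scores, choosing the best tag for each token independently would give O, I-problem, I-problem, which is not a valid BIO sequence. Decoding with the transition scores instead yields O, B-problem, I-problem, because the penalty on the O to I-problem transition outweighs the small emission advantage at the second token.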

Key words: clinical named entity recognition| bidirectional long short-term memory networks| conditional random fields
