J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (4): 778-789. doi: 10.1007/s12204-025-2825-5

Text Structured Algorithm of Lung Cancer Cases Based on Deep Learning

宓林晖¹, 袁骏毅¹, 周延康², 侯旭敏³

1. Information Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China; 2. College of Science, Donghua University, Shanghai 201620, China; 3. Hospital Director’s Office, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China
Received: 2024-12-02; Accepted: 2025-02-25; Online: 2025-07-31; Published: 2025-07-31

Abstract: Surgical site infections (SSIs) are the most common healthcare-associated infections in patients with lung cancer. Building a risk prediction model for SSIs in lung cancer patients requires extracting the relevant risk factors from lung cancer case texts. This involves two types of text structuring tasks: attribute discrimination and attribute extraction. This article proposes Multi-BGLC, a joint model that handles both tasks. It uses bidirectional encoder representations from transformers (BERT) as the encoder. It then fine-tunes a decoder on cancer case data. The decoder consists of a graph convolutional neural network (GCNN), a long short-term memory (LSTM) network, and a conditional random field (CRF). The GCNN performs attribute discrimination, whereas the LSTM and CRF perform attribute extraction. Experiments show that the model is more effective and accurate than the baseline models it was compared with.

Keywords: text structuring, text classification, sequence labeling, data augmentation, lung cancer, electronic medical records
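The abstract assigns attribute discrimination to the GCNN head on top of BERT. The keywords suggest this task is treated as text classification. The abstract does not say how the graph is built. The plain-Python sketch below therefore makes several assumptions: tokens are nodes joined to their neighbours, the layer uses the standard symmetric-normalized propagation rule of Kipf and Welling, node features are mean-pooled, and a linear classifier produces the label. The function names, weights, and the `absent`/`present` labels are hypothetical, and the 2-dimensional vectors only stand in for BERT outputs.

```python
import math

# Hypothetical sketch: one graph-convolution layer (Kipf and Welling rule),
# mean pooling over nodes, and a linear classifier. The graph, weights and
# labels are toy assumptions, not values or design details from the paper.

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square 0/1 adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # node degrees including self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(adj_norm, h, w):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def discriminate(adj, token_vecs, w_gcn, w_out, labels):
    """Classify a whole text: GCN layer, mean-pool over nodes, linear scores, argmax."""
    h = gcn_layer(normalize_adjacency(adj), token_vecs, w_gcn)
    pooled = [sum(row[d] for row in h) / len(h) for d in range(len(h[0]))]
    logits = [sum(pooled[d] * w_out[d][c] for d in range(len(pooled)))
              for c in range(len(labels))]
    return labels[max(range(len(labels)), key=lambda c: logits[c])]

# Toy example: three token nodes linked in a chain, 2-dimensional features
# standing in for BERT outputs, and a hypothetical binary attribute.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
token_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_gcn = [[1.0, 0.0], [0.0, 1.0]]
w_out = [[1.0, -1.0], [-1.0, 1.0]]
print(discriminate(adj, token_vecs, w_gcn, w_out, ["absent", "present"]))  # prints "present"
```

In a trained model the weights would be learned and the products would run on tensors. Plain lists are used here only to keep the example dependency-free.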
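For attribute extraction, the abstract pairs an LSTM with a CRF. In the usual LSTM-CRF design, the LSTM produces per-token tag scores and the CRF selects the jointly best tag sequence. The sketch below covers only that decoding step: a Viterbi search over a linear-chain CRF, followed by converting BIO tags into extracted spans. The BIO scheme, the `DUR` tag, the tokens, and every score are illustrative assumptions. CRF training (the forward algorithm and log-likelihood) is not shown.

```python
# Hypothetical sketch of CRF decoding for attribute extraction: per-token tag
# scores (standing in for LSTM outputs) plus tag-transition scores are decoded
# with the Viterbi algorithm, then BIO tags are turned into extracted spans.
# The tag set, tokens and all scores are toy assumptions, not from the paper.

TAGS = ["O", "B-DUR", "I-DUR"]  # hypothetical BIO tags for a "duration" attribute

def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at token t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best path score ending in each tag so far
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for current tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [tags[j] for j in path]

def bio_to_spans(tokens, labels):
    """Group B-/I- tagged tokens into (attribute, text) spans.

    An I- tag that does not continue a span of the same type is treated as O.
    """
    spans, current, kind = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                spans.append((kind, " ".join(current)))
            current, kind = [tok], lab[2:]
        elif lab.startswith("I-") and current and lab[2:] == kind:
            current.append(tok)
        else:
            if current:
                spans.append((kind, " ".join(current)))
            current, kind = [], None
    if current:
        spans.append((kind, " ".join(current)))
    return spans

tokens = ["surgery", "lasted", "180", "min"]
emissions = [[2.0, 0.5, 0.0], [1.5, 0.2, 0.1], [0.3, 1.8, 0.6], [0.4, 0.1, 1.2]]
# A large negative O -> I-DUR score strongly penalizes an inside tag right after O.
transitions = [[0.0, 0.0, -10.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.5]]
labels = viterbi_decode(emissions, transitions, TAGS)
print(labels)                        # ['O', 'O', 'B-DUR', 'I-DUR']
print(bio_to_spans(tokens, labels))  # [('DUR', '180 min')]
```

The transition scores are learned jointly with the emissions. This lets the CRF penalize invalid sequences such as O followed by I-DUR, which a per-token argmax cannot do.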
