J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (1): 153-165. doi: 10.1007/s12204-023-2611-1

• Medicine-Engineering Interdisciplinary

Ensemble Learning-Based Mortality Prediction After Acute Myocardial Infarction

Mortality Prediction for Acute Myocardial Infarction Based on Ensemble Learning

YAN Mingxuan¹ (颜铭萱), MIAO Yutong²,³ (苗雨桐), SHENG Shuqian¹ (盛淑茜), GAN Xiaoying¹ (甘小莺), HE Ben² (何 奔), SHEN Lan²,³* (沈 兰)

1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 2. Department of Cardiology, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China; 3. Clinical Research Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China
• Received: 2022-06-24; Accepted: 2022-10-15; Online: 2025-01-28; Published: 2025-01-28

Abstract: This study establishes a mortality prediction model from a small cohort of acute myocardial infarction (AMI) patients with a low death rate. In total, 1 639 AMI patients who received treatment in seven tertiary and secondary hospitals in Shanghai between January 1, 2016 and January 1, 2018 were enrolled as study subjects. Among them, 72 patients died during the two-year follow-up. Models were built with ensemble learning frameworks and machine learning algorithms on 51 physiological indicators of each patient. The Shapley additive explanations (SHAP) algorithm and univariate tests with point-biserial and phi correlation coefficients were employed to determine significant features and rank feature importance. In 5-fold cross-validation and external validation, the prediction model combining the self-paced ensemble framework with the random forest algorithm achieved the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.911 and a recall of 0.864. Both feature ranking methods showed that ejection fraction, serum creatinine (admission), hemoglobin and Killip class are the most important features. With these top-ranked features, a simplified prediction model achieves a comparable result, with an AUROC of 0.872 and a recall of 0.818. This work proposes a new method for establishing mortality prediction models for AMI patients based on the self-paced ensemble framework, which allows models to achieve high performance on small patient cohorts with low death rates. It can serve as a new reference to assist medical decision-making and prognosis.
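The univariate test named in the abstract relies on two standard correlation coefficients: point-biserial for a continuous feature against a binary outcome, and phi for a binary feature against a binary outcome. The following is a minimal sketch of both using their textbook formulas, not the authors' implementation. The function names and the toy values (`ef`, `killip_high`, `died`) are hypothetical and are not taken from the study data.

```python
import math

def point_biserial(values, labels):
    """Point-biserial correlation: continuous feature vs. binary label (0/1)."""
    g1 = [v for v, l in zip(values, labels) if l == 1]
    g0 = [v for v, l in zip(values, labels) if l == 0]
    n1, n0, n = len(g1), len(g0), len(values)
    mean = sum(values) / n
    # Population standard deviation of the feature over all samples.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    mean_diff = sum(g1) / n1 - sum(g0) / n0
    return mean_diff / sd * math.sqrt(n1 * n0 / n ** 2)

def phi_coefficient(feature, labels):
    """Phi correlation: binary feature (0/1) vs. binary label (0/1)."""
    pairs = list(zip(feature, labels))
    a = pairs.count((1, 1))  # feature present, event
    b = pairs.count((1, 0))  # feature present, no event
    c = pairs.count((0, 1))  # feature absent, event
    d = pairs.count((0, 0))  # feature absent, no event
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

# Hypothetical toy data (illustration only, not study data).
ef = [55, 60, 35, 30, 50, 40]        # a continuous indicator
killip_high = [0, 0, 1, 1, 0, 0]     # a binarized indicator
died = [0, 0, 1, 1, 0, 1]            # binary outcome

r_pb = point_biserial(ef, died)
r_phi = phi_coefficient(killip_high, died)
print(f"point-biserial: {r_pb:.3f}, phi: {r_phi:.3f}")
```

Both coefficients are special cases of the Pearson correlation, so features of either type can be ranked on a common scale by absolute value.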

Key words: acute myocardial infarction (AMI), ensemble learning, machine learning, feature engineering

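Abstract (Chinese version, translated): This study builds a mortality prediction model based on a small-scale, low-mortality cohort of acute myocardial infarction patients. The data come from 1 639 acute myocardial infarction patients treated at seven tertiary and secondary hospitals in Shanghai between January 1, 2016 and January 1, 2018, of whom 72 died during the 2-year follow-up. The prediction models are built within ensemble learning frameworks from 51 physiological indicators of these patients, and the importance of these features is computed and ranked with the SHAP algorithm and a univariate test based on the point-biserial and phi correlation coefficients. According to 5-fold cross-validation and external validation, the model built with the self-paced ensemble framework and the random forest algorithm achieves the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.911 and a recall of 0.864. Both feature ranking methods indicate that ejection fraction, serum creatinine (admission), hemoglobin and Killip class are the features most strongly related to patient death. Based on these features, the study further builds a simplified prediction model that reaches an AUROC of 0.872 and a recall of 0.818, close to the performance of the full model built on the complete feature set. The study proposes a method, based on the self-paced ensemble framework, for building mortality prediction models for acute myocardial infarction patients, which can yield high predictive performance on small-scale, low-mortality cohorts. This work will provide reference and support in medical decision-making and diagnosis.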
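The two metrics reported in the abstract, AUROC and recall, can be computed directly from predicted risk scores and true outcomes. The sketch below uses the standard pairwise (Mann-Whitney) definition of AUROC and a thresholded recall. The function names, the 0.5 threshold, and the toy scores are assumptions for illustration and do not come from the paper.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

def recall(scores, labels, threshold=0.5):
    """Share of true events whose score reaches the (assumed) threshold."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    return tp / (tp + fn)

# Hypothetical predicted risks and outcomes (illustration only).
risk = [0.9, 0.4, 0.7, 0.2, 0.6, 0.1]
died = [1, 1, 0, 0, 0, 0]

auc_value = auroc(risk, died)
recall_value = recall(risk, died)
print(f"AUROC: {auc_value:.3f}, recall: {recall_value:.3f}")
```

With few deaths in a cohort, recall is the metric that shows how many of those rare events the model actually catches, which is presumably why it is reported alongside AUROC.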

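Key words (Chinese version, translated): acute myocardial infarction, ensemble learning, machine learning, feature engineering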
