J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (4): 790-799. doi: 10.1007/s12204-024-2761-9



Ground-Glass Lung Nodules Recognition Based on CatBoost Feature Selection and Stacking Ensemble Learning

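Miao Jun (苗军)1, Chang Yiru (常艺茹)1, Chen Chen (陈辰)2, Zhang Maoxuan (张茂炫)1, Liu Yan (刘艳)3, Qi Honggang (齐洪钢)3, Guo Zhijun (郭志军)4, Xu Qian (徐倩)5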

  1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China; 2. Pathological Information Engineering Technology Center, Jinan Supercomputing Technology Research Institute, Jinan 250100, China; 3. School of Computer Science and Technology, University of the Chinese Academy of Sciences, Beijing 100049, China; 4. Department of Radiology, Tianjin Kanghui Hospital, Tianjin 300385, China; 5. Department of Radiology, Huabei Petroleum General Hospital, Renqiu 062550, Hebei, China
  • Received: 2023-12-04; Accepted: 2023-12-25; Published: 2025-07-31


Abstract: To address the high feature dimensionality, heavy data redundancy, and low accuracy of single classifiers in ground-glass lung nodule recognition, a recognition method based on CatBoost feature selection and Stacking ensemble learning is proposed. First, a feature selection algorithm selects the important features and removes those with little impact, which reduces the dimensionality of the data. Second, random forest, decision tree, K-nearest neighbor, and light gradient boosting machine (LightGBM) classifiers are used as base classifiers, and a support vector machine is used as the meta-classifier to fuse them into the ensemble learning model. This design raises the accuracy of the classification model while preserving the diversity of the base classifiers. Experimental results show that the proposed method reaches a recognition accuracy of 94.375%, which is 1.875 percentage points higher than random forest, the best-performing single classifier. Compared with the recent deep learning methods ResNet+GBM+Attention and MVCSNet for ground-glass pulmonary nodule recognition, the proposed method performs better or comparably. The experiments indicate that the proposed model can effectively select features and recognize ground-glass pulmonary nodules.

Key words: ground-glass pulmonary nodule, feature selection, ensemble learning
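
The two-stage pipeline described in the abstract (keep the most important features, then feed the predictions of several base classifiers to a meta-classifier) can be illustrated with a small standard-library sketch. This is a toy illustration under stated assumptions, not the authors' implementation: a simple class-separation score stands in for CatBoost feature importance; nearest-centroid, k-NN, and one-split decision-tree learners stand in for the four base classifiers; logistic regression stands in for the support vector machine meta-classifier; the data are synthetic; and every function name is hypothetical. The use of out-of-fold predictions to train the meta-classifier is the usual stacking recipe and is assumed here, since the abstract does not spell it out.

```python
import math
import random
from statistics import mean, pstdev


def make_data(n=240, seed=0):
    """Synthetic two-class data: features 0 and 2 are informative, 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0),
                  rng.gauss(0.0, 1.0),
                  rng.gauss(-2.0 * label, 1.0)])
        y.append(label)
    return X, y


def select_features(X, y, k):
    """Stand-in for CatBoost importance: keep the k best class-separating features."""
    scores = []
    for j in range(len(X[0])):
        c0 = [r[j] for r, t in zip(X, y) if t == 0]
        c1 = [r[j] for r, t in zip(X, y) if t == 1]
        scores.append(abs(mean(c0) - mean(c1)) / (pstdev(c0) + pstdev(c1) + 1e-12))
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:k])


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fit_centroid(X, y):
    """Nearest-centroid classifier."""
    cents = {c: [mean(col) for col in zip(*(r for r, t in zip(X, y) if t == c))]
             for c in (0, 1)}
    return lambda x: min(cents, key=lambda c: sq_dist(x, cents[c]))


def fit_knn(X, y, k=5):
    """k-nearest-neighbour classifier with majority vote."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(x, X[i]))[:k]
        return int(2 * sum(y[i] for i in nearest) > k)
    return predict


def fit_stump(X, y):
    """Depth-1 decision tree: the (feature, threshold, side) split with most hits."""
    best = (-1, 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted({r[j] for r in X}):
            for hi in (0, 1):
                hits = sum((hi if r[j] > thr else 1 - hi) == t for r, t in zip(X, y))
                if hits > best[0]:
                    best = (hits, j, thr, hi)
    _, j, thr, hi = best
    return lambda x: hi if x[j] > thr else 1 - hi


def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Logistic regression by gradient descent (stand-in for the SVM meta-classifier)."""
    w = [0.0] * (len(Z[0]) + 1)  # w[0] is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            score = w[0] + sum(a * b for a, b in zip(w[1:], z))
            err = 1.0 / (1.0 + math.exp(-score)) - t
            grad[0] += err
            for i, zi in enumerate(z, start=1):
                grad[i] += err * zi
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return lambda z: int(w[0] + sum(a * b for a, b in zip(w[1:], z)) > 0)


def fit_stacking(X, y, base_fitters, n_folds=5):
    """Train the meta-classifier on out-of-fold base predictions, then refit the bases."""
    Z = [[0] * len(base_fitters) for _ in X]
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        held_out = [i for i in range(len(X)) if i % n_folds == f]
        for b, fit in enumerate(base_fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                Z[i][b] = model(X[i])  # prediction for a sample the model never saw
    meta = fit_logistic(Z, y)
    bases = [fit(X, y) for fit in base_fitters]
    return lambda x: meta([m(x) for m in bases])


X, y = make_data()
X_train, y_train, X_test, y_test = X[:180], y[:180], X[180:], y[180:]
keep = select_features(X_train, y_train, k=2)  # chosen on training data only
project = lambda rows: [[r[j] for j in keep] for r in rows]
model = fit_stacking(project(X_train), y_train, [fit_centroid, fit_knn, fit_stump])
accuracy = sum(model(x) == t for x, t in zip(project(X_test), y_test)) / len(y_test)
print(f"kept features {keep}, test accuracy {accuracy:.3f}")
```

Substituting the classifiers named in the abstract would, in a sketch like this, change only the list of base fitters and the meta-learner; the feature-selection step and the out-of-fold loop would keep the same shape.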
