Ground-Glass Lung Nodules Recognition Based on CatBoost Feature Selection and Stacking Ensemble Learning

J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (4): 790-799. doi: 10.1007/s12204-024-2761-9

Topic: Medical Images

Miao Jun (苗军)¹, Chang Yiru (常艺茹)¹, Chen Chen (陈辰)², Zhang Maoyuan (张茂炫)¹, Liu Yan (刘艳)³, Qi Honggang (齐洪钢)³, Guo Zhijun (郭志军)⁴, Xu Qian (徐倩)⁵

1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China; 2. Pathological Information Engineering Technology Center, Jinan Supercomputing Technology Research Institute, Jinan 250100, China; 3. School of Computer Science and Technology, University of the Chinese Academy of Sciences, Beijing 100049, China; 4. Department of Radiology, Tianjin Kanghui Hospital, Tianjin 300385, China; 5. Department of Radiology, Huabei Petroleum General Hospital, Renqiu 062550, Hebei, China

Received: 2023-12-04  Accepted: 2023-12-25  Online: 2025-07-31  Published: 2025-07-31

Abstract

To address the high feature dimensionality, heavy data redundancy, and low recognition accuracy of single classifiers in ground-glass lung nodule recognition, a recognition method based on CatBoost feature selection and Stacking ensemble learning is proposed. First, the method uses a feature selection algorithm to screen important features and remove those with little impact, which reduces the dimensionality of the data. Second, random forest, decision tree, K-nearest neighbor (KNN), and light gradient boosting machine (LightGBM) classifiers are used as base classifiers, and a support vector machine is used as the meta-classifier to fuse and construct the ensemble learning model. This design improves the accuracy of the classification model while maintaining the diversity of the base classifiers. The experimental results show that the recognition accuracy of the proposed method reaches 94.375%. Compared with random forest, the best-performing single classifier, the proposed method improves accuracy by 1.875%. Compared with the recent deep learning methods for ground-glass pulmonary nodule recognition, ResNet+GBM+Attention and MVCSNet, the proposed method achieves better or comparable accuracy. The experiments show that the proposed model can effectively select features for, and recognize, ground-glass pulmonary nodules.
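The two-stage pipeline described in the abstract (importance-based feature selection followed by stacking) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it depends only on scikit-learn, so `GradientBoostingClassifier` stands in for both CatBoost (as the feature ranker) and LightGBM (as a base classifier); the synthetic data, the choice of 20 retained features, and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a high-dimensional, redundant feature table
# (hypothetical data; the paper's nodule features are not reproduced here).
X, y = make_classification(
    n_samples=400, n_features=60, n_informative=8, n_redundant=20, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: importance-based feature selection.
# Assumption: a scikit-learn gradient-boosting model stands in for CatBoost.
# threshold=-np.inf with max_features keeps exactly the top-ranked features.
selector = SelectFromModel(
    GradientBoostingClassifier(random_state=0),
    threshold=-np.inf,
    max_features=20,  # hypothetical cut-off, not taken from the paper
)

# Step 2: stacking with four base classifiers and an SVM meta-classifier.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("gbm", GradientBoostingClassifier(random_state=0)),  # stand-in for LightGBM
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=SVC(kernel="rbf"),
    cv=5,  # the meta-classifier is trained on out-of-fold base predictions
)

# Chain both steps so selection is fitted on the training split only.
model = make_pipeline(selector, stack)
model.fit(X_train, y_train)

n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
accuracy = model.score(X_test, y_test)
print(f"kept {n_selected} of {X.shape[1]} features, test accuracy = {accuracy:.3f}")
```

Placing the selector and the stacked model in one pipeline keeps the feature ranking from seeing the test split, and the `cv=5` setting means the SVM meta-classifier learns from predictions the base classifiers made on data they were not trained on. The accuracy printed here reflects only the synthetic data and says nothing about the results reported in the abstract.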

CLC Number: TP181; R319


Miao Jun, Chang Yiru, Chen Chen, Zhang Maoyuan, Liu Yan, Qi Honggang, Guo Zhijun, Xu Qian. Ground-Glass Lung Nodules Recognition Based on CatBoost Feature Selection and Stacking Ensemble Learning[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(4): 790-799.
