Multi-Consistency Training for Semi-Supervised Medical Image Segmentation

doi:10.1007/s12204-024-2733-0

Abstract

Abstract: Medical image segmentation is a crucial task in clinical applications. However, labeled medical image data are often difficult to obtain, which makes semi-supervised learning (SSL), a technique that requires only a small amount of labeled data, increasingly attractive. Nonetheless, most prevailing SSL segmentation methods for medical images either rely on a single consistency training method or directly fine-tune SSL methods designed for natural images. In this paper, we propose a semi-supervised method called multi-consistency training (MCT) for medical image segmentation. Our approach considers consistency from two perspectives: output consistency across different up-sampling methods, and output consistency of the same data within the same network under various perturbations to the intermediate features. We design a distinct semi-supervised loss function for each of these two types of consistency. To support the MCT model, we also develop a dedicated decoder as the core of the neural network. Experiments on the polyp dataset and the dental dataset, with comparisons against other SSL methods, show that our approach achieves higher segmentation accuracy. Ablation studies and further discussion support the effectiveness of the approach for complex medical image segmentation.

Keywords: semi-supervised learning, consistency training, medical image segmentation, intermediate feature perturbation
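The two consistency terms named in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the 1D signals, the choice of nearest versus linear up-sampling, dropout as the feature perturbation, mean squared error as the consistency measure, and every function name below are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mse(a, b):
    # Mean squared error between two equally long probability lists.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def upsample_nearest(row, factor):
    # Repeat every sample `factor` times.
    return [v for v in row for _ in range(factor)]


def upsample_linear(row, factor):
    # Linear interpolation between neighbours, clamped at both ends.
    n = len(row)
    out = []
    for i in range(n * factor):
        pos = min(max((i + 0.5) / factor - 0.5, 0.0), n - 1.0)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        t = pos - lo
        out.append((1 - t) * row[lo] + t * row[hi])
    return out


def upsampling_consistency(logits, factor=2):
    # Penalise disagreement between two up-sampling paths (assumed: MSE).
    p_near = [sigmoid(v) for v in upsample_nearest(logits, factor)]
    p_lin = [sigmoid(v) for v in upsample_linear(logits, factor)]
    return mse(p_near, p_lin)


def feature_dropout(features, rate, rng):
    # Inverted dropout used here as a stand-in feature perturbation.
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in features]


def perturbation_consistency(features, decode, rate=0.3, n_views=3, seed=0):
    # Compare the clean output with outputs from perturbed features.
    rng = random.Random(seed)
    clean = [sigmoid(v) for v in decode(features)]
    total = 0.0
    for _ in range(n_views):
        noisy_feats = feature_dropout(features, rate, rng)
        noisy = [sigmoid(v) for v in decode(noisy_feats)]
        total += mse(clean, noisy)
    return total / n_views


def toy_decoder(features):
    # Hypothetical decoder: a fixed affine map per feature.
    return [2.0 * v - 1.0 for v in features]


if __name__ == "__main__":
    print(upsampling_consistency([-2.0, 2.0]))
    print(perturbation_consistency([0.5, 1.0, -0.5, 2.0], toy_decoder))
```

Both terms need no labels, so in a semi-supervised setting they could be evaluated on unlabeled images and added to a supervised loss computed on the labeled ones; how the paper weights and combines them is not stated in the abstract.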

CLC number:

TP391
R445

Wu Changxue, Zhang Wenxi, Han Jiaozhi, Wang Hongyu. Multi-Consistency Training for Semi-Supervised Medical Image Segmentation[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(4): 800-814.
