J Shanghai Jiaotong Univ Sci ›› 2025, Vol. 30 ›› Issue (1): 121-129. doi: 10.1007/s12204-023-2614-y
ZHAO Yinjie1 (赵寅杰), HOU Runping1 (侯润萍), ZENG Wanqin2 (曾琬琴), QIN Yulei1 (秦玉磊), SHEN Tianle2 (沈天乐), XU Zhiyong2 (徐志勇), FU Xiaolong2* (傅小龙), SHEN Hongbin1* (沈红斌)
Received: 2022-08-08
Accepted: 2022-11-28
Online: 2025-01-28
Published: 2025-01-28
Abstract: Medical image segmentation is a key step in many downstream diagnostic tasks. As deep convolutional neural networks have greatly advanced computer vision, semi-automatic medical image segmentation has matured: a deep convolutional neural network detects the regions of interest, which radiologists then revise. However, supervised learning requires large amounts of manual annotation, and such labeled data are hard to obtain, especially for medical images. Self-supervised learning can exploit unlabeled data to provide a model with good initial parameters, after which the model is fine-tuned on downstream tasks with limited labeled data. Considering that most self-supervised learning methods, especially contrastive learning, are applied mainly to natural images and require expensive GPU resources during pre-training, we propose a novel yet simple self-supervised learning method based on an auxiliary task that exploits the positional information in 3D medical images. Specifically, we take the position of each 2D slice along the longitudinal axis of the 3D volume as a pseudo-label, and during pre-training the model predicts this label as an auxiliary task. Experiments on four semantic segmentation datasets show that the proposed method outperforms other self-supervised learning methods on medical image segmentation. The code is available at https://github.com/alienzyj/PPos.
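The sketch below illustrates the kind of slice-position pretext task the abstract describes: a 2D encoder is pre-trained to predict each slice's normalized position within its volume, and the encoder weights then initialize a segmentation network. It is a minimal assumption-laden sketch, not the released PPos code; the regression formulation, the toy encoder, and all module and variable names are placeholders.

```python
# Illustrative sketch (not the authors' released code) of a slice-position
# pretext task: predict a 2D slice's normalized position along the volume's
# longitudinal axis, using only unlabeled volumes. Names are hypothetical.

import torch
import torch.nn as nn


class SliceEncoder(nn.Module):
    """Small 2D CNN encoder; a real setup would reuse a U-Net encoder."""

    def __init__(self, in_channels: int = 1, feat_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x).flatten(1)  # (B, feat_dim)


class PositionPretextModel(nn.Module):
    """Encoder plus a head that regresses the slice's normalized position."""

    def __init__(self):
        super().__init__()
        self.encoder = SliceEncoder()
        self.position_head = nn.Linear(128, 1)

    def forward(self, slices: torch.Tensor) -> torch.Tensor:
        return self.position_head(self.encoder(slices)).squeeze(-1)


def pretrain_step(model, optimizer, volume: torch.Tensor) -> float:
    """One pre-training step on an unlabeled volume of shape (D, H, W).

    The pseudo-label of slice i is i / (D - 1): its coordinate along the
    longitudinal axis rescaled to [0, 1], so no manual annotation is needed.
    """
    depth = volume.shape[0]
    slices = volume.unsqueeze(1)                                   # (D, 1, H, W)
    targets = torch.arange(depth, dtype=torch.float32) / (depth - 1)

    preds = model(slices)
    loss = nn.functional.mse_loss(preds, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = PositionPretextModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    fake_ct = torch.randn(64, 96, 96)          # stand-in for one CT volume
    print("pretext loss:", pretrain_step(model, optimizer, fake_ct))
    # After pre-training, model.encoder weights would initialize the
    # segmentation network before fine-tuning on the labeled subset.
```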
ZHAO Yinjie1 (赵寅杰), HOU Runping1 (侯润萍), ZENG Wanqin2 (曾琬琴), QIN Yulei1 (秦玉磊), SHEN Tianle2 (沈天乐), XU Zhiyong2 (徐志勇), FU Xiaolong2* (傅小龙), SHEN Hongbin1* (沈红斌). Positional Information is a Strong Supervision for Volumetric Medical Image Segmentation[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(1): 121-129.