Real-Time Lightweight Convolutional Neural Network for Polyp Detection in Endoscope Images

doi:10.1007/s12204-023-2671-2

Abstract

Colorectal cancer is one of the most common cancers and has the second-highest mortality rate. Polyps are precursor lesions of colorectal cancer, and detecting and removing them at an early stage can effectively reduce patient mortality. However, an endoscopy generates a large number of images, which greatly increases the workload of doctors, and prolonged, repetitive screening of endoscopy images also leads to a high misdiagnosis rate. To address the heavy dependence of computer-aided diagnosis models on computational power in the polyp detection task, we propose a lightweight model based on the YOLOv5 algorithm, coordinate attention-YOLOv5-Lite-Prune. It differs from existing state-of-the-art methods, which apply object detection models or their variants, such as faster region-based convolutional neural networks, YOLOv3, YOLOv4, and the single shot multibox detector, directly to the prediction task without any lightweight processing. The innovations of our model are as follows. First, the lightweight EfficientNetLite network is introduced as the new feature extraction network. Second, depthwise separable convolution and its improved modules with different attention mechanisms replace the standard convolution in the detection head. Third, the α-intersection over union loss function is applied to improve the precision and convergence speed of the model. Finally, the model size is compressed with a pruning algorithm. Our model effectively reduces the number of parameters and the computational complexity without significant accuracy loss. As a result, the model can be deployed on an embedded deep learning platform and detects polyps at more than 30 frames per second, which removes the requirement that deep learning models run on high-performance servers.
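Two of the ideas named above can be illustrated with a few lines of arithmetic. The following is a minimal sketch, not the authors' implementation: it compares the weight count of a standard convolution with that of a depthwise separable one, and evaluates the basic power form of the α-intersection over union loss, 1 − IoU^α. The function names, the layer sizes in the usage note, and the default α = 3 are illustrative assumptions; the paper may use a different member of the α-IoU family.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred, target, alpha=3.0):
    """Basic power-IoU loss, 1 - IoU**alpha (alpha=3 is an assumed value)."""
    return 1.0 - iou(pred, target) ** alpha
```

For an assumed 3 × 3 layer with 128 input and 256 output channels, `conv_params(3, 128, 256)` gives 294912 weights and `depthwise_separable_params(3, 128, 256)` gives 33920, roughly an 8.7-fold reduction. With `alpha=1.0`, `alpha_iou_loss` reduces to the ordinary IoU loss.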

Key words: YOLOv5, polyp lesions, object detection, lightweight, weight pruning

CLC Number: TP391, R735

Si Bingqi, Pang Chenxi, Wang Zhiwu, Jiang Pingping, Yan Guozheng. Real-Time Lightweight Convolutional Neural Network for Polyp Detection in Endoscope Images[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(3): 521-534.
