Medicine-Engineering Interdisciplinary

Real-Time Lightweight Convolutional Neural Network for Polyp Detection in Endoscope Images

1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2. Shanghai Engineering Research Center of Intelligent Drug Detoxification and Rehabilitation, Shanghai 200240, China
3. College of Computer Science and Technology, Jilin University, Changchun 130012, China

Received date: 2022-11-14

Accepted date: 2023-02-27

Online published: 2025-06-06

Abstract

Colorectal cancer is one of the most common cancers and the second leading cause of cancer death worldwide. Polyps are precursor lesions of colorectal cancer, and detecting and removing them at an early stage can effectively reduce patient mortality. However, a single endoscopic examination generates a large number of images, which greatly increases the workload of doctors, and prolonged, repetitive screening of endoscopic images also leads to a high misdiagnosis rate. Computer-aided diagnosis models for polyp detection depend heavily on computational power. To address this problem, we propose a lightweight model, coordinate attention-YOLOv5-Lite-Prune, based on the YOLOv5 algorithm. It differs from state-of-the-art methods in existing research, which apply object detection models or their variants directly to the prediction task without any lightweight processing. Examples of such models include faster region-based convolutional neural networks, YOLOv3, YOLOv4, and the single shot multibox detector. The innovations of our model are as follows. First, the lightweight EfficientNetLite network is introduced as the new feature extraction network. Second, depthwise separable convolution and improved versions of it with different attention mechanisms replace the standard convolution in the detection head. Third, the α-intersection over union loss function is applied to improve the precision and convergence speed of the model. Finally, the model size is compressed with a pruning algorithm. Our model effectively reduces the number of parameters and the computational complexity without significant loss of accuracy. As a result, it can be deployed on an embedded deep learning platform and detects polyps at more than 30 frames per second. This frees it from the usual reliance of deep learning models on high-performance servers.
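The parameter saving from replacing a standard convolution with a depthwise separable one can be checked with simple arithmetic. The sketch below counts weights (biases ignored) and multiply-accumulates for both layer types. The function names and the 256-channel, 3 x 3 example layer are hypothetical and are not taken from the paper's detection head.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a pointwise 1 x 1 convolution (bias ignored)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


def macs(params, h_out, w_out):
    """Multiply-accumulates: each weight is applied once per output position."""
    return params * h_out * w_out


if __name__ == "__main__":
    std = standard_conv_params(256, 256, 3)          # 589824 weights
    dws = depthwise_separable_params(256, 256, 3)    # 67840 weights
    # The ratio equals 1/c_out + 1/k**2, roughly 0.115 here.
    print(std, dws, dws / std)
    print(macs(std, 40, 40), macs(dws, 40, 40))
```

Because the ratio is 1/c_out + 1/k², a 3 x 3 depthwise separable layer costs roughly one ninth of the standard layer once the channel count is large.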
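Coordinate attention, the mechanism named in the model, pools a feature map separately along height and width. It then encodes both pooled strips with a shared 1 x 1 convolution and produces two direction-aware sigmoid gates that rescale the input. The NumPy sketch below follows that structure for a single C x H x W map. Assumptions: batch normalization and biases are omitted, hard-swish is used as the nonlinearity, 1 x 1 convolutions are written as matrix products, and the weight shapes are chosen by the caller. How the authors combine this block with depthwise separable convolution is not described in the abstract.

```python
import numpy as np


def h_swish(x):
    """Hard-swish nonlinearity: x * ReLU6(x + 3) / 6."""
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Coordinate attention on one feature map (no batch, no bias, no BN).

    x:        (C, H, W) input feature map
    w_reduce: (C_mid, C) shared 1x1 convolution written as a matrix
    w_h, w_w: (C, C_mid) 1x1 convolutions producing the height and width gates
    """
    _, h, _ = x.shape
    z_h = x.mean(axis=2)                      # (C, H): average over width
    z_w = x.mean(axis=1)                      # (C, W): average over height
    z = np.concatenate([z_h, z_w], axis=1)    # (C, H + W)
    f = h_swish(w_reduce @ z)                 # (C_mid, H + W) shared encoding
    f_h, f_w = f[:, :h], f[:, h:]             # split back into the two directions
    g_h = sigmoid(w_h @ f_h)                  # (C, H) gate along height
    g_w = sigmoid(w_w @ f_w)                  # (C, W) gate along width
    return x * g_h[:, :, None] * g_w[:, None, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 5, 6))
    y = coordinate_attention(
        x,
        rng.standard_normal((2, 8)),
        rng.standard_normal((8, 2)),
        rng.standard_normal((8, 2)),
    )
    print(y.shape)  # (8, 5, 6)
```

Because each gate is indexed by a row or a column, the output keeps positional information along both axes. A global-pooling channel attention block would discard it.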
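The α-intersection over union loss raises IoU to a power α before subtracting it from 1. With α = 1 this reduces to the plain IoU loss. The sketch below implements that power IoU form for axis-aligned boxes given as (x1, y1, x2, y2). The default α = 3 is an assumption based on the value the Alpha-IoU authors recommend; the abstract does not state the value used. Alpha-IoU also defines power versions of GIoU, DIoU and CIoU, and which variant this model uses is not specified here.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred, target, alpha=3.0):
    """Power IoU loss 1 - IoU**alpha (alpha = 1 gives the plain IoU loss)."""
    return 1.0 - box_iou(pred, target) ** alpha


if __name__ == "__main__":
    pred, target = (0, 0, 2, 2), (1, 1, 3, 3)
    print(box_iou(pred, target))                    # 1/7
    print(alpha_iou_loss(pred, target, alpha=1.0))  # 6/7
    print(alpha_iou_loss(pred, target))             # 1 - 1/343
```

Varying α changes how strongly boxes at different IoU levels contribute to the loss and its gradient. This reweighting is the mechanism the Alpha-IoU paper credits for faster, more precise box regression.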
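The abstract does not name its pruning algorithm. One common channel-pruning scheme, assumed here purely for illustration, ranks channels by the absolute value of their batch-normalization scale factor γ (network slimming). It then removes the smallest ones under a single global threshold. The function `slimming_masks` and its parameters are hypothetical. Network slimming normally first trains with an L1 penalty on γ, and that step is not shown.

```python
def slimming_masks(bn_gammas, prune_ratio, min_keep=1):
    """Channel keep-masks from BatchNorm scale factors (network-slimming style).

    bn_gammas:   dict mapping a layer name to its list of BN gamma values
    prune_ratio: fraction of all channels, pooled across layers, to remove
    min_keep:    channels always kept per layer so that no layer disappears
    """
    pooled = sorted(abs(g) for gammas in bn_gammas.values() for g in gammas)
    n_prune = int(len(pooled) * prune_ratio)
    # Global threshold: channels with |gamma| below it are pruned.
    threshold = pooled[n_prune] if n_prune < len(pooled) else float("inf")

    masks = {}
    for name, gammas in bn_gammas.items():
        keep = [abs(g) >= threshold for g in gammas]
        if sum(keep) < min_keep:
            # Fall back to this layer's largest-|gamma| channels.
            ranked = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]), reverse=True)
            for i in ranked[:min_keep]:
                keep[i] = True
        masks[name] = keep
    return masks


if __name__ == "__main__":
    gammas = {"head.conv1": [0.9, 0.01, 0.5, 0.02], "head.conv2": [0.03, 0.04]}
    print(slimming_masks(gammas, prune_ratio=0.5))
```

After the masks are computed, the network is typically rebuilt with only the surviving channels and fine-tuned to recover accuracy.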

Cite this article

Si Bingqi, Pang Chenxi, Wang Zhiwu, Jiang Pingping, Yan Guozheng. Real-Time Lightweight Convolutional Neural Network for Polyp Detection in Endoscope Images[J]. Journal of Shanghai Jiaotong University (Science), 2025, 30(3): 521-534. DOI: 10.1007/s12204-023-2671-2
