J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (3): 521-534. doi: 10.1007/s12204-023-2671-2

Medicine-Engineering Interdisciplinary

Real-Time Lightweight Convolutional Neural Network for Polyp Detection in Endoscope Images


SI Bingqi (司丙奇)1,2, PANG Chenxi (逄晨曦)3, WANG Zhiwu (王志武)1,2, JIANG Pingping (姜萍萍)1,2, YAN Guozheng (颜国正)1,2

1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 2. Shanghai Engineering Research Center of Intelligent Drug Detoxification and Rehabilitation, Shanghai 200240, China; 3. College of Computer Science and Technology, Jilin University, Changchun 130012, China
Received: 2022-11-14; Accepted: 2023-02-27; Online: 2025-06-06; Published: 2025-06-06

Abstract: Colorectal cancer is one of the most common cancers and has the second-highest mortality rate. Polyps are precursor lesions of colorectal cancer, and detecting and removing them early can effectively reduce patient mortality. However, each endoscopy generates a large number of images, which greatly increases doctors' workload, and prolonged, repetitive screening of these images leads to a high misdiagnosis rate. To address the heavy dependence of computer-aided diagnosis models on computing power in polyp detection, we propose a lightweight model based on the YOLOv5 algorithm, coordinate attention-YOLOv5-Lite-Prune. Existing state-of-the-art methods, such as faster region-based convolutional neural networks (Faster R-CNN), YOLOv3, YOLOv4, and the single shot multibox detector (SSD), apply object detection models or their variants directly to the prediction task without any lightweight processing; our model differs from them in four respects. First, the lightweight EfficientNetLite network is introduced as the new feature extraction network. Second, depthwise separable convolution and improved variants of it that incorporate different attention mechanisms replace the standard convolutions in the detection head. Third, the α-intersection over union (α-IoU) loss function is applied to improve the precision and convergence speed of the model. Finally, a pruning algorithm is used to compress the model. Together, these changes reduce the number of parameters and the computational complexity without significant loss of accuracy. As a result, the model can be deployed on an embedded deep learning platform and detects polyps at more than 30 frames per second, freeing it from the requirement that deep learning models run on high-performance servers.
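The abstract names two of the model's ingredients without giving their formulas. The snippet below is a minimal illustrative sketch, not the authors' implementation: it computes the basic α-IoU loss, 1 − IoU^α, and compares the weight count of a standard convolution with that of a depthwise separable one. The value α = 3, the (x1, y1, x2, y2) box format, the helper names, and the 64→128-channel 3×3 layer are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2), axis-aligned corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred: Box, target: Box, alpha: float = 3.0) -> float:
    """Basic alpha-IoU loss, 1 - IoU**alpha (alpha = 3 is an assumed default).

    alpha = 1 recovers the plain IoU loss.
    """
    return 1.0 - iou(pred, target) ** alpha


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def dwsep_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise (one k x k filter per input channel)
    plus pointwise (1 x 1, c_in -> c_out) convolution, bias omitted."""
    return c_in * k * k + c_in * c_out


if __name__ == "__main__":
    pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)  # overlap 1, union 7
    print(f"IoU = {iou(pred, target):.4f}")
    print(f"IoU loss (alpha=1) = {alpha_iou_loss(pred, target, 1.0):.4f}")
    print(f"alpha-IoU loss (alpha=3) = {alpha_iou_loss(pred, target):.4f}")
    # Hypothetical 64 -> 128 channel, 3 x 3 layer.
    print(conv_params(64, 128, 3), dwsep_params(64, 128, 3))  # 73728 8768
```

For the hypothetical layer, the depthwise separable version needs about 1/c_out + 1/k² ≈ 12% of the weights of the standard convolution, which illustrates why such layers shrink a detection head. With α > 1, the gradient of the loss with respect to IoU, α·IoU^(α−1), is larger for well-aligned boxes than for poorly aligned ones.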

Key words: YOLOv5, polyp lesions, object detection, lightweight, weight pruning

