J Shanghai Jiaotong Univ Sci ›› 2023, Vol. 28 ›› Issue (3): 360-369. doi: 10.1007/s12204-023-2603-1



Foreground Segmentation Network with Enhanced Attention

JIANG Rui1* (姜锐), ZHU Ruixiang1 (朱瑞祥), CAI Xiaocui1 (蔡萧萃), SU Hu2 (苏虎)

  (1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China; 2. Research Center of Precision Sensing and Control, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
  • Received: 2022-12-27  Revised: 2023-01-11  Accepted: 2023-05-28  Online: 2023-05-28  Published: 2023-05-22



Abstract: Moving object segmentation (MOS) is one of the essential functions of the vision system of all robots, including medical robots. Deep learning-based MOS methods, especially deep end-to-end MOS methods, are actively investigated in this field. Foreground segmentation networks (FgSegNets) are representative deep end-to-end MOS methods proposed recently. This study explores a new mechanism to improve the spatial feature learning capability of FgSegNets while introducing relatively few additional parameters. Specifically, we propose an enhanced attention (EA) module, a parallel connection of an attention module and a lightweight enhancement module, with sequential attention and residual attention as special cases. We also propose integrating EA with FgSegNet_v2 by taking the lightweight convolutional block attention module as the attention module and plugging the EA module in after the two max-pooling layers of the encoder. The derived new model is named FgSegNet_v2_EA. The ablation study verifies the effectiveness of the proposed EA module and integration strategy. The results on the CDnet2014 dataset, which depicts human activities and vehicles captured in different scenes, show that FgSegNet_v2_EA outperforms FgSegNet_v2 by 0.08% and 14.5% under the settings of scene-dependent evaluation and scene-independent evaluation, respectively, which indicates the positive effect of EA on improving the spatial feature learning capability of FgSegNet_v2.
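The parallel structure described above, with sequential and residual attention as special cases, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy channel-attention gate stands in for the convolutional block attention module, and the identity/zero enhancement branches are assumptions used only to show how the special cases arise.

```python
import numpy as np

def toy_attention(x):
    """Toy channel-attention branch (stand-in for CBAM):
    gate each channel by a sigmoid of its global average."""
    # x has shape (C, H, W); the gate has shape (C, 1, 1)
    gate = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2), keepdims=True)))
    return gate * x

def enhance_identity(x):
    """Lightweight enhancement branch reduced to the identity (assumption)."""
    return x

def ea(x, attention, enhancement):
    """Enhanced attention: parallel connection of an attention branch
    and a lightweight enhancement branch, summed elementwise."""
    return attention(x) + enhancement(x)

# Special cases of the parallel connection:
#   enhancement = identity  ->  residual attention: x + A(x)
#   enhancement = zero      ->  plain attention:    A(x)
x = np.random.randn(4, 8, 8)
residual = ea(x, toy_attention, enhance_identity)
plain = ea(x, toy_attention, lambda t: np.zeros_like(t))
```

In the actual model, the enhancement branch is a learned lightweight module rather than a fixed identity, and the combined EA block is inserted after each of the encoder's two max-pooling layers.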

Key words: human-computer interaction, moving object segmentation, foreground segmentation network, enhanced attention, convolutional block attention module
