Moving object segmentation (MOS) is an essential function of the vision system of robots, including medical robots. Deep learning-based MOS methods, especially deep end-to-end MOS methods, are actively investigated in this field. Foreground segmentation networks (FgSegNets) are representative deep end-to-end MOS methods proposed recently. This study explores a new mechanism to improve the spatial feature learning capability of FgSegNets while introducing relatively few additional parameters. Specifically, we propose an enhanced attention (EA) module, a parallel connection of an attention module and a lightweight enhancement module, with sequential attention and residual attention as special cases. We also propose integrating EA with FgSegNet v2 by taking the lightweight convolutional block attention module (CBAM) as the attention module and plugging the EA module after the two max-pooling layers of the encoder. The derived model is named FgSegNet v2 EA. An ablation study verifies the effectiveness of the proposed EA module and integration strategy. Results on the CDnet2014 dataset, which depicts human activities and vehicles captured in different scenes, show that FgSegNet v2 EA outperforms FgSegNet v2 by 0.08% and 14.5% under scene-dependent and scene-independent evaluation, respectively, indicating the positive effect of EA on the spatial feature learning capability of FgSegNet v2.
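To make the described architecture concrete, the following is a minimal, hypothetical PyTorch-style sketch of the EA module: an attention branch and a lightweight enhancement branch connected in parallel, with their outputs summed. The class names, the 1×1-convolution enhancement branch, and the channel widths are illustrative assumptions, not the authors' implementation; only the channel half of CBAM is shown for brevity.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel-attention branch in the style of CBAM: a shared 1x1-conv MLP
    applied to global average- and max-pooled descriptors, merged through a
    sigmoid gate. (CBAM also has a spatial branch, omitted here for brevity.)"""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx) * x


class EnhancedAttention(nn.Module):
    """Hypothetical EA sketch: attention and lightweight enhancement branches
    in parallel, outputs summed. An identity enhancement branch recovers
    residual attention; chaining the two branches instead of running them in
    parallel would give sequential attention."""

    def __init__(self, channels: int):
        super().__init__()
        self.attention = ChannelAttention(channels)  # attention branch
        self.enhance = nn.Sequential(                # assumed 1x1-conv branch
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.attention(x) + self.enhance(x)


# Per the integration strategy, one EA block would follow each of the two
# max-pooling layers of the FgSegNet v2 encoder; the channel width and
# feature-map size below are illustrative only.
ea = EnhancedAttention(channels=128)
out = ea(torch.randn(1, 128, 60, 80))  # output keeps the input feature shape
```

Because both branches use only pointwise operations on the pooled descriptors or 1×1 convolutions, a block of this form adds relatively few parameters, consistent with the stated design goal.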