Lightweight Human Pose Estimation Based on Multi-Attention Mechanism

LIN Xiao, LU Meichen, GAO Mufeng, LI Yan

doi:10.1007/s12204-023-2691-y

Journal of Shanghai Jiaotong University (Science), 2025, Vol. 30, Issue 5: 899-910

Computing & Computer Technologies

1. Institute of Artificial Intelligence on Education Research, College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China; 2. Shanghai Intelligent Education Big Data Engineering Technology Research Center, Shanghai Normal University, Shanghai 200234, China; 3. Shanghai Online Education Research Base for Primary and Secondary Schools, Shanghai 200234, China; 4. School of Physical Education, Shanghai Normal University, Shanghai 200234, China

Received date: 2023-08-03

Accepted date: 2023-08-24

Online published: 2023-12-21

Abstract

Human pose estimation has received much attention from the research community because of its wide range of applications. However, current pose estimation networks are usually complex and computationally intensive, and they suffer from feature loss during feature fusion. To address these problems, we propose a lightweight human pose estimation network based on a multi-attention mechanism (LMANet). In our method, the bottleneck blocks of the high-resolution network are made lightweight with depth-wise separable convolution, which significantly reduces the number of network parameters. We then introduce a multi-attention mechanism to improve prediction accuracy: a channel attention module is added in the initial stage of the network to enhance local cross-channel information interaction. More importantly, we inject a spatial cross-awareness module into the multi-scale feature fusion stage to reduce the loss of spatial information during feature extraction. Extensive experiments on the COCO2017 and MPII datasets show that LMANet achieves higher prediction accuracy with fewer network parameters and less computation. Compared with the high-resolution network HRNet, the number of parameters and the computational complexity of the network are reduced by 67% and 73%, respectively.
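To illustrate why depth-wise separable convolution shrinks a block, the following minimal sketch counts the weights of a single standard convolution and of its depth-wise separable counterpart. The layer sizes (64 input channels, 64 output channels, 3 x 3 kernel) and the function names are hypothetical choices for illustration and are not taken from LMANet; biases and normalization layers are ignored. The figure it produces applies to one layer only and is not meant to reproduce the network-level 67% reduction reported above.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depth-wise k x k convolution (one filter per input
    channel) followed by a 1 x 1 point-wise convolution (bias omitted)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


def reduction(c_in: int, c_out: int, k: int) -> float:
    """Fraction of weights saved by the separable form for one layer."""
    separable = depthwise_separable_params(c_in, c_out, k)
    standard = standard_conv_params(c_in, c_out, k)
    return 1.0 - separable / standard


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to make the arithmetic concrete.
    c_in, c_out, k = 64, 64, 3
    print("standard :", standard_conv_params(c_in, c_out, k))
    print("separable:", depthwise_separable_params(c_in, c_out, k))
    print("saved    : {:.1%}".format(reduction(c_in, c_out, k)))
```

For these sizes the ratio of separable to standard weights equals 1/c_out + 1/k^2, so the saving grows with the kernel size and the number of output channels.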

Cite this article

LIN Xiao, LU Meichen, GAO Mufeng, LI Yan. Lightweight Human Pose Estimation Based on Multi-Attention Mechanism[J]. Journal of Shanghai Jiaotong University (Science), 2025, 30(5): 899-910. DOI: 10.1007/s12204-023-2691-y
