J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (5): 899-910. doi: 10.1007/s12204-023-2691-y
Received: 2023-08-03
Accepted: 2023-08-24
Published: 2025-09-26
Online: 2023-12-21
LIN Xiao1,2,3, LU Meichen1,3, GAO Mufeng4, LI Yan1,2
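Abstract: Human pose estimation has attracted attention from the research community because of its wide range of applications. However, existing networks are typically structurally complex and computationally expensive, and they lose features during feature fusion. To address these problems, a lightweight human pose estimation network based on a multi-attention mechanism, LMANet, is proposed. Building on the high-resolution network, the bottleneck blocks are made lightweight with depthwise separable convolution, which greatly reduces the number of network parameters. A multi-attention mechanism is then introduced to improve prediction accuracy: a channel attention module is added at the initial stage of the network to strengthen local cross-channel information interaction, and a spatial attention mechanism is introduced at the multi-scale feature fusion stage, where a spatial cross-awareness module reduces the loss of spatial information during feature extraction. Experimental results on the COCO2017 and MPII datasets show that LMANet maintains high prediction accuracy with fewer parameters and less computation; compared with the high-resolution network HRNet, the number of parameters and the computational complexity are reduced by 67% and 73%, respectively.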
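To make the parameter saving from depthwise separable convolution concrete, the following is a minimal sketch, not the authors' code. It counts weights (ignoring biases and normalization layers) for a standard k x k convolution versus a depthwise k x k convolution followed by a 1 x 1 pointwise convolution. The function names, the channel width of 64, and the kernel size of 3 are illustrative assumptions, not values taken from the paper, and the per-layer ratio is not the same quantity as the 67% whole-network reduction reported above.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer shape chosen only for illustration.
    c_in, c_out, k = 64, 64, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, saving: {1 - sep / std:.1%}")
```

For this assumed layer shape, the weight count drops from 36,864 to 4,672, which shows why replacing the convolutions in a bottleneck block can shrink a network substantially.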
LIN Xiao, LU Meichen, GAO Mufeng, LI Yan. Lightweight Human Pose Estimation Based on Multi-Attention Mechanism[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 899-910.