J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (6): 1103-1113. doi: 10.1007/s12204-023-2658-z
TAHIR Rizwan (a, b), CAI Yunze (a, b, c)
Received: 2022-10-28
Accepted: 2023-02-10
Online: 2025-11-21
Published: 2025-11-26
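Abstract: Recent research in multimedia and computer vision has focused on analyzing human behavior and activities from images. Skeleton estimation, also called pose estimation, has attracted wide attention. Deep learning methods for human pose estimation mainly emphasize keypoint features. However, under occlusion or with incomplete poses, keypoint features alone are not informative enough, especially when an image contains many people. Besides keypoint features, other cues such as body boundaries and visibility conditions also help pose estimation. Built on the mask region-based convolutional neural network (Mask-RCNN), the model framework integrates multiple features: human mask features, which act as constraints on keypoint location estimation; human keypoint features; and keypoint visibility. These features are shared throughout the architecture to form a continuous multi-feature learning setup, whereas in Mask-RCNN the only feature shared across the system is the region-of-interest (RoI) feature. Bidirectional amplification in the weight-sharing process produces the mask and additionally addresses problems that arise when Mask-RCNN is used for tasks such as segmentation, namely improper segmentation, small intrusions, and missed objects. Accuracy is measured as the percentage of correct keypoints (PCK); the model correctly identifies 86.1% of keypoints.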
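The abstract reports accuracy as the percentage of correct keypoints (PCK) but does not state the distance threshold or the normalizing length behind the 86.1% figure. The Python sketch below shows how PCK is commonly computed; the function name `pck`, the threshold `alpha = 0.2`, the per-person reference length, and the rule of skipping keypoints flagged as not visible are all assumptions for illustration, not the authors' evaluation code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(
    predictions: Sequence[Sequence[Point]],
    ground_truth: Sequence[Sequence[Point]],
    visible: Sequence[Sequence[bool]],
    ref_lengths: Sequence[float],
    alpha: float = 0.2,  # assumed threshold; the source does not state one
) -> float:
    """Return the percentage of correct keypoints (PCK), in [0, 100].

    Assumptions (not stated in the source): a keypoint is correct when the
    Euclidean distance between prediction and ground truth is at most
    alpha * ref_length, where ref_length is a per-person normalizing length
    (for example torso or bounding-box size). Keypoints flagged as not
    visible are excluded from both the numerator and the denominator.
    """
    if not (len(predictions) == len(ground_truth) == len(visible) == len(ref_lengths)):
        raise ValueError("inputs must describe the same number of people")
    correct = 0
    total = 0
    for pred_person, gt_person, vis_person, ref in zip(
        predictions, ground_truth, visible, ref_lengths
    ):
        threshold = alpha * ref  # per-person distance tolerance
        for (px, py), (gx, gy), is_visible in zip(pred_person, gt_person, vis_person):
            if not is_visible:
                continue  # skip occluded or unannotated keypoints
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    if total == 0:
        raise ValueError("no visible keypoints to score")
    return 100.0 * correct / total


# Example: one person, reference length 50 px, so the tolerance is 10 px.
preds = [[(10.0, 10.0), (40.0, 40.0), (100.0, 100.0)]]
truth = [[(12.0, 14.0), (40.0, 55.0), (0.0, 0.0)]]
vis = [[True, True, False]]
print(pck(preds, truth, vis, ref_lengths=[50.0]))  # 50.0: one of two visible keypoints is within 10 px
```

Normalizing by a per-person length keeps the metric comparable across people of different image scales, which matters in the multi-person setting the abstract targets; changing `alpha` or the reference length changes the reported percentage, so a PCK value is only meaningful together with those choices.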
TAHIR Rizwan, CAI Yunze. Multi-Human Pose Estimation by Deep Learning-Based Sequential Approach for Human Keypoint Position and Human Body Detection[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(6): 1103-1113.