J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (6): 1103-1113. doi: 10.1007/s12204-023-2658-z
• Automation & Computer Technologies •
TAHIR Rizwan, CAI Yunze (蔡云泽)
Received: 2022-10-28
Accepted: 2023-02-10
Online: 2025-11-21
Published: 2025-11-26
TAHIR Rizwan, CAI Yunze. Multi-Human Pose Estimation by Deep Learning-Based Sequential Approach for Human Keypoint Position and Human Body Detection [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(6): 1103-1113.
| [1] | YE Jihua, JIANG Lu, XIAO Shunjie, ZONG Yi, JIANG Aiwen. Multi-Label Image Classification Model Based on Multiscale Fusion and Adaptive Label Correlation [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 889-898. |
| [2] | LIN Xiao, LU Meichen, GAO Mufeng, LI Yan. Lightweight Human Pose Estimation Based on Multi-Attention Mechanism [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 899-910. |
| [3] | DING Leqi, WANG Biyun, YAO Lixiu, CAI Yunze. MAGPNet: Multi-Domain Attention-Guided Pyramid Network for Infrared Small Object Detection [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 935-951. |
| [4] | JIANG Wenbo, ZHENG Hangbin, BAO Jinsong. Novel Multi-Step Deep Learning Approach for Detection of Complex Defects in Solar Cells [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 1050-1064. |
| [5] | LIU Mengge, LIU Hao, HE Xin, JIN Shaohui, CHEN Pengyun, XU Mingliang. Research Advances on Non-Line-of-Sight Imaging Technology [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 833-854. |
| [6] | Fu Zeyu, Fu Zhuang, Guan Yisheng. Vascular Interventional Surgery Path Planning and 3D Visual Navigation [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(3): 472-481. |
| [7] | Wang Baomin, Ding Hewei, Teng Fei, Liu Hongqin. Damage Detection of X-ray Image of Conveyor Belts with Steel Rope Cores Based on Improved FCOS Algorithm [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(2): 309-318. |
| [8] | Wang Gang, Guan Yaonan, Li Dewei. Two-Stream Auto-Encoder Network for Unsupervised Skeleton-Based Action Recognition [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(2): 330-336. |
| [9] | Diao Zijian, Cao Shuai, Li Wenwei, Liang Jianan, Wen Guilin, Huang Weixi, Zhang Shouming. Person Re-Identification Based on Spatial Feature Learning and Multi-Granularity Feature Fusion [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(2): 363-374. |
| [10] | ZHOU Su (周苏), ZHONG Zebin (钟泽滨). Real-Time Ranging of Vehicles and Pedestrians for Mobile Application on Smartphones [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(6): 1081-1090. |
| [11] | YAN Congqiang (鄢丛强), GUO Zhengyu (郭正玉), CAI Yunze (蔡云泽). Data Augmentation of Ship Wakes in SAR Images Based on Improved CycleGAN [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(4): 702-711. |
| [12] | LONARE Savita, BHRAMARAMBA Ravi. Federated Approach for Privacy-Preserving Traffic Prediction Using Graph Convolutional Network [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(3): 509-517. |
| [13] | LV Feng (吕峰), WANG Xinyan (王新彦), LI Lei (李磊), JIANG Quan (江泉), YI Zhengyang (易政洋). Tree Detection Algorithm Based on Embedded YOLO Lightweight Network [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(3): 518-527. |
| [14] | SONG Libo (宋立博), FEI Yanqiong (费燕琼). New Lite YOLOv4-Tiny Algorithm and Application on Crack Intelligent Detection [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(3): 528-536. |
| [15] | SHEN Ao (沈傲), HU Jisu (胡冀苏), JIN Pengfei (金鹏飞), ZHOU Zhiyong (周志勇), QIAN Xusheng (钱旭升), ZHENG Yi (郑毅), BAO Jie (包婕), WANG Ximing (王希明), DAI Yakang (戴亚康). Ensemble Attention Guided Multi-SEANet Trained with Curriculum Learning for Noninvasive Prediction of Gleason Grade Groups from MRI [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(1): 109-119. |