Computing & Computer Technologies

3D Hand Pose Estimation Using Semantic Dynamic Hypergraph Convolutional Networks

  • Beijing Key Laboratory of Multimedia and Intelligent Software Technology; Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Received date: 2023-08-08

Accepted date: 2023-08-29

Online published: 2024-01-16

Abstract

Because of self-occlusion and the hand's many degrees of freedom, estimating 3D hand pose from a single RGB image is a highly challenging problem. Graph convolutional networks (GCNs) use graphs to describe the physical connections between hand joints and thereby improve the accuracy of 3D hand pose regression. However, GCNs cannot effectively describe the relationships between non-adjacent hand joints. Hypergraph convolutional networks (HGCNs) have recently received much attention because their hyperedges, each of which can connect more than two nodes, describe higher-order relationships among nodes. This paper therefore proposes an HGCN-based framework for 3D hand pose estimation that better extracts the correlations between both adjacent and non-adjacent hand joints. To overcome the shortcomings of predefined hypergraph structures, a dynamic hypergraph convolutional network is proposed, in which hyperedges are constructed dynamically from the similarity of hand joint features. To better explore the local semantic relationships between nodes, a semantic dynamic hypergraph convolution is further proposed. The proposed methods are evaluated on publicly available benchmark datasets. Both qualitative and quantitative results show that the proposed HGCN and its improved variants outperform GCN for 3D hand pose estimation and achieve state-of-the-art performance compared with existing methods.
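To make the idea of a dynamic hypergraph convolution concrete, the following is a minimal, dependency-free sketch of one layer, not the authors' implementation. It assumes the hypergraph neural network (HGNN) propagation rule of Feng et al. (AAAI 2019), X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ, with unit hyperedge weights (W = I), and it assumes one hyperedge per joint made of that joint plus its k nearest neighbours by Euclidean feature distance. The function names (`knn_hyperedges`, `incidence_matrix`, `hypergraph_conv`, `dynamic_hgcn_layer`), the choice of k, and the unit weights are illustrative assumptions; the paper's exact hyperedge construction and its semantic variant are not modeled here.

```python
import math
import random


def knn_hyperedges(features, k):
    """Hypothetical construction: one hyperedge per joint, containing the joint
    itself plus its k nearest neighbours by Euclidean feature distance."""
    edges = []
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        edges.append([i] + [j for _, j in dists[:k]])
    return edges


def incidence_matrix(edges, n):
    """H[v][e] = 1.0 if joint v belongs to hyperedge e, else 0.0."""
    H = [[0.0] * len(edges) for _ in range(n)]
    for e, members in enumerate(edges):
        for v in members:
            H[v][e] = 1.0
    return H


def matmul(A, B):
    """Plain list-of-lists matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def hypergraph_conv(X, H, Theta):
    """HGNN-style propagation with unit hyperedge weights (W = I):
    X' = Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X Theta.
    Assumes every joint lies in at least one hyperedge and no hyperedge is empty."""
    n, m = len(H), len(H[0])
    dv = [sum(H[v]) for v in range(n)]                        # vertex degrees
    de = [sum(H[v][e] for v in range(n)) for e in range(m)]   # hyperedge degrees
    P = [
        [
            sum(H[i][e] * H[j][e] / de[e] for e in range(m)) / math.sqrt(dv[i] * dv[j])
            for j in range(n)
        ]
        for i in range(n)
    ]
    return matmul(matmul(P, X), Theta)


def dynamic_hgcn_layer(X, Theta, k):
    """Rebuild hyperedges from the current joint features, convolve, apply ReLU."""
    H = incidence_matrix(knn_hyperedges(X, k), len(X))
    out = hypergraph_conv(X, H, Theta)
    return [[max(0.0, v) for v in row] for row in out]


if __name__ == "__main__":
    random.seed(0)
    joints = [[random.random() for _ in range(8)] for _ in range(21)]  # 21 hand joints
    Theta = [[random.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(8)]
    hidden = dynamic_hgcn_layer(joints, Theta, k=4)
    print(len(hidden), len(hidden[0]))  # 21 8
```

What makes the layer "dynamic" in this sketch is that `knn_hyperedges` is re-run on each layer's input features, so joints that are far apart on the kinematic skeleton can share a hyperedge whenever their features are similar. Passing a fixed H built from the hand skeleton instead would give the predefined-hypergraph baseline that the abstract contrasts against.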

Cite this article

WU Yalei, LI Jinghua, KONG Dehui, LI Qianxing, YIN Baocai. 3D Hand Pose Estimation Using Semantic Dynamic Hypergraph Convolutional Networks[J]. Journal of Shanghai Jiaotong University (Science), 2025, 30(5): 855-865. DOI: 10.1007/s12204-024-2697-0
