J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (5): 855-865. doi: 10.1007/s12204-024-2697-0


3D Hand Pose Estimation Using Semantic Dynamic Hypergraph Convolutional Networks

Wu Yalei, Li Jinghua, Kong Dehui, Li Qianxing, Yin Baocai

  1. Beijing Key Laboratory of Multimedia and Intelligent Software Technology; Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Received: 2023-08-08; Accepted: 2023-08-29; Published online: 2024-01-16; Issue publication date: 2025-09-26

Abstract: Self-occlusion and the hand's many degrees of freedom make 3D hand pose estimation from a single RGB image a highly challenging problem. Graph convolutional networks (GCNs) use graphs to model the physical connections between hand joints, which can improve the accuracy of 3D hand pose regression to some extent. However, GCNs cannot effectively describe the relationships between non-adjacent hand joints. Hypergraph convolutional networks (HGCNs) have recently attracted considerable attention because they can describe multi-way, high-order relationships among nodes through hyperedges. This paper therefore proposes an HGCN-based framework for 3D hand pose estimation. The framework better captures the correlations among both adjacent and non-adjacent hand joints. To overcome the limitations of predefined hypergraph structures, a dynamic hypergraph convolutional network (DHGCN) is proposed. In the DHGCN, hyperedges are constructed dynamically from the similarity of hand joint features. To better exploit the local semantic relationships between nodes, a semantic dynamic hypergraph convolution (SDHGCN) is further proposed. The method is evaluated on two public benchmark datasets, STB and RHD. Both qualitative and quantitative results show that hypergraph convolution is better suited to 3D hand pose estimation than graph convolution. Comparisons with existing methods show that the proposed framework achieves state-of-the-art performance.

Keywords: hand pose estimation, hypergraph convolution, dynamic hypergraph convolution, semantic dynamic hypergraph convolution
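The abstract does not give the layer equations, so the sketch below is only a hedged illustration of the general idea under stated assumptions. It is not the authors' DHGCN or SDHGCN.

The sketch makes three assumptions:
- **Hyperedge construction.** It builds one hyperedge per joint, made of that joint plus its k nearest neighbours in feature space. This is an assumed reading of "constructed dynamically based on feature similarity".
- **Convolution operator.** It uses the standard HGNN operator of Feng et al. (2019), X' = ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Θ). Here H is the joint-by-hyperedge incidence matrix, W holds the hyperedge weights, and Dv and De are the vertex and hyperedge degree matrices.
- **Illustrative values.** The helper names (`knn_hyperedges`, `hypergraph_conv`, `dynamic_hgconv_layer`), k = 3, the 8-dimensional features and the random weights are all hypothetical.

```python
import numpy as np


def knn_hyperedges(features, k):
    """Assumed dynamic rule: hyperedge e = joint e plus its k nearest joints in feature space.

    Returns an incidence matrix H of shape (num_joints, num_hyperedges).
    """
    n = features.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of joints")
    # Pairwise squared Euclidean distances between joint feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self; it is added explicitly below
    H = np.zeros((n, n))
    for e in range(n):
        neighbours = np.argsort(dist[e], kind="stable")[:k]
        H[e, e] = 1.0
        H[neighbours, e] = 1.0
    return H


def hypergraph_conv(X, H, Theta, edge_weights=None):
    """HGNN-style propagation: Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta (no activation)."""
    m = H.shape[1]
    w = np.ones(m) if edge_weights is None else edge_weights
    dv = H @ w          # vertex degree: summed weight of incident hyperedges
    de = H.sum(axis=0)  # hyperedge degree: number of joints in each hyperedge
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    A = Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / de) @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta


def dynamic_hgconv_layer(X, Theta, k):
    """Rebuild hyperedges from the current features, then convolve and apply ReLU."""
    H = knn_hyperedges(X, k)
    return np.maximum(hypergraph_conv(X, H, Theta), 0.0)


# Illustrative usage: 21 hand joints with random 8-dim features (hypothetical values).
rng = np.random.default_rng(0)
X = rng.standard_normal((21, 8))
Theta = rng.standard_normal((8, 16))
out = dynamic_hgconv_layer(X, Theta, k=3)
print(out.shape)  # (21, 16)
```

Because H is recomputed from X at every layer, the set of joints that share a hyperedge can change as the features evolve. A fixed GCN adjacency, by contrast, only links physically connected joints.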
