J Shanghai Jiaotong Univ Sci ›› 2025, Vol. 30 ›› Issue (2): 330-336. doi: 10.1007/s12204-023-2619-6

• Automation & Computer Science •

Two-Stream Auto-Encoder Network for Unsupervised Skeleton-Based Action Recognition

Wang Gang (王刚), Guan Yaonan (管耀南), Li Dewei (李德伟)

  Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
  • Accepted: 2022-09-16 Online: 2025-03-21 Published: 2025-03-21

Abstract: Representation learning from unlabeled skeleton data is a challenging task. Prior unsupervised learning algorithms mainly rely on the modeling ability of recurrent neural networks to extract action representations. However, the structural information of skeleton data, which also plays a critical role in action recognition, is rarely explored in existing unsupervised methods. To address this limitation, we propose a novel two-stream autoencoder network that combines the topological information of skeleton data with its temporal information. Specifically, we encode the graph structure with a graph convolutional network (GCN) and integrate the extracted GCN-based representations into the gated recurrent unit (GRU) stream. We then design a transfer module to merge the representations of the two streams adaptively. Based on the characteristics of the two-stream autoencoder, a unified loss function composed of multiple tasks is proposed to update the learnable parameters of the model. Comprehensive experiments on the NW-UCLA, UWA3D, and NTU-RGBD 60 datasets demonstrate that the proposed method achieves excellent performance among unsupervised skeleton-based methods and even performs comparably to or better than numerous supervised skeleton-based methods.
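
The encoder described in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation: the abstract does not give layer sizes, the exact form of the transfer module, or the tasks in the loss, so everything below is an assumption made for illustration. In particular, the class name `TwoStreamEncoderSketch`, the mean pooling over joints, the concatenation of GCN features with raw joint coordinates as the GRU input, and the sigmoid-gated fusion standing in for the transfer module are all hypothetical choices. The decoder and the multi-task loss are omitted because the abstract does not specify them.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TwoStreamEncoderSketch:
    """Hypothetical encoder: a GCN stream feeds a GRU stream, then a gate fuses both."""

    def __init__(self, num_joints, in_dim, gcn_dim, hid_dim, edges, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, size=shape)
        self.a_hat = normalized_adjacency(edges, num_joints)
        self.w_gcn = init(in_dim, gcn_dim)
        gru_in = num_joints * in_dim + gcn_dim
        # GRU weights stacked as [update gate, reset gate, candidate]; biases omitted.
        self.w_x = init(3, gru_in, hid_dim)
        self.w_h = init(3, hid_dim, hid_dim)
        # Assumed fusion ("transfer") parameters: projection plus element-wise gate.
        self.w_proj = init(gcn_dim, hid_dim)
        self.w_gate = init(2 * hid_dim, hid_dim)
        self.hid_dim = hid_dim

    def gcn_step(self, x_t):
        # One graph convolution over the joints of a frame, then mean pooling.
        h = np.maximum(self.a_hat @ x_t @ self.w_gcn, 0.0)  # (joints, gcn_dim)
        return h.mean(axis=0)                                # (gcn_dim,)

    def gru_step(self, u_t, h):
        z = sigmoid(u_t @ self.w_x[0] + h @ self.w_h[0])        # update gate
        r = sigmoid(u_t @ self.w_x[1] + h @ self.w_h[1])        # reset gate
        n = np.tanh(u_t @ self.w_x[2] + (r * h) @ self.w_h[2])  # candidate state
        return (1.0 - z) * n + z * h

    def encode(self, seq):
        """seq has shape (frames, joints, in_dim); returns a (hid_dim,) vector."""
        h = np.zeros(self.hid_dim)
        gcn_feats = []
        for x_t in seq:
            g_t = self.gcn_step(x_t)
            gcn_feats.append(g_t)
            # GCN features are integrated into the GRU stream's input.
            h = self.gru_step(np.concatenate([x_t.ravel(), g_t]), h)
        g = np.mean(gcn_feats, axis=0) @ self.w_proj
        # Assumed adaptive merge: a learned gate weighs the two streams.
        alpha = sigmoid(np.concatenate([h, g]) @ self.w_gate)
        return alpha * h + (1.0 - alpha) * g

# Toy usage: a made-up 5-joint skeleton observed over 10 frames of 3D coordinates.
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
encoder = TwoStreamEncoderSketch(num_joints=5, in_dim=3, gcn_dim=8, hid_dim=16, edges=edges)
sequence = np.random.default_rng(1).normal(size=(10, 5, 3))
representation = encoder.encode(sequence)
```

In an unsupervised setting, a representation like this would typically be trained through a decoder and then evaluated with a simple classifier on the frozen features; which reconstruction or auxiliary tasks the paper uses is not stated in the abstract.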

Key words: representation learning, skeleton-based action recognition, unsupervised deep learning
