CAD & CG

Research Advances on Non-Line-of-Sight Imaging Technology

LIU Mengge, LIU Hao, HE Xin, JIN Shaohui, CHEN Pengyun, XU Mingliang

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 833-854. DOI: 10.1007/s12204-023-2686-8

Non-line-of-sight imaging recovers hidden objects around a corner by analyzing the diffusely reflected light on a relay surface, which carries information about the hidden scene. Owing to its large application potential in autonomous driving, defense, medical imaging, and post-disaster rescue, non-line-of-sight imaging has attracted considerable attention from researchers worldwide, especially in recent years. Research on non-line-of-sight imaging primarily focuses on imaging systems, forward models, and reconstruction algorithms. This paper systematically summarizes existing non-line-of-sight imaging technology in both active and passive scenes, and analyzes the challenges and future directions of the field.

3D Hand Pose Estimation Using Semantic Dynamic Hypergraph Convolutional Networks

WU Yalei, LI Jinghua, KONG Dehui, LI Qianxing, YIN Baocai

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 855-865. DOI: 10.1007/s12204-024-2697-0

Due to self-occlusion and the high number of degrees of freedom, estimating 3D hand pose from a single RGB image is a highly challenging problem. Graph convolutional networks (GCNs) use graphs to describe the physical connections between hand joints and improve the accuracy of 3D hand pose regression. However, GCNs cannot effectively describe the relationships between non-adjacent hand joints. Recently, hypergraph convolutional networks (HGCNs) have received much attention because they can describe multi-dimensional relationships between nodes through hyperedges. This paper therefore proposes an HGCN-based framework for 3D hand pose estimation that can better extract correlations between adjacent and non-adjacent hand joints. To overcome the shortcomings of predefined hypergraph structures, a dynamic hypergraph convolutional network is proposed, in which hyperedges are constructed dynamically based on hand joint feature similarity. To better explore the local semantic relationships between nodes, a semantic dynamic hypergraph convolution is proposed. The proposed method is evaluated on publicly available benchmark datasets. Qualitative and quantitative experimental results both show that the proposed HGCN and its improved variants outperform GCN for 3D hand pose estimation and achieve state-of-the-art performance compared with existing methods.
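
As a rough illustration of building hyperedges dynamically from feature similarity, the sketch below groups each joint with its k nearest neighbours in feature space and derives the node-by-hyperedge incidence matrix. The abstract does not state how similarity is measured or how hyperedges are formed, so the k-nearest-neighbour rule, the Euclidean distance, and the names `build_knn_hyperedges` and `incidence_matrix` are all assumptions made for this example.

```python
import math


def build_knn_hyperedges(features, k):
    """Return one hyperedge per joint: the joint plus its k nearest joints.

    Assumption: similarity is Euclidean distance in feature space; the
    paper's actual construction rule is not given in the abstract.
    """
    n = len(features)
    hyperedges = []
    for i in range(n):
        # Rank every other joint by distance to joint i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        hyperedges.append(sorted([i] + others[:k]))
    return hyperedges


def incidence_matrix(hyperedges, num_nodes):
    """H[v][e] is 1 if node v belongs to hyperedge e, else 0."""
    return [[1 if v in edge else 0 for edge in hyperedges]
            for v in range(num_nodes)]


# Toy data: four "joints" with 2-D features forming two tight pairs.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
edges = build_knn_hyperedges(features, k=1)
H = incidence_matrix(edges, len(features))
```

Because the hyperedges are recomputed from the current features, they can change from layer to layer or sample to sample, which is the property a predefined skeleton graph lacks.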

Multi-Scene Smoke Detection Based on Multi-Feature Extraction Method

SHAO Yanli, YING Yong, CHEN Xi, DONG Siyu, WEI Dan

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 866-879. DOI: 10.1007/s12204-023-2680-1

This study proposes a multi-scene smoke detection algorithm based on a multi-feature extraction method to address the problems of varying smoke shapes in different scenes, difficulty in locating and detecting translucent smoke, and variable smoke scales. First, the feature-extraction convolution module in the YOLOv5s backbone network is replaced with an asymmetric convolution block re-parameterization convolution to improve the detection of smoke of different shapes. Then, a coordinate attention mechanism is introduced in the deeper layer of the backbone network to further improve the localization of translucent smoke. Finally, the detection of smoke at different scales is further improved by using the feature pyramid convolution module instead of the standard convolution module of the feature pyramid in the model. The experimental results demonstrate the feasibility and superiority of the proposed model for multi-scene smoke detection.
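
The re-parameterization idea behind an asymmetric convolution block can be sketched as follows: parallel 3×3, 1×3, and 3×1 branches used during training are folded into a single 3×3 kernel for inference, because convolution is linear. This is a minimal single-channel sketch that assumes zero padding and omits the batch-normalization folding that a real block would also need; `conv2d_same` and `fuse_asymmetric_kernels` are hypothetical helper names, not part of YOLOv5s or the paper's code.

```python
def conv2d_same(image, kernel):
    """Zero-padded 'same' 2-D cross-correlation for an odd-sized kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out


def fuse_asymmetric_kernels(k3x3, k1x3, k3x1):
    """Add the 1x3 kernel to the middle row and the 3x1 kernel to the
    middle column of the 3x3 kernel (batch norm is ignored here)."""
    fused = [row[:] for row in k3x3]
    for j in range(3):
        fused[1][j] += k1x3[0][j]
    for i in range(3):
        fused[i][1] += k3x1[i][0]
    return fused


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k3x3 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
k1x3 = [[1, 2, 1]]
k3x1 = [[1], [0], [-1]]

# Training-time view: three parallel branches whose outputs are summed.
branches = [conv2d_same(image, k) for k in (k3x3, k1x3, k3x1)]
branch_sum = [[sum(b[r][c] for b in branches) for c in range(3)]
              for r in range(3)]

# Inference-time view: one fused 3x3 kernel gives the same output.
fused = fuse_asymmetric_kernels(k3x3, k1x3, k3x1)
fused_out = conv2d_same(image, fused)
```

The fused kernel costs no more at inference than a plain 3×3 convolution, while the extra branches give the horizontal and vertical directions additional capacity during training.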

Multi-Scale Dynamic Hypergraph Convolutional Network for Traffic Flow Forecasting

DONG Zhaoxian, YU Shuo, SHEN Yanming

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 880-888. DOI: 10.1007/s12204-023-2682-z

This paper focuses on the problem of traffic flow forecasting, with the aim of forecasting future traffic conditions based on historical traffic data. This problem is typically tackled by using spatio-temporal graph neural networks to model the intricate spatio-temporal correlations among traffic data. Although these methods have improved performance, they often suffer from two limitations: they struggle to model high-order correlations between nodes, and they overlook the interactions between nodes at different scales. To tackle these issues, we propose a novel model named multi-scale dynamic hypergraph convolutional network (MSDHGCN) for traffic flow forecasting. MSDHGCN can effectively model the dynamic higher-order relationships between nodes at multiple time scales, thereby enhancing the capability for traffic forecasting. Experiments on two real-world datasets demonstrate the effectiveness of the proposed method.

Multi-Label Image Classification Model Based on Multiscale Fusion and Adaptive Label Correlation

YE Jihua, JIANG Lu, XIAO Shunjie, ZONG Yi, JIANG Aiwen

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 889-898. DOI: 10.1007/s12204-023-2688-6

At present, research on multi-label image classification mainly focuses on exploring the correlation between labels to improve the classification accuracy of multi-label images. However, existing methods calculate label correlation from the statistical information of the data. This label correlation is global and dataset-dependent, so it is not suitable for all samples. In addition, during image feature extraction, the feature information of small objects in the image is easily lost, resulting in low classification accuracy for small objects. To this end, this paper proposes a multi-label image classification model based on multiscale fusion and adaptive label correlation. The main idea is as follows. First, feature maps at multiple scales are fused to enhance the feature information of small objects. Then, semantic guidance decomposes the fused feature map into feature vectors for each category, the correlation between categories in the image is adaptively mined through the self-attention mechanism of a graph attention network, and feature vectors containing category-related information are obtained for the final classification. The mean average precision of the model on the two public datasets VOC 2007 and MS COCO 2014 reached 95.6% and 83.6%, respectively, and most of the indicators are better than those of the latest existing methods.

Lightweight Human Pose Estimation Based on Multi-Attention Mechanism

LIN Xiao, LU Meichen, GAO Mufeng, LI Yan

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 899-910. DOI: 10.1007/s12204-023-2691-y

Human pose estimation has received much attention from the research community because of its wide range of applications. However, current pose estimation methods are usually complex and computationally intensive, and they suffer from feature loss during feature fusion. To address these problems, we propose a lightweight human pose estimation network based on a multi-attention mechanism (LMANet). In our method, the number of network parameters is significantly reduced by lightweighting the bottleneck blocks of the high-resolution network with depth-wise separable convolution. We also introduce a multi-attention mechanism to improve prediction accuracy: a channel attention module is added in the initial stage of the network to enhance local cross-channel information interaction. More importantly, we inject a spatial cross-awareness module in the multi-scale feature fusion stage to reduce the spatial information loss during feature extraction. Extensive experiments on the COCO2017 and MPII datasets show that LMANet can deliver higher prediction accuracy with fewer network parameters and less computational effort. Compared with the high-resolution network HRNet, the number of parameters and the computational complexity of the network are reduced by 67% and 73%, respectively.
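
To see why depth-wise separable convolution shrinks a block, the sketch below compares weight counts for a single layer. A standard k×k convolution needs k·k·C_in·C_out weights, whereas the separable version needs k·k·C_in for the depth-wise step plus C_in·C_out for the 1×1 point-wise step. Biases are ignored and the 64-channel, 3×3 configuration is an arbitrary choice for illustration; the 67% figure quoted above is a whole-network result and cannot be derived from this single-layer count.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Illustrative layer size only; not taken from the paper.
c_in, c_out, k = 64, 64, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
reduction = 1 - sep / std
print(f"standard={std}, separable={sep}, reduction={reduction:.1%}")
```

For this layer the count drops from 36864 to 4672 weights. The saving grows with the number of output channels, since the ratio of separable to standard weights is 1/C_out + 1/k².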

Generating Adversarial Patterns in Facial Recognition with Visual Camouflage

BAO Qirui, MEI Haiyang, WEI Huilin, L Zheng, WANG Yuxin, YANG Xin

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 911-922. DOI: 10.1007/s12204-023-2692-x

Deep neural networks, especially face recognition models, have been shown to be vulnerable to adversarial examples. However, existing attack methods for face recognition systems either cannot attack black-box models, are not universal, have cumbersome deployment processes, or lack camouflage and are easily detected by the human eye. In this paper, we propose an adversarial pattern generation method for face recognition and achieve universal black-box attacks by pasting the pattern on the frame of goggles. To achieve visual camouflage, we use a generative adversarial network (GAN). The scale of the GAN’s generative network is increased to balance the conflict between concealment and adversarial strength, a perceptual loss function based on VGG19 is used to constrain the color style and enhance the GAN’s learning ability, and a fine-grained meta-learning adversarial attack strategy is used to carry out black-box attacks. Extensive visualization results demonstrate that, compared with existing methods, the proposed method can generate samples with both camouflage and adversarial characteristics. Meanwhile, extensive quantitative experiments show that the generated samples have a high attack success rate against black-box models.

Rail Line Detection Algorithm Based on Improved CLRNet

ZHOU Bowei, XING Guanyu, LIU Yanli

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 923-934. DOI: 10.1007/s12204-023-2683-y

In smart driving for rail transit, a reliable obstacle detection system is an important guarantee for the safety of trains. Within such a system, the detection of the rail area directly affects how accurately dangerous targets are identified. Both rail lines and lanes appear as thin lines in the image, but the rail scene is more complex, and the color of the rail line is more difficult to distinguish from the background. Many deep learning-based lane detection algorithms already exist, but public datasets and targeted deep learning detection algorithms for rail line detection are lacking. To address this, this paper constructs a rail image dataset, RailwayLine, and labels the rail lines for training and testing models. The dataset contains rich rail images covering single-rail, multi-rail, straight rail, curved rail, crossing rails, occlusion, blur, and different lighting conditions. To address the lack of deep learning-based rail line detection algorithms, we improve the CLRNet algorithm, which performs excellently in lane detection, and propose the CLRNet-R algorithm for rail line detection. Because the rail line is thin and occupies few pixels in the image, making it difficult to distinguish from complex backgrounds, we introduce an attention mechanism to enhance global feature extraction and add a semantic segmentation head to enhance the features of the rail region using the binary probability of rail lines. To address the poor curve recognition and unsmooth output lines of the original CLRNet algorithm, we improve the weight allocation for the line intersection-over-union calculation in the original framework and propose two loss functions based on local slopes to optimize the model’s training constraints on local sampling points, improving the model’s fit on curved rails and yielding smooth and stable rail line detection results. Experiments demonstrate that, compared with other mainstream lane detection algorithms, the proposed algorithm performs better for rail line detection.
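
For readers unfamiliar with line intersection-over-union, the sketch below follows the usual formulation as we understand it from lane detection work: each line is a list of x-coordinates sampled at fixed image rows, every point is widened into a segment of half-width `radius`, and the summed overlaps are divided by the summed unions. The abstract does not say how CLRNet-R reallocates the weights, so the optional `weights` argument is a hypothetical placeholder showing only where per-row weighting could enter, and `line_iou` is a name chosen for this example.

```python
def line_iou(pred_xs, gt_xs, radius, weights=None):
    """Line IoU between two lines sampled at the same image rows.

    pred_xs, gt_xs: x-coordinates of the predicted and ground-truth line.
    radius: half-width used to extend each point into a segment.
    weights: hypothetical per-row weights (uniform if omitted); the
             paper's actual weighting scheme is not described here.
    """
    if len(pred_xs) != len(gt_xs):
        raise ValueError("lines must be sampled at the same rows")
    if weights is None:
        weights = [1.0] * len(pred_xs)
    inter = 0.0
    union = 0.0
    for xp, xg, w in zip(pred_xs, gt_xs, weights):
        # The overlap may be negative when the segments do not touch,
        # which still gives a useful signal for distant predictions.
        inter += w * (min(xp + radius, xg + radius) - max(xp - radius, xg - radius))
        union += w * (max(xp + radius, xg + radius) - min(xp - radius, xg - radius))
    return inter / union


gt = [10.0, 20.0, 30.0]
same = line_iou(gt, gt, radius=5.0)                       # perfect overlap
shifted = line_iou([15.0, 25.0, 35.0], gt, radius=5.0)    # partial overlap
far = line_iou([100.0, 110.0, 120.0], gt, radius=5.0)     # no overlap
```

A loss can then be taken as one minus this value. Raising the weights on rows where the line bends is one conceivable way to emphasize curves, though that is only a guess at the kind of change the abstract describes.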

Hypergraph-Based Asynchronous Event Processing for Moving Object Classification

YU Nannan, WANG Chaoyi, QIAO Yu, WANG Yuxin, ZHENG Chenglin, ZHANG Qiang, YANG Xin

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 952-961. DOI: 10.1007/s12204-024-2699-y

Unlike traditional video cameras, event cameras capture asynchronous event streams in which each event encodes the pixel location, the trigger timestamp, and the polarity of the brightness change. In this paper, we introduce a novel hypergraph-based framework for moving object classification. Specifically, we capture moving objects with an event camera to perceive and collect asynchronous event streams at high temporal resolution. Instead of stacking events into frames, we encode asynchronous event data into a hypergraph, fully mine the high-order correlation of event data, and design a mixed convolutional hypergraph neural network for training to achieve more efficient and accurate moving-target recognition. The experimental results show that our method performs well in moving object classification (e.g., gait identification).

Dynamic Cloth Folding Using Curriculum Learning

LI Mingyang, BAO Hujun, HUANG Jin

J Shanghai Jiaotong Univ Sci 2025, 30 (5): 988-997. DOI: 10.1007/s12204-024-2710-7

This paper presents a novel algorithm for training robotic arms to manipulate cloth by leveraging reinforcement learning and curriculum learning. Traditional cloth manipulation algorithms rely heavily on predefined action primitives and assumptions about cloth dynamics, introducing significant prior knowledge. To circumvent this limitation, we use reinforcement learning to train our cloth folding agent. To take full advantage of reinforcement learning, we propose a semi-sparse reward function incorporating folding accuracy and a curriculum scheme to accelerate training and improve policy stability. We validate the proposed method by implementing it in the StableBaselines3 framework and training the agent with the soft actor-critic algorithm in our virtual environment, which is based on a physics-based cloth simulator. Our results demonstrate the benefits of the curriculum learning scheme, which increases sample efficiency and accelerates the training process compared with previous reinforcement learning methods for cloth manipulation.
