
26 September 2025, Volume 30 Issue 5
Computing & Computer Technologies
Research Advances on Non-Line-of-Sight Imaging Technology
LIU Mengge, LIU Hao, HE Xin, JIN Shaohui, CHEN Pengyun, XU Mingliang
2025, 30 (5):  833-854.  doi: 10.1007/s12204-023-2686-8
Non-line-of-sight imaging recovers hidden objects around corners by analyzing the diffusely reflected light on a relay surface that carries hidden-scene information. Owing to its huge application potential in autonomous driving, defense, medical imaging, and post-disaster rescue, non-line-of-sight imaging has attracted considerable attention from researchers worldwide, especially in recent years. Research on non-line-of-sight imaging primarily focuses on imaging systems, forward models, and reconstruction algorithms. This paper systematically summarizes existing non-line-of-sight imaging technology in both active and passive scenarios, and analyzes the challenges and future directions of the field.
3D Hand Pose Estimation Using Semantic Dynamic Hypergraph Convolutional Networks
WU Yalei, LI Jinghua, KONG Dehui, LI Qianxing, YIN Baocai
2025, 30 (5):  855-865.  doi: 10.1007/s12204-024-2697-0
Due to self-occlusion and the hand's high degrees of freedom, estimating 3D hand pose from a single RGB image is highly challenging. Graph convolutional networks (GCNs) use graphs to describe the physical connection relationships between hand joints and improve the accuracy of 3D hand pose regression. However, GCNs cannot effectively describe the relationships between non-adjacent hand joints. Recently, hypergraph convolutional networks (HGCNs) have received much attention as they can describe multi-dimensional relationships between nodes through hyperedges; therefore, this paper proposes a framework for 3D hand pose estimation based on HGCNs, which can better extract correlated relationships between adjacent and non-adjacent hand joints. To overcome the shortcomings of predefined hypergraph structures, a dynamic hypergraph convolutional network is proposed, in which hyperedges are constructed dynamically based on hand-joint feature similarity. To better explore the local semantic relationships between nodes, a semantic dynamic hypergraph convolution is proposed. The proposed method is evaluated on publicly available benchmark datasets. Qualitative and quantitative experimental results both show that the proposed HGCN and its improved variants for 3D hand pose estimation outperform GCN-based counterparts and achieve state-of-the-art performance compared with existing methods.
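The dynamic hypergraph construction described above builds hyperedges from feature similarity; a minimal sketch of that idea (illustrative only, not the paper's implementation — the toy features, `k`, and the helper name `knn_hyperedges` are assumptions) groups each node with its k nearest neighbours in feature space:

```python
import numpy as np

def knn_hyperedges(features: np.ndarray, k: int) -> np.ndarray:
    """Build an incidence matrix H (nodes x hyperedges): hyperedge j
    groups node j with its k most similar nodes (Euclidean distance)."""
    n = features.shape[0]
    # pairwise squared distances between node features
    d = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    H = np.zeros((n, n))
    for j in range(n):
        nearest = np.argsort(d[j])[: k + 1]  # includes node j itself
        H[nearest, j] = 1.0
    return H

# toy joint features: two tight clusters
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
H = knn_hyperedges(feats, k=1)
assert (H.sum(axis=0) == 2).all()  # each hyperedge contains k+1 nodes
```

Because H is recomputed from the current features at every layer, the hyperedges adapt as the learned joint representations change.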
Multi-Scene Smoke Detection Based on Multi-Feature Extraction Method
SHAO Yanli, YING Yong, CHEN Xi, DONG Siyu, WEI Dan
2025, 30 (5):  866-879.  doi: 10.1007/s12204-023-2680-1
This study proposes a multi-scene smoke detection algorithm based on a multi-feature extraction method to address the problems of varying smoke shapes in different scenes, the difficulty of locating and detecting translucent smoke, and variable smoke scales. First, the convolution module for feature extraction in the YOLOv5s backbone network is replaced with a re-parameterizable asymmetric convolution block to improve the detection of smoke of different shapes. Then, a coordinate attention mechanism is introduced in the deeper layers of the backbone network to further improve the localization of translucent smoke. Finally, the detection of smoke at different scales is further improved by using a feature pyramid convolution module in place of the standard convolution module of the feature pyramid in the model. The experimental results demonstrate the feasibility and superiority of the proposed model for multi-scene smoke detection.
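An asymmetric convolution block of the kind mentioned above is re-parameterized at inference time by folding its 1×3 and 3×1 branch kernels into the 3×3 kernel, since convolutions over the same sliding window are additive. A minimal numpy sketch of that fusion (the kernel values are toy numbers and `fuse_acb` is a hypothetical helper):

```python
import numpy as np

def fuse_acb(k3x3, k1x3, k3x1):
    """Fuse an asymmetric convolution block's three kernels into one
    equivalent 3x3 kernel: pad the 1x3 and 3x1 kernels to 3x3 and sum."""
    fused = k3x3.copy()
    fused[1, :] += k1x3[0]      # the 1x3 kernel sits on the middle row
    fused[:, 1] += k3x1[:, 0]   # the 3x1 kernel sits on the middle column
    return fused

k3 = np.ones((3, 3))
k13 = np.full((1, 3), 2.0)
k31 = np.full((3, 1), 3.0)
f = fuse_acb(k3, k13, k31)
assert f[1, 1] == 6.0   # center gets contributions from all three branches
assert f[0, 0] == 1.0   # corners come from the 3x3 branch only
```

After fusion, a single 3×3 convolution reproduces the summed output of the three training-time branches at no extra inference cost.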
Multi-Scale Dynamic Hypergraph Convolutional Network for Traffic Flow Forecasting
DONG Zhaoxian, YU Shuo, SHEN Yanming
2025, 30 (5):  880-888.  doi: 10.1007/s12204-023-2682-z
This paper focuses on the problem of traffic flow forecasting, with the aim of forecasting future traffic conditions based on historical traffic data. This problem is typically tackled by utilizing spatio-temporal graph neural networks to model the intricate spatio-temporal correlations among traffic data. Although these methods have achieved performance improvements, they often suffer from two limitations: they face challenges in modeling high-order correlations between nodes, and they overlook the interactions between nodes at different scales. To tackle these issues, we propose a novel model named multi-scale dynamic hypergraph convolutional network (MSDHGCN) for traffic flow forecasting. Our MSDHGCN can effectively model dynamic higher-order relationships between nodes at multiple time scales, thereby enhancing the capability for traffic forecasting. Experiments on two real-world datasets demonstrate the effectiveness of the proposed method.
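Hypergraph convolution, the building block behind models of this kind, propagates node features through the incidence matrix. A minimal sketch of the standard HGNN-style propagation rule X' = Dv⁻¹ H W De⁻¹ Hᵀ X Θ with identity hyperedge weights (a common simplification, not necessarily the exact MSDHGCN layer):

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One hypergraph convolution layer: X' = Dv^-1 H De^-1 H^T X Theta,
    where H is the node-hyperedge incidence matrix."""
    De = np.diag(1.0 / H.sum(axis=0))   # inverse hyperedge degrees
    Dv = np.diag(1.0 / H.sum(axis=1))   # inverse vertex degrees
    return Dv @ H @ De @ H.T @ X @ Theta

# toy example: 3 nodes joined by one all-node hyperedge
H = np.ones((3, 1))
X = np.array([[1.0], [2.0], [3.0]])
out = hypergraph_conv(X, H, Theta=np.eye(1))
# a single all-node hyperedge gives every node the mean feature
assert np.allclose(out, 2.0)
```

Each hyperedge first aggregates the features of its member nodes (Hᵀ X), then broadcasts the aggregate back (H ·), so information flows through groups rather than pairwise edges.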
Multi-Label Image Classification Model Based on Multiscale Fusion and Adaptive Label Correlation
YE Jihua, JIANG Lu, XIAO Shunjie, ZONG Yi, JIANG Aiwen
2025, 30 (5):  889-898.  doi: 10.1007/s12204-023-2688-6
At present, research on multi-label image classification mainly focuses on exploring the correlation between labels to improve classification accuracy. However, in existing methods, label correlation is calculated from the statistical information of the data; such correlation is global and dataset-dependent, and is therefore not suitable for all samples. Moreover, in the process of extracting image features, the characteristic information of small objects is easily lost, resulting in low classification accuracy for small objects. To this end, this paper proposes a multi-label image classification model based on multiscale fusion and adaptive label correlation. The main idea is as follows: first, feature maps at multiple scales are fused to enhance the feature information of small objects; semantic guidance then decomposes the fused feature map into a feature vector for each category; finally, the correlations between categories in the image are mined adaptively through the self-attention mechanism of a graph attention network, yielding feature vectors containing category-related information for the final classification. The mean average precision of the model on the two public datasets VOC 2007 and MS COCO 2014 reaches 95.6% and 83.6%, respectively, and most of the indicators are better than those of the latest existing methods.
Lightweight Human Pose Estimation Based on Multi-Attention Mechanism
LIN Xiao, LU Meichen, GAO Mufeng, LI Yan
2025, 30 (5):  899-910.  doi: 10.1007/s12204-023-2691-y
Human pose estimation has received much attention from the research community because of its wide range of applications. However, current pose estimation models are usually complex and computationally intensive, and suffer in particular from feature loss during feature fusion. To address these problems, we propose a lightweight human pose estimation network based on a multi-attention mechanism (LMANet). In our method, network parameters are significantly reduced by lightweighting the bottleneck blocks with depth-wise separable convolution on the high-resolution network. We then introduce a multi-attention mechanism to improve prediction accuracy: a channel attention module is added in the initial stage of the network to enhance local cross-channel information interaction, and a spatial cross-awareness module is injected in the multi-scale feature fusion stage to reduce spatial information loss during feature extraction. Extensive experiments on the COCO2017 and MPII datasets show that LMANet guarantees higher prediction accuracy with fewer network parameters and less computational effort. Compared with the high-resolution network HRNet, the number of parameters and the computational complexity of the network are reduced by 67% and 73%, respectively.
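The parameter savings from depth-wise separable convolution follow from simple arithmetic (a generic illustration with assumed channel counts, not LMANet's exact configuration):

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depth-wise separable: one k x k filter per input channel,
    then a 1x1 point-wise convolution to mix channels."""
    return c_in * k * k + c_in * c_out

std = conv_params(64, 64, 3)          # 64*64*9 = 36864
sep = dw_separable_params(64, 64, 3)  # 576 + 4096 = 4672
assert sep < std / 7                  # roughly an 8x reduction here
```

The saving grows with kernel size and channel width, which is why the technique is a staple of lightweight backbones.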
Generating Adversarial Patterns in Facial Recognition with Visual Camouflage
BAO Qirui, MEI Haiyang, WEI Huilin, LÜ Zheng, WANG Yuxin, YANG Xin
2025, 30 (5):  911-922.  doi: 10.1007/s12204-023-2692-x
Deep neural networks, especially face recognition models, have been shown to be vulnerable to adversarial examples. However, existing attack methods for face recognition systems either cannot attack black-box models, are not universal, have cumbersome deployment processes, or lack camouflage and are easily detected by the human eye. In this paper, we propose an adversarial pattern generation method for face recognition and achieve universal black-box attacks by pasting the pattern on the frame of goggles. To achieve visual camouflage, we use a generative adversarial network (GAN): the scale of the GAN's generative network is increased to balance the performance conflict between concealment and adversarial behavior, a VGG19-based perceptual loss function is used to constrain the color style and enhance the GAN's learning ability, and a fine-grained meta-learning adversarial attack strategy is used to carry out black-box attacks. Extensive visualization results demonstrate that, compared with existing methods, the proposed method can generate samples with both camouflage and adversarial characteristics, and extensive quantitative experiments show that the generated samples have a high attack success rate against black-box models.
Rail Line Detection Algorithm Based on Improved CLRNet
ZHOU Bowei, XING Guanyu, LIU Yanli
2025, 30 (5):  923-934.  doi: 10.1007/s12204-023-2683-y
In smart driving for rail transit, a reliable obstacle detection system is an important guarantee of train safety, and the detection of the rail area directly affects the accuracy with which the system identifies dangerous targets. Both rail lines and lanes appear as thin line shapes in images, but the rail scene is more complex, and the color of the rail line is harder to distinguish from the background. While many deep learning-based lane detection algorithms exist, there is a lack of public datasets and targeted deep learning detection algorithms for rail line detection. To address this, this paper constructs a rail image dataset, RailwayLine, with labeled rail lines for model training and testing. The dataset contains rich rail images covering single-rail, multi-rail, straight-rail, curved-rail, and crossing-rail scenes, as well as occlusion, blur, and different lighting conditions. To address the lack of deep learning-based rail line detection algorithms, we improve the CLRNet algorithm, which performs excellently in lane detection, and propose the CLRNet-R algorithm for rail line detection. To address the problem that rail lines are thin, occupy few pixels in the image, and are difficult to distinguish from complex backgrounds, we introduce an attention mechanism to enhance global feature extraction and add a semantic segmentation head that enhances rail-region features via the binary probability of rail lines. To address the poor curve recognition and unsmooth output lines of the original CLRNet, we improve the weight allocation in the line intersection-over-union calculation and propose two loss functions based on local slopes to optimize the model's training constraints on local sampling points, improving the fit on curved rails and yielding smooth and stable rail line detection results.
Experiments demonstrate that, compared with other mainstream lane detection algorithms, the proposed algorithm performs better for rail line detection.
MAGPNet: Multi-Domain Attention-Guided Pyramid Network for Infrared Small Object Detection
DING Leqi, WANG Biyun, YAO Lixiu, CAI Yunze
2025, 30 (5):  935-951.  doi: 10.1007/s12204-024-2694-3
To overcome the obstacles of poor feature extraction and little prior information on the appearance of infrared dim small targets, we propose a multi-domain attention-guided pyramid network (MAGPNet). Specifically, we design three modules to ensure that salient features of small targets can be acquired and retained in the multi-scale feature maps. To improve the adaptability of the network to targets of different sizes, we design a kernel aggregation attention block with a receptive field attention branch and weight the feature maps under different perceptual fields with an attention mechanism. Based on research on the human vision system, we further propose an adaptive local contrast measure module to enhance the local features of infrared small targets. With this parameterized component, we can aggregate information from multi-scale contrast saliency maps. Finally, to fully utilize the information in the spatial and channel domains of feature maps at different scales, we propose a mixed spatial-channel attention-guided fusion module to achieve high-quality fusion while ensuring that small-target features are preserved at deep layers. Experiments on public datasets demonstrate that MAGPNet achieves better performance than other state-of-the-art methods in terms of intersection over union, precision, recall, and F-measure. In addition, we conduct detailed ablation studies to verify the effectiveness of each component in our network.
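The classical local contrast measure that the adaptive module generalizes can be sketched as follows (a plain, non-parameterized version — the paper's module is learned, and the window size and image values here are illustrative):

```python
import numpy as np

def local_contrast(img, r=1):
    """Toy local contrast map: each pixel's value divided by the mean of
    its (2r+1)^2 neighbourhood; small bright targets stand out."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    pad = np.pad(img.astype(float), r, mode="edge")
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * r + 1, j:j + 2 * r + 1]
            out[i, j] = img[i, j] / patch.mean()
    return out

img = np.ones((5, 5))
img[2, 2] = 10.0                    # a one-pixel "target" on flat background
c = local_contrast(img)
assert c[2, 2] == c.max()           # target pixel has the peak contrast
```

Replacing the fixed ratio with learned, multi-scale weights is what makes the paper's version adaptive.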
Hypergraph-Based Asynchronous Event Processing for Moving Object Classification
YU Nannan, WANG Chaoyi, QIAO Yu, WANG Yuxin, ZHENG Chenglin, ZHANG Qiang, YANG Xin
2025, 30 (5):  952-961.  doi: 10.1007/s12204-024-2699-y
Unlike traditional video cameras, event cameras capture asynchronous event streams in which each event encodes its pixel location, trigger timestamp, and the polarity of the brightness change. In this paper, we introduce a novel hypergraph-based framework for moving object classification. Specifically, we capture moving objects with an event camera, perceiving and collecting asynchronous event streams at high temporal resolution. Rather than stacking events into frames, we encode the asynchronous event data into a hypergraph, fully mining its high-order correlations, and design a mixed convolutional hypergraph neural network trained to achieve more efficient and accurate moving-target recognition. The experimental results show that our method performs well in moving object classification (e.g., gait identification).
Predicting Parking Spaces Using CEEMDAN and GRU
MA Changxi, HUANG Xiaoting, MENG Wei
2025, 30 (5):  962-975.  doi: 10.1007/s12204-023-2672-1
Accurate prediction of parking spaces plays a crucial role in maximizing the efficiency of parking resources and optimizing traffic conditions. However, the majority of earlier research has used models based on past parking data or on the plethora of variables that influence parking, which not only complicates the data and increases running time but can also lead to poor model fits. To solve this problem, a hybrid parking prediction model combining complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and a gated recurrent unit (GRU) model is proposed to predict the number of parking spaces. In this model, CEEMDAN gradually breaks down time-series fluctuations or trends at various scales, producing a sequence of intrinsic mode functions (IMFs) with various characteristic scales. Then, by keeping the majority of the original data's content and removing superfluous information, principal component analysis (PCA) reduces the dimensionality of the IMF series and improves prediction response time. Subsequently, the high-level abstract characteristics are fed into the GRU network, which is built, tested, and used for prediction on the deep learning framework Keras. The validity of the presented model is verified using real parking datasets from two three-dimensional parking lots. The test results reveal that the model outperforms the baseline model's predictive accuracy, i.e., it has a lower testing error, and the CEEMDAN-PCA-GRU model most closely tracks the real parking time series. As a result, the method is superior to existing models for parking prediction.
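The PCA step between CEEMDAN and the GRU can be sketched via SVD of the centered IMF matrix (CEEMDAN and the GRU themselves are omitted; random data stands in for the IMF series, and `pca_reduce` is an illustrative helper):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components,
    computed from the SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))       # stand-in for 8 IMF series, 100 steps
Z = pca_reduce(X, 3)
assert Z.shape == (100, 3)
# the projected components are mutually uncorrelated
assert abs((Z.T @ Z)[0, 1]) < 1e-8
```

Feeding the low-dimensional `Z` to the recurrent network keeps most of the variance while shrinking the input, which is exactly the abstract's motivation for the PCA stage.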
Fast Attack Algorithm for JPEG Image Encryption with Block Position Shuffle
LI Shanshan, GUO Yali, HUANG Jiaxin, GAO Ruoyun
2025, 30 (5):  976-987.  doi: 10.1007/s12204-023-2676-x
For traditional JPEG image encryption, block position shuffling can achieve a better encryption effect and is resistant to non-zero counting attacks. However, the number of non-zero coefficients in each 8×8 sub-block is unchanged by block position shuffling. Exploiting this defect, this paper proposes a fast attack algorithm for JPEG image encryption based on inter-block shuffling and a non-zero quantized discrete cosine transform coefficient attack. The algorithm analyzes the position mapping relationship of image blocks before and after encryption by detecting the pixel values of a designed plaintext image, in which every 8×8 block position is related to its number of non-zero discrete cosine transform coefficients. A preliminary attack result for the image blocks is then obtained from the inverse mapping relationship, and the final attack result is generated according to the numbers of non-zero coefficients in each 8×8 block of the preliminary result. It is verified that the main content of the original image can be recovered, without knowledge of the encryption algorithm or keys, in a relatively short time.
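The block "fingerprint" the attack relies on — the number of non-zero quantized DCT coefficients per 8×8 block, which block-position shuffling permutes but never alters — can be computed as follows (toy coefficient matrix; `nonzero_counts` is an illustrative helper):

```python
import numpy as np

def nonzero_counts(coeffs, block=8):
    """Count non-zero quantized DCT coefficients in each block x block
    tile of the coefficient matrix."""
    h, w = coeffs.shape
    return np.array([[np.count_nonzero(coeffs[i:i + block, j:j + block])
                      for j in range(0, w, block)]
                     for i in range(0, h, block)])

coeffs = np.zeros((16, 16))
coeffs[0, 0] = 5            # DC coefficient of the top-left block
coeffs[9, 9] = 1            # two coefficients in the bottom-right block
coeffs[10, 10] = 1
counts = nonzero_counts(coeffs)
assert counts.tolist() == [[1, 0], [0, 2]]
```

Matching these invariant counts between the ciphertext and the designed plaintext is what lets the attack invert the block permutation.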
Dynamic Cloth Folding Using Curriculum Learning
LI Mingyang, BAO Hujun, HUANG Jin
2025, 30 (5):  988-997.  doi: 10.1007/s12204-024-2710-7
This paper presents a novel algorithm for training robotic arms to manipulate cloth by leveraging reinforcement learning and curriculum learning. Traditional cloth manipulation algorithms rely heavily on predefined action primitives and assumptions about cloth dynamics, introducing significant prior knowledge. To circumvent this limitation, we train our cloth folding agent with reinforcement learning. To fully exploit its advantages, we propose a semi-sparse reward function incorporating folding accuracy, together with a curriculum scheme to accelerate training and improve policy stability. We validate the proposed method by implementing it in the StableBaselines3 framework and training the agent with the soft actor-critic algorithm in a virtual environment based on a physics-based cloth simulator. Our results demonstrate the benefits of the curriculum learning scheme, which increases sample efficiency and accelerates training compared with previous reinforcement learning cloth manipulation methods.
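A semi-sparse reward of the kind described — a dense shaping term plus a sparse success bonus — might look like the sketch below. This is purely illustrative: the paper's actual reward terms, threshold, and bonus value are not specified here, and `coverage`/`aligned` are hypothetical fold-accuracy signals.

```python
def semi_sparse_reward(coverage: float, aligned: bool, threshold: float = 0.9) -> float:
    """Toy semi-sparse reward: a dense shaping term (fold coverage in [0, 1])
    plus a sparse bonus awarded only when the fold is essentially complete."""
    reward = coverage                      # dense term guides early exploration
    if coverage >= threshold and aligned:
        reward += 10.0                     # sparse success bonus
    return reward

assert semi_sparse_reward(0.5, False) == 0.5
assert semi_sparse_reward(0.95, True) == 10.95
```

The dense term keeps gradients informative early in training, while the sparse bonus anchors the policy to the actual task goal.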
Undecimated Dual-Tree Complex Wavelet Transform and Fuzzy Clustering-Based Sonar Image Denoising Technique
LIU Biao, LIU Guangyu, FENG Wei, WANG Shuai, ZHOU Bao, ZHAO Enming
2025, 30 (5):  998-1008.  doi: 10.1007/s12204-023-2662-3
Imaging sonar devices generate sonar images by receiving echoes from objects; these images are often accompanied by severe speckle noise, resulting in distortion and information loss. Common optical denoising methods do not work well in removing speckle noise from sonar images and may even reduce their visual quality. To address this issue, a sonar image denoising method based on fuzzy clustering and the undecimated dual-tree complex wavelet transform is proposed. The transform provides perfect translation invariance and improved directional selectivity during image decomposition, leading to a richer representation of noise and edges in the high-frequency coefficients. Fuzzy clustering can then separate noise from useful information according to the amplitude characteristics of speckle noise, preserving the latter and achieving noise removal. Additionally, the low-frequency coefficients are smoothed using bilateral filtering to improve the visual quality of the image. To verify the effectiveness of the algorithm, multiple groups of ablation experiments were conducted, and speckled sonar images with different noise variances were evaluated and compared with existing transform-domain speckle removal methods. The experimental results show that the proposed method can effectively improve image quality and, even in cases of severe noise, still achieves good denoising performance.
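The fuzzy clustering step — separating noise from useful information by coefficient amplitude — can be sketched with a one-dimensional fuzzy c-means (a generic FCM with a deterministic initialization, not the paper's exact formulation; the amplitude values are toy numbers):

```python
import numpy as np

def fcm_1d(x, c=2, m=2.0, iters=30):
    """Fuzzy c-means on scalar values (e.g., wavelet-coefficient
    amplitudes): returns cluster centers and the membership matrix u (c x n)."""
    centers = np.linspace(x.min(), x.max(), c)       # deterministic init
    for _ in range(iters + 1):                       # final pass refreshes u
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u = d ** (-2.0 / (m - 1.0))                  # standard FCM membership
        u /= u.sum(axis=0)
        centers = ((u ** m) @ x) / (u ** m).sum(axis=1)
    return centers, u

amps = np.array([0.1, 0.2, 0.15, 5.0, 5.2, 4.9])     # small noise vs strong edges
centers, u = fcm_1d(amps)
hi = int(np.argmax(centers))
assert u[hi, 3] > 0.9        # large coefficient -> "signal" cluster
assert u[1 - hi, 0] > 0.9    # small coefficient -> "noise" cluster
```

Coefficients with high membership in the low-amplitude cluster can then be suppressed while the rest are preserved, which is the essence of the separation the abstract describes.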
CenterLineFormer: Road Centerlines Graph Generation with Single Onboard Camera
QIN Minghui, LIU Yuanzhi, LÜ Na, TAO Wei, ZHAO Hui
2025, 30 (5):  1009-1017.  doi: 10.1007/s12204-024-2696-1
As autonomous driving systems advance rapidly, there is a surge in demand for high-definition (HD) maps that provide accurate and dependable prior information on the static environment around vehicles. As one of the main high-level elements in HD maps, the road lane centerline is essential for downstream tasks such as autonomous navigation and planning. Given the complex topology and significant overlap of road centerlines, previous studies have rarely examined the centerline HD-map mapping problem, and recent learning-based pipelines rely on heuristic post-processing to generate structured centerline output without instance information. To ameliorate this situation, we propose a novel, end-to-end vectorized road-centerline graph generation pipeline, termed CenterLineFormer. CenterLineFormer takes a single onboard camera image as input and predicts a directed graph representing the lane-layer map in the bird's-eye view (BEV). We propose a strategy for better view transformation that uses a cross-attention mechanism to generate a dense BEV feature map. With our pipeline, we can describe the connection relationships between different centerlines and generate structured lane graphs for downstream modules such as planning and control. Our experiments show that the pipeline achieves superior performance against previous baselines on the nuScenes dataset, and that CenterLineFormer can generate accurate centerline graph topologies in night driving and complex traffic intersection scenes.
Fault Identification Method for In-Core Self-Powered Neutron Detectors Combining Graph Convolutional Network and Stacking Ensemble Learning
LIN Weiqing, LU Yanzhen, MIAO Xiren, QIU Xinghua
2025, 30 (5):  1018-1027.  doi: 10.1007/s12204-023-2684-x
Self-powered neutron detectors (SPNDs) play a critical role in monitoring the safety margins and overall health of reactors, directly affecting safe reactor operation. In this work, a novel fault identification method based on graph convolutional networks (GCNs) and Stacking ensemble learning is proposed for SPNDs. The GCN is employed to extract the spatial neighborhood information of SPNDs at different positions, and residuals are obtained by nonlinear fitting of SPND signals. To fully extract the time-varying features from the residual sequences, a Stacking fusion model integrating various algorithms is developed, enabling the identification of five SPND conditions: normal, drift, bias, precision degradation, and complete failure. The results demonstrate that integrating diverse base learners in the GCN-Stacking model offers advantages over a single model and enhances the stability and reliability of fault identification. Additionally, the GCN-Stacking model maintains higher accuracy in identifying faults at different reactor power levels.
CenterRCNN: Two-Stage Anchor-Free Object Detection Using Center Keypoint-Based Region Proposal Network
LIU Chen, LI Wenfa, XU Yunwen, LI Dewei
2025, 30 (5):  1028-1036.  doi: 10.1007/s12204-023-2667-y
Classic two-stage object detection algorithms such as Faster RCNN (faster regions with convolutional neural network features) suffer from low speed and anchor hyper-parameter sensitivity caused by the dense anchor mechanism in the region proposal network (RPN). Recently, the anchor-free method CenterNet has shown the effectiveness of perceiving and classifying an object by its center. However, the severe coincident false-positive problem between confusing categories, caused by its multiple binary classifiers, still limits its accuracy. We introduce a two-stage network, CenterRCNN, to take advantage of both and overcome their shortcomings. CenterRPN is proposed as the first stage: it incorporates the center keypoint idea into the RPN to perceive foreground objects, replacing the dense anchor-based RPN. The proposals are then classified by the multi-class classifier of the RCNN head, which focuses on the differences between confusing categories and outputs only the most probable one. In summary, CenterRPN eliminates the drawbacks of the dense anchor-based RPN in Faster RCNN, and the multi-class classifier discriminates better than CenterNet's multiple binary classifiers. Experiments demonstrate that CenterRCNN outperforms both baseline algorithms in accuracy while improving speed over Faster RCNN.
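Center-keypoint perception of the kind CenterRPN builds on decodes object centers as local maxima of a predicted heatmap. A minimal sketch of that generic CenterNet-style decoding (not the paper's exact head; the heatmap values and threshold are illustrative):

```python
import numpy as np

def heatmap_peaks(hm, thresh=0.5):
    """Extract center-keypoint proposals: pixels that are local maxima
    of the heatmap (3x3 neighbourhood) and exceed a score threshold."""
    pad = np.pad(hm, 1, constant_values=-np.inf)
    peaks = []
    h, w = hm.shape
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]       # 3x3 window centered on (i, j)
            if hm[i, j] >= thresh and hm[i, j] == win.max():
                peaks.append((i, j, float(hm[i, j])))
    return peaks

hm = np.zeros((6, 6))
hm[1, 1] = 0.9                 # a strong center
hm[4, 4] = 0.8                 # a second center
hm[4, 5] = 0.3                 # suppressed: below threshold, not a maximum
assert heatmap_peaks(hm) == [(1, 1, 0.9), (4, 4, 0.8)]
```

Each surviving peak becomes a foreground proposal, replacing the dense grid of anchors that an anchor-based RPN would score.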
CSC-YOLO: An Image Recognition Model for Surface Defect Detection of Copper Strip and Plates
ZHANG Guo, CHEN Tao, WANG Jianping
2025, 30 (5):  1037-1049.  doi: 10.1007/s12204-024-2723-2
In order to meet the requirements for accurate identification of surface defects on copper strip in industrial production, a machine-vision surface defect detection model, CSC-YOLO, is proposed. The model uses YOLOv4-tiny as the benchmark network. First, K-means clustering is introduced to obtain anchor boxes matched to the self-built dataset. Second, a cross-region fusion module is introduced in the backbone network to address hard-to-recognize targets by fusing contextual semantic information. Third, the spatial pyramid pooling-efficient channel attention network (SPP-E) module is introduced in the path aggregation network (PANet) to enhance feature extraction. Fourth, to prevent the loss of channel information, a lightweight attention mechanism is introduced to improve network performance. Finally, the model is further improved by adding adjustment factors that correct the loss function for the dimensional characteristics of the surface defects. CSC-YOLO was tested on the self-built dataset of copper strip surface defects; the experimental results showed that the model's mAP reaches 93.58%, a 3.37% improvement over the benchmark network, while the FPS, although lower than the benchmark network's, reaches 104, meeting the real-time requirements of copper strip production. Comparison experiments with Faster RCNN, SSD300, YOLOv3, YOLOv4, Resnet50-YOLOv4, YOLOv5s, YOLOv7, and other algorithms show that the proposed algorithm achieves faster computation while maintaining higher detection accuracy.
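K-means anchor clustering of the kind used in the first step is typically run with 1 − IoU as the distance between box shapes; a minimal sketch under that assumption (toy box sizes; `anchor_kmeans` and `iou_wh` are illustrative helpers, not the paper's code):

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, as if boxes shared the same top-left corner."""
    inter = np.minimum(boxes[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def anchor_kmeans(boxes, k, iters=20, seed=0):
    """K-means over box shapes, assigning each box to the centroid with
    the highest IoU (i.e., the smallest 1 - IoU distance)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = iou_wh(boxes, centroids).argmax(axis=1)
        for c in range(k):
            if (assign == c).any():
                centroids[c] = boxes[assign == c].mean(axis=0)
    return centroids

boxes = np.array([[10., 10.], [12., 11.], [50., 60.], [55., 58.]])
anchors = anchor_kmeans(boxes, k=2)
areas = sorted(a[0] * a[1] for a in anchors)
assert areas[0] < 200 and areas[1] > 2000   # one small, one large anchor shape
```

Using IoU rather than Euclidean distance keeps the clustering scale-aware, so small-defect boxes get their own anchor shapes instead of being absorbed by large ones.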
Novel Multi-Step Deep Learning Approach for Detection of Complex Defects in Solar Cells
JIANG Wenbo, ZHENG Hangbin, BAO Jinsong
2025, 30 (5):  1050-1064.  doi: 10.1007/s12204-023-2670-3
Solar cell defects exhibit significant variation and come in many types, and some defect data are difficult to acquire or small in scale, posing small-sample and small-target challenges for defect detection in solar cells. To address this issue, this paper proposes a multi-step approach for detecting complex defects of solar cells. First, individual cell plates are extracted from electroluminescence images for block-by-block detection. Then, StyleGAN2-Ada is utilized for generative adversarial network data augmentation to expand the number of samples for small-sample defects. Finally, the generated dataset is combined with the real dataset, and an improved YOLOv5 model is trained on this mixed dataset. Experimental results demonstrate that the proposed method achieves superior performance in detecting small-sample and small-target defects, with the final recall rate reaching 99.7%, an increase of 3.9% compared with the unimproved model; precision and mean average precision increase by 3.4% and 3.5%, respectively. Moreover, the experiments demonstrate that training the improved network on the mixed dataset effectively enhances the detection performance of the model. The combination of these approaches significantly improves the network's ability to detect solar cell defects.
Named Entity Identification of Chinese Poetry and Wine Culture Based on ALBERT
YANG Zhuang, LI Zhaofei, WANG Jihua, WEI Xudong, ZHANG Yijie
2025, 30 (5):  1065-1072.  doi: 10.1007/s12204-023-2675-y
The task of identifying Chinese named entities in Chinese poetry and wine culture is a key step in constructing a knowledge graph and a question answering system. Aimed at the varying lengths of Chinese poetry and wine culture entities and the high training cost of current named entity recognition models, this study proposes a model combining A Lite BERT, bi-directional long short-term memory, an attention mechanism, and a conditional random field (ALBERT+BiLSTM+Att+CRF). The method first obtains character-level semantic information via the ALBERT module, then extracts high-dimensional features with the BiLSTM module, weights the original word vector and the learned text vector in the attention layer, and finally predicts the true label in the CRF module (covering five types: poem title, author, time, genre, and category). Experiments on datasets related to Chinese poetry and wine culture show that the method outperforms existing mainstream models and can efficiently extract important entity information in Chinese poetry and wine culture, providing an effective approach to recognizing named entities of varying lengths in poetry.