Table of Contents

    01 April 2026, Volume 31 Issue 2

    Automation & Computer Technologies
    Cover and Table of Contents
    2026, 31 (2):  0. 
    Improving ECAPA-TDNN Performance with Coordinate Attention
    Liu Shuanghong, Song Zhida, He Liang
    2026, 31 (2):  241-247.  doi: 10.1007/s12204-024-2726-z
    Current mainstream networks, such as the squeeze and excitation residual neural network (SE-ResNet) and the emphasized channel attention, propagation and aggregation based time delay neural network (ECAPA-TDNN), enhance the capability of speaker embedding extractors to extract more discriminative speaker embeddings by incorporating squeeze and excitation (SE) attention within the convolutional blocks. However, the SE attention focuses solely on encoding inter-channel information, overlooking the importance of spatial positional information and time-frequency information, which are crucial for the model’s performance. In this paper, we first experimentally compare the effectiveness of several mainstream attention mechanisms in the computer vision domain for the ECAPA-TDNN model. Next, we focus on the substantial improvements that coordinate attention (CA) brings to the ECAPA-TDNN model. The introduction of CA can help the model embed time-frequency information into the channel representation. Even without using AS-Norm, our proposed model achieves relative reductions of about 5.3% in equal error rate (EER) and 5.5% in minimum detection cost function (minDCF) on both the Voxceleb-O and Voxceleb-H test sets compared to the ECAPA-TDNN baseline model. In addition, the EER is relatively reduced by 9.46% on the CN-Celeb1 test set. This result strongly demonstrates that the CA module can effectively improve the generalization ability of the ECAPA-TDNN model.
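As a rough illustration of how coordinate attention can carry time-frequency position into a channel's representation, the sketch below pools a single channel's feature map separately along the frequency axis and the time axis, then gates each element with both profiles. It is a simplified sketch, not the authors' model: the learned convolutions of a real CA block are omitted, and the function names and the list-of-lists layout (rows are frequency bins, columns are time frames) are assumptions.

```python
import math


def directional_pool(feature):
    """Average a 2D map (frequency x time) along each axis separately."""
    n_freq = len(feature)
    n_time = len(feature[0])
    # One value per frequency bin: mean over all time frames.
    freq_profile = [sum(row) / n_time for row in feature]
    # One value per time frame: mean over all frequency bins.
    time_profile = [
        sum(feature[f][t] for f in range(n_freq)) / n_freq
        for t in range(n_time)
    ]
    return freq_profile, time_profile


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_gate(feature):
    """Scale each element by gates derived from its row and column profiles.

    Assumption: the learned 1x1 convolutions of a real CA block are
    replaced by the identity, so only the pooling and gating remain.
    """
    freq_profile, time_profile = directional_pool(feature)
    return [
        [
            feature[f][t] * sigmoid(freq_profile[f]) * sigmoid(time_profile[t])
            for t in range(len(time_profile))
        ]
        for f in range(len(freq_profile))
    ]
```

Unlike a single global average (as in SE attention), the two profiles keep track of where along each axis the energy lies.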
    DSNet: Disentangled Siamese Network with Neutral Calibration for Speech Emotion Recognition
    Chen Chengxin, Zhang Pengyuan
    2026, 31 (2):  248-257.  doi: 10.1007/s12204-024-2724-1
    One persistent challenge in deep learning based speech emotion recognition (SER) is the unconscious encoding of emotion-irrelevant factors (e.g., speaker or phonetic variability), which limits the generalization of SER in practical use. In this paper, we propose DSNet, a disentangled Siamese network with neutral calibration, to meet the demand for a more robust and explainable SER model. Specifically, we introduce an orthogonal feature disentanglement module to explicitly project the high-level representation into two distinct subspaces. Later, we propose a novel neutral calibration mechanism to encourage one subspace to capture sufficient emotion-irrelevant information. In this way, the other one can better isolate and emphasize the emotion-relevant information within speech signals. Experimental results on two popular benchmark datasets demonstrate the superiority of DSNet over various state-of-the-art methods for speaker-independent SER.
    Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios
    Zeng Bang, Suo Hongbin, Wan Yulong, Li Ming
    2026, 31 (2):  258-264.  doi: 10.1007/s12204-024-2739-7
    Conventional target speech separation directly estimates the target source, ignoring the interrelationship between different speakers at each frame. We propose a multiple-target speech separation (MTSS) model to simultaneously extract each speaker’s voice from the mixed speech rather than just optimally estimating the target source. Moreover, we propose a speaker diarization (SD) aware MTSS system (SD-MTSS). By exploiting the target speaker voice activity detection (TSVAD) and the estimated mask, our SD-MTSS model can extract the speech signal of each speaker concurrently in a conversational recording without additional enrollment audio in advance. Experimental results show that our MTSS model achieves improvements of 1.38 dB in signal-to-distortion ratio (SDR), 1.34 dB in scale-invariant signal-to-distortion ratio (SISDR), and 0.13 in perceptual evaluation of speech quality (PESQ), respectively, over the baseline on the WSJ0-2mix-extr dataset. The SD-MTSS system achieves a 19.2% relative reduction in speaker-dependent character error rate on the AliMeeting dataset.
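For readers unfamiliar with the SISDR metric quoted above, the following is a minimal sketch of its usual definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The function name and the plain-list signal representation are assumptions made for illustration.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    # Optimal scaling of the reference toward the estimate.
    scale = dot / ref_energy
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        # The estimate is an exact scaled copy of the reference.
        return math.inf
    return 10.0 * math.log10(target_energy / residual_energy)
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is what distinguishes SISDR from plain SDR.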
    Exploring Generation of Pronunciation Lexicon for Low-Resource Language Automatic Speech Recognition Based on Generic Phone Recognizer
    Li Jinpeng, Chen Xie, Zhang Weiqiang
    2026, 31 (2):  265-272.  doi: 10.1007/s12204-024-2730-3
    The lexicon is an essential component in the hybrid automatic speech recognition (ASR) system. However, a high-quality lexicon requires significant effort from linguistic experts and is difficult to obtain, especially for low-resource languages. This paper addresses the problem of using a well-trained universal phone recognizer, trained on multilingual speech data and pronunciation lexicons, to generate pronunciation lexicons for low-resource languages driven by speech data. We propose a simple pipeline that utilizes this approach to generate pronunciation lexicons and apply them in ASR systems. The steps to generate the lexicon are simple and generic: applying the International Phonetic Alphabet (IPA) phone recognizer to the speech, then aligning its output with the reference word sequence, followed by filtering to obtain a series of AUTO-subwords, and using them to generate the AUTO-subword lexicon and the AUTO-IPA lexicon. We used the generated pronunciation lexicon for the hybrid system and for fine-tuning the pre-trained model. According to the experimental results, we are able to construct the lexicon without resorting to linguistic experts. Furthermore, the generated lexicon is able to outperform the grapheme-based lexicon and is comparable to the expert lexicon.
    Unraveling Predictive Mechanism in Speech Perception and Production: Insights from EEG Analyses of Brain Network Dynamics
    Zhao Bin, Dang Jianwu, Li Aijun
    2026, 31 (2):  273-281.  doi: 10.1007/s12204-024-2729-9
    How neural networks coordinate to support speech perception and speech production represents a forefront research topic in both contemporary neuroscience and artificial intelligence. Despite the successful incorporation of hierarchical and predictive attributes from biological neural networks (BNNs) into artificial counterparts, substantial disparities persist, particularly in terms of real-time feedback and nonlinear regulation. To gain a more profound understanding of how BNNs manifest these attributes, the present study employed electroencephalography (EEG) techniques to examine the spatiotemporal brain network dynamics involved in listening and oral reading of identical sentences. These two tasks engage distinct sensorimotor modalities while sharing high-level semantic and syntactic representations. According to a hierarchical feedforward model, the low-level auditory and visual inputs would be progressively transformed towards abstract representations of the sentence meaning, leading to a convergence of brain network patterns in higher cognitive regions. However, our findings challenged this viewpoint by revealing an early resemblance of network activation in the prefrontal and parietal areas in both tasks. It implies a top-down predictive mechanism along with the bottom-up progression. This bidirectional interaction could be potentially implemented through frequency-specific synchronization and desynchronization between functional-specific cortical regions, laying the foundation of the speech chain system with common neural substrates.
    EC-BERT: A BERT Language Model with Error Correction for Mandarin Chinese Speech Recognition
    Xiao Sujie, Hao Ruipeng, Cheng Gaofeng, Xu Xiaoyan, Li Ta
    2026, 31 (2):  282-288.  doi: 10.1007/s12204-024-2725-0
    The attention-based encoder-decoder end-to-end model has achieved promising performance in automatic speech recognition (ASR). However, in practical applications, substitution errors commonly occur in ASR systems, particularly for characters with the same or similar pronunciation. According to statistics, homophones cause at least 50% of character errors. Therefore, our study focuses on addressing the issue of substitution errors with the same or similar pronunciation. In this study, we propose a BERT language model with error correction (EC-BERT) for the ASR system. We design a two-stage training schedule involving pre-training with a large amount of pseudo-paired data followed by fine-tuning with a small amount of real-paired data to mitigate the inconsistency of the original pre-trained BERT model with our task. Unlike other error correction models, we do not need an error detection network or mask mechanism but directly use the BERT model to learn and correct the error locations. The experimental results show that our proposed method is effective and achieves a relative reduction of 19.2% in character error rate compared with the connectionist temporal classification (CTC) greedy search result and 12.8% compared with the CTC-WFST result on the AISHELL-1 test set. We also show that our proposed EC-BERT model can achieve comparable results to other error correction models with a shorter runtime and can easily be integrated into a practical ASR system.
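The character error rate and the relative reductions reported above can be made concrete with a short sketch. It assumes the usual definition of CER (character-level edit distance divided by reference length); the function names are hypothetical, and any numbers passed to them are illustrative rather than taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Fraction by which an error rate falls relative to the baseline."""
    return (baseline - improved) / baseline
```

A homophone substitution, the error type the abstract targets, counts as a single substitution: one wrong character in a six-character reference gives a CER of 1/6.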
    Wav2vec-AD: Acoustic Unit Discovery Module-Integrated, Self-Supervised Contrastive Pre-training Approach for Speech Recognition
    Nurmemet Yolwas, Sun Lixu, Li Xin, Liu Qichao, Wang Zhixiang
    2026, 31 (2):  289-297.  doi: 10.1007/s12204-024-2738-8
    An effective speech recognition model necessitates an ample supply of labeled data for supervised training. However, this requirement poses a major challenge for constructing a high-precision speech recognition system for low-resource languages. In this paper, we propose a novel pre-training strategy for contrastive learning by fusing the acoustic unit discovery module with Wav2vec 2.0, herein referred to as Wav2vec-AD. This strategy, for the first time in speech contrastive learning, enables controlled negative sample selection via the acoustic unit discovery module, thereby augmenting the model’s representational learning capability. Furthermore, we conduct a thorough analysis regarding the selection of negative samples in different situations to enhance the speech representation learned by the model, optimizing its efficacy in downstream tasks. In the low-resource case, compared to the baseline Wav2vec 2.0, Wav2vec-AD achieves absolute word error rate (WER) improvements of 1.55% and 1.46% respectively on the development-clean and test-clean subsets of LibriSpeech. Moreover, absolute WER improvements of 0.63% and 4.21% were realized in Arabic and Turkish language datasets, respectively.
    Multi-Frame Cross-Channel Attention and Speaker Diarization Based Speaker-Attributed Automatic Speech Recognition System for Multi-Channel Multi-Party Meeting Transcription
    Xu Luzhen, Yan Haoyin, He Maokui, Guo Zixian, Zhou Yeping, Liu Peiqi, Zhang Jie, Dai Lirong
    2026, 31 (2):  298-304.  doi: 10.1007/s12204-024-2715-2
    This paper describes a speaker-attributed automatic speech recognition (SA-ASR) system submitted to the multi-channel multi-party meeting transcription challenge, which aims to address the “who spoke what” problem. We align the serialized output training-based multi-speaker ASR hypotheses and speaker diarization (SD) results to obtain speaker-attributed transcriptions. We use a pre-trained multi-frame cross-channel attention (MFCCA) model as the ASR module. We build a cascade system which includes a pre-trained speaker overlap-aware neural diarization and target-speaker voice activity detection model as the SD module. Decoding and alignment strategies are further used to improve the SA-ASR performance. Our proposed system outperforms the baseline with a relative improvement of 40.3% in terms of concatenated minimum-permutation character error rate on the AliMeeting dataset, which ranks top-3 on the fixed sub-track.
    Recognition of Pedestrians’ Street-Crossing Intentions Based on Skeleton Features
    Lu Jushou, Chen Hao, Bai Yuchuan, Hu Chuan, Zhang Xi
    2026, 31 (2):  305-318.  doi: 10.1007/s12204-024-2700-9
    An integrated method is proposed to solve the problem of frequent conflicts between autonomous vehicles and pedestrians in street-crossing scenes. The method involves pedestrian detection, tracking, and intention recognition. First, an enhanced YOLOv8 is introduced by combining the C2f CA module to achieve accurate pedestrian detection, tracking and pose estimation. Second, a variety of intention recognition features are proposed to characterize the position and pose of pedestrians in spatial and time domains. Finally, by taking the feature data as input for the base learners, the intention classification model is proposed based on the Stacking model with SVM, KNN, and random forest as the base learners and XGBoost as the meta learner. The experimental results show that the enhanced YOLOv8 improves the detection accuracy by 5.4% compared with the original model, and the intention recognition based on the Stacking model can achieve 94.0% accuracy on the JAAD dataset, which is an improvement of more than 3.4% over existing intention recognition models. Furthermore, when different parts of a pedestrian are occluded, the accuracy of the Stacking model still reaches 65.8%–73.3%, which verifies the robustness of the proposed model. The proposed model provides reliable inputs for decision planning of autonomous vehicles, which is conducive to improving the safety of self-driving.
    Traffic Light Recognition Based on Improved YOLOv5l
    Dong Ruyi, Shi Cong
    2026, 31 (2):  319-333.  doi: 10.1007/s12204-024-2712-5
    Accurate recognition of traffic lights is essential for ensuring the safety of passengers and pedestrians, especially in the context of self-driving car technology. However, traffic lights present challenges due to their small size and limited recognition accuracy. This paper proposes an enhanced version of the YOLOv5l algorithm specifically designed for traffic light recognition. First, the K-means++ clustering algorithm is employed to generate the prior frame. Second, the SiLU activation function in the basic convolution module is replaced with the adaptive Meta-ACONC activation function, significantly improving the model’s detection accuracy. Third, the coordinate attention mechanism is integrated into the trunk feature extraction network to incorporate coordinate information into the channel, thereby enhancing the network’s sensitivity to small target positions and mitigating the ambiguity caused by increased network depth. Finally, the network’s detection scale is improved by removing the original 20 × 20 large target detection head, leading to an improved accuracy and speed for detecting small targets. The proposed approach is evaluated on self-created traffic light datasets, and compared with the original YOLOv5l model; the improved YOLOv5l model achieves a 7.1% increase in mAP@0.5, reaching 83.3%, effectively meeting the requirements for traffic light detection and recognition.
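The first step above, generating prior boxes with K-means++, might look roughly like the sketch below. It clusters (width, height) pairs of labeled boxes, choosing initial centers with the K-means++ rule (probability proportional to squared distance from the nearest existing center). The 1 − IoU distance is an assumption borrowed from common YOLO anchor-clustering practice; the abstract does not state which distance the authors use, and all names here are hypothetical.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    rng = random.Random(seed)
    # K-means++ seeding: first center uniformly at random, later ones with
    # probability proportional to squared distance to the nearest center.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(1.0 - iou_wh(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0.0:
            centers.append(rng.choice(boxes))  # all boxes coincide with centers
        else:
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    # Standard Lloyd iterations using the same distance.
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda i: 1.0 - iou_wh(b, centers[i]))
            groups[nearest].append(b)
        new_centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else c  # keep the old center if a cluster is empty
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])
```

The returned list is sorted by area so that anchors can be assigned to detection scales from small to large.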
    YOLO-VSF: An Improved YOLO Model by Incorporating Attention Mechanism for Object Detection in Traffic Scenes
    Miao Jun, Gong Shaocui, Deng Yongqiang, Liang Hao, Li Juanjuan, Qi Honggang, Zhang Maoxuan
    2026, 31 (2):  334-347.  doi: 10.1007/s12204-024-2751-y
    Intelligent transportation and autonomous driving systems place urgent demands on high-performance techniques for object detection in traffic scenes. This paper proposes an improved object detection model, YOLO-VSF, based on the YOLOv4 model, which is a representative work with excellent performance among the YOLO series of object detection models. The main improvement measures include: the backbone feature extraction network CSPDarknet53 of YOLOv4 is replaced with VGG16 to improve the feature extraction capability; the SENet attention mechanism is incorporated to improve the salient and correlation feature representation capability; Focal Loss is integrated into the loss function to overcome the sample imbalance problem. In addition, the detection performance of small targets is improved by increasing the resolution of input images. Experimental results show that on the VanJee traffic image dataset provided by Beijing VanJee Technology Co., Ltd., the proposed YOLO-VSF model achieves a mean average precision (mAP) of 92.21%, which improves the mAP by 3.04 percentage points compared with the YOLOv4 model while maintaining the detection speed of the original model. On the UA-DETRAC dataset, the average accuracy of YOLO-VSF is close to that of the latest YOLOv7 model with the number of parameters reduced by 1.329 × 10^7. The proposed method can provide support for object detection in traffic scenes.
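To show why Focal Loss helps with sample imbalance, here is a minimal sketch of its standard binary form, FL(p_t) = −α_t (1 − p_t)^γ log(p_t). The default α and γ values and the function name are assumptions for illustration; how the loss is wired into the YOLO-VSF objective is not described in the abstract.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, y: label (0 or 1).
    alpha and gamma are illustrative defaults, not values from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)      # avoid log(0)
    p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma set to 0 the expression reduces to α-weighted cross-entropy; larger gamma values shift the training signal toward hard examples, which are typically the rare class.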
    High Resolution Remote Sensing Image Segmentation Method with Improved DeepLabv3+
    Tao Hongjie, Li Zhaofei, Qi Fei, Chen Jingjue, Zhou Hao
    2026, 31 (2):  348-358.  doi: 10.1007/s12204-024-2721-4
    In order to address the challenges associated with poor semantic segmentation results of classical semantic segmentation networks in high-resolution remote sensing images, limited performance in complex scenes, a large number of network parameters, and high training costs, this study proposes an efficient segmentation method for high-resolution remote sensing images based on an improved DeepLabv3+ approach. The method focuses on three key aspects: reducing the number of network parameters, minimizing computation volume, and enhancing performance. First, the proposed method replaces the original DeepLabv3+ backbone network Xception, which is computationally heavy, with the lighter MobileNetV2 network for feature extraction. This substitution helps reduce the number of network parameters while maintaining effective feature extraction. Second, a lightweight convolutional block attention module (CBAM) is added after the feature extraction module to enhance the network’s feature extraction capability. The inclusion of CBAM further reduces the number of network parameters. Last, coordinate attention is introduced after the shallow features obtained from the feature extraction module. This addition allows the network to focus more on relevant features in the image, while disregarding irrelevant background information. Experimental results demonstrate the effectiveness of the proposed method. In the segmentation task of the high-resolution image dataset, the method achieves a mean intersection over union (mIoU) of 75.33%. This result surpasses mainstream semantic segmentation networks such as SegNet, PSPNet, and U-Net by 12.49%, 3.16%, and 1.62% respectively. Furthermore, the proposed model has a relatively low number of network parameters, with only 6.02 × 10^6 parameters, and a computation volume of 26.45 GFLOPs. This balance between computational efficiency and segmentation accuracy makes the model highly valuable for edge computing applications.
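The mIoU figure above is conventionally computed from a pixel-level confusion matrix. The sketch below follows that convention (rows are ground-truth classes, columns are predicted classes); the function name and the choice to skip classes that never appear are assumptions, not details from the paper.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # pixels wrongly labeled c
        denom = tp + fp + fn
        if denom:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For a two-class matrix [[3, 1], [1, 3]], each class has IoU 3/5, so the mean is 0.6.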
    YOLO-SDD: An Improved YOLOv5 for Storm Drain Detection in Street-Level View
    Wang Jing, Fang Zhiqiang, Li Qianqian, Tang Zhiwei, Huang Zhangyang, Hong Zhonghua, He Haiyang
    2026, 31 (2):  359-374.  doi: 10.1007/s12204-024-2749-5
    The urban drainage pipe system is an important part of city management. Automated detection of the status of storm drains in street-level images through current computer vision and AI technologies is an important aspect of smart city construction. In this paper, a framework based on YOLOv5s for storm drain detection (YOLO-SDD) in street view is proposed. By analyzing the characteristics of small-scale targets, YOLO-SDD focuses on optimizing the backbone network and its loss function. A series of experiments demonstrated that in the task of detecting different states of storm drains under various environmental conditions, the mean average precision (mAP@.5) of YOLO-SDD can reach 89.6%, an increase of 2% over the baseline model YOLOv5s. In the presence and absence of occlusion, the average precision of storm drain detection increased by 0.9% and 3.1%, respectively. In addition, the effectiveness and generalization ability of YOLO-SDD were further validated using the storm drain dataset of Urbana-Champaign (SDUC) from Illinois, USA, and the dataset for object detection in aerial images (DOTA). Finally, YOLO-SDD was deployed on the Android system, which verifies its ability to detect storm drains in different states in street scenes in real time.
    Improved Artificial Rabbit Optimization Algorithm Fused with Particle Swarm Optimization for Wireless Sensor Network Coverage Optimization
    Wu Jin, Su Zhengdong
    2026, 31 (2):  375-389.  doi: 10.1007/s12204-024-2574-x
    Aiming at the problem of low node coverage during node deployment in wireless sensor network (WSN), an improved artificial rabbit optimization algorithm incorporating particle swarm optimization (ARO-PSO) is proposed for network coverage optimization. ARO-PSO successfully combines the stochastic characteristics of ARO and the global characteristics of PSO. Firstly, to optimize the quality of the initial population, Sine chaos mapping is introduced to initialize the population; secondly, to better balance the exploration and exploitation, adaptive settings are made; finally, combined with the characteristics of the ARO energy factor, a population decreasing strategy is introduced to further accelerate the convergence speed of the algorithm. Experimental and analytical comparisons are made with ARO and PSO and 6 other excellent optimizers on 13 benchmark functions. The results show that ARO-PSO largely outperforms the original algorithm. Finally, ARO-PSO is applied to WSN coverage optimization experiments in 2D and 3D environments, and the proposed algorithm exhibits higher network coverage and improves the monitoring quality of the network compared to standard ARO and PSO and other state-of-the-art algorithms. The experimental results fully demonstrate the superiority of the ARO-PSO-based WSN node deployment optimization method.
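The chaotic initialization step can be illustrated with a small sketch. It uses one common form of the Sine map, x_{k+1} = a·sin(π·x_k) with a in (0, 1], and rescales the sequence to the search bounds. The exact map, its parameter values, and the function name are assumptions; the abstract only says that Sine chaos mapping initializes the population.

```python
import math


def sine_map_population(n_agents, dim, lower, upper, x0=0.7, a=1.0):
    """Initial population generated from a Sine chaotic sequence.

    x0 and a are illustrative values, not taken from the paper.
    """
    x = x0
    population = []
    for _ in range(n_agents):
        individual = []
        for _ in range(dim):
            x = a * math.sin(math.pi * x)                   # next chaotic value in [0, 1]
            individual.append(lower + x * (upper - lower))  # rescale to the bounds
        population.append(individual)
    return population
```

For WSN coverage, each individual would hold node coordinates, for example `sine_map_population(30, 2 * n_nodes, 0.0, 50.0)` for a hypothetical 50 m square region with `n_nodes` sensors.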
    Boundedly Rational Agents in Sequential Posted Pricing
    Huang Wenhan, Deng Xiaotie
    2026, 31 (2):  390-404.  doi: 10.1007/s12204-023-2681-0
    We consider the well-studied sequential posted pricing scenarios. In these scenarios, an auctioneer typically learns the value distributions of all agents as prior information and then offers a take-it-or-leave-it price to each sequentially arriving agent. If the value distributions are correctly learned, the dominant strategy of each agent is telling the truth. However, an agent could manipulate her value distribution to exploit the auctioneer. We study the behavior of sophisticated agents predicted by two prominent bounded rationality models: the level-k and the cognitive hierarchy models. We begin by analyzing the structure of the optimal reported distributions and then provide algorithms to compute the optimal distributions for each model. In the continuous scenarios, we show by examples that both models are ill-defined. Moreover, we evaluate both models in discrete scenarios with different numbers of agents, different minimum units of the values, and different risk tolerances. The empirical results and a brief discussion about the Bayesian Nash equilibrium of the experimental scenarios show that both the level-k model and the equilibrium suggest the highest possible prices. In contrast, the cognitive hierarchy model suggests low prices. The level-k model and the equilibrium to some extent explain the “winner’s curse” in online markets. The models and the equilibrium fail to explain why the same item could have different prices in different shops. To explain the different-price phenomenon, we suggest trying other bounded rationality models for agents and/or considering auctioneers with bounded rationality.
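The incentive to manipulate a reported distribution can be seen in a deliberately simplified, single-agent version of the setting: the auctioneer picks the price that maximizes expected revenue under the distribution it has learned. The sketch below is only an illustration under that assumption; the function name is hypothetical and it does not implement the level-k or cognitive hierarchy models studied in the paper.

```python
def best_posted_price(dist):
    """Revenue-maximizing take-it-or-leave-it price for one agent.

    dist maps each possible value to its probability. The agent buys
    whenever her value is at least the posted price.
    """
    best_price, best_revenue = None, -1.0
    for price in sorted(dist):
        sale_prob = sum(p for v, p in dist.items() if v >= price)
        revenue = price * sale_prob
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price, best_revenue
```

If the true distribution is `{1: 0.5, 3: 0.5}`, the auctioneer posts 3. If the agent instead makes the auctioneer believe `{1: 0.9, 3: 0.1}`, the posted price drops to 1, and an agent whose value is 3 keeps a surplus of 2.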
    Numerical Investigation into Hydrodynamic Interactions Between an Open-Frame Underwater Cleaning Robot and a Full-Scale Floating Production Storage and Offloading
    Zhang Meng, Sun Lianghui, Xu Weidong, Yao Yixin, Zhang Xiaohui
    2026, 31 (2):  405-419.  doi: 10.1007/s12204-024-2698-z
    Most published work on the influence of ship hulls on the hydrodynamic characteristics of underwater robots treats ship hulls as nearly infinite planes, paying less attention to the effect of hull shape. Thus, using an unsteady Reynolds-averaged Navier-Stokes solver, this work investigates hydrodynamic interactions between an open-frame underwater cleaning robot (OFUCR), which is in commercial use, and the parallel middle body of a real full-scale floating production storage and offloading (FPSO) unit with round bilge. Calculated results are validated against published results. In the simulation, the OFUCR moves at a speed of 1 kn and keeps 0.1 m away from the hull. Drag and lateral forces on the OFUCR, and the repulsive force due to the interference of the FPSO, are calculated. Further, dynamic pressure, velocity and vorticity in the gap between the OFUCR and the FPSO are plotted. The influence of longitudinal and lateral currents is also considered. As a result, drag clearly increases in the presence of the FPSO owing to the wall shear stress and the drop in dynamic pressure. The lateral force is found to be a repulsion force as the OFUCR moves along the ship bottom, and an attraction force as the OFUCR moves along the ship bilge and ship side.
    Nonlinear Disturbance Observer of Clutch Slipping Torque for Multi-Mode Hybrid Electric Vehicles
    Peng Cheng, Chen Li, Fu Shenglai
    2026, 31 (2):  420-427.  doi: 10.1007/s12204-024-2702-7
    Clutch slipping torque varies in a complex manner and is strongly nonlinear during the mode transition of hybrid electric vehicles. In order to estimate the clutch slipping torque, an online estimator based on the nonlinear disturbance observer (NDO) is proposed. First, an estimation-oriented model of the clutch slipping torque is established based on the LuGre friction model. Next, the NDO is designed to estimate the unknown part of clutch slipping torque based on the dynamics of the output shaft, while the output shaft torque required by NDO is estimated by a Luenberger state observer. To verify the effectiveness of the proposed estimator, experiments are conducted under different initial slipping speeds and inlet oil temperatures. The results show that the proposed estimator achieves high accuracy.
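The idea of a disturbance observer working from shaft dynamics can be sketched on a toy model. Assume, purely for illustration, a single inertia J obeying J·dω/dt = u + d, where u is a known torque and d an unknown torque. An observer with internal state z and estimate d̂ = z + L·J·ω, updated by dz/dt = −L·(u + d̂), yields the error dynamics d(d̂)/dt = L·(d − d̂), so the estimate converges for any gain L > 0. The model, the constant gain, and all parameter values below are assumptions; the paper's observer builds on the LuGre friction model and a Luenberger observer, which are not reproduced here.

```python
def simulate_disturbance_observer(J=0.1, gain=50.0, dt=0.001, steps=2000,
                                  u=5.0, d_true=-2.0):
    """Euler simulation of a constant-gain disturbance observer.

    Plant (toy assumption): J * dw/dt = u + d_true.
    Observer: d_hat = z + gain * J * w,  dz/dt = -gain * (u + d_hat).
    d_true drives only the simulated plant; the observer never reads it.
    """
    w = 0.0                     # shaft speed
    z = 0.0                     # observer internal state
    d_hat = z + gain * J * w    # initial disturbance estimate
    for _ in range(steps):
        z_dot = -gain * (u + d_hat)
        w_dot = (u + d_true) / J
        z += dt * z_dot
        w += dt * w_dot
        d_hat = z + gain * J * w
    return d_hat
```

The observer needs only the measured speed and the known input, which is why a separate estimate of the shaft torque (the Luenberger observer in the abstract) is required when that input is not measured directly.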
    Underactuated System Control Based on Improved Active Disturbance Rejection Control
    Chen Qiuzhuo, Zhu Biao, Ma Lixiang, Liu Bingyou, Wan Luanfei
    2026, 31 (2):  428-439.  doi: 10.1007/s12204-024-2703-6
    To achieve high precision, fast response, and good stability for a class of underactuated systems, a control strategy based on an improved active disturbance rejection controller is designed. First, a new sliding mode tracking differentiator is designed on the basis of a new sliding mode reaching law. Given that traditional active disturbance rejection control is only suitable for single-input and single-output systems, two sliding mode tracking differentiators are used in the proposed model to obtain the given displacement and velocity as well as the actual displacement and velocity, respectively. Then, a new nonlinear function with enhanced smoothness and convergence is designed. Using the nonlinear function, an improved extended state observer for the underactuated system is designed to improve its error-tracking ability. Given that the parameter values of the control law are difficult to adjust, the particle swarm optimization algorithm is used to optimize the four parameter values of the control law. Finally, the simulation results show that the proposed control strategy can realize fast and stable control of such underactuated systems.
    Hybrid Meta-Heuristic Algorithm for a Pickup and Delivery Problem of Ship Outfitting Pallets Distribution Considering Carbon Emissions
    Liu Ziyan, Jiang Zuhua
    2026, 31 (2):  440-457.  doi: 10.1007/s12204-024-2719-y
    Carbon emissions from ship outfitting pallet distribution account for a significant proportion of shipbuilding logistics. However, the complexity of the problem and the lack of carbon emission considerations make it difficult to achieve efficient and low-carbon distribution scheduling. To improve distribution efficiency while reducing carbon emissions, this paper formulates it as a heterogeneous fleet green multi-pickup and delivery problem with time windows. To effectively solve the problem, we develop a powerful hybrid meta-heuristic and propose a request insertion pruning strategy to accelerate the procedure. Computational results on multiple instances demonstrate the significant advantages of the proposed hybrid approach. The differences in cost components and transportation strategies between the economic and emission cost objective models are analyzed to provide meaningful managerial insights. This paper also quantifies the trade-offs between two costs and the benefits of a heterogeneous fleet over a homogeneous one. The method proposed can effectively reduce carbon emissions while improving distribution efficiency to help improve the sustainability of shipbuilding.
    Global Dense Two-Branch Cascade Network for Underwater Image Enhancement
    Wang Yan, Wang Likang, Zhang Jinfeng, Fan Xianghui
    2026, 31 (2):  458-474.  doi: 10.1007/s12204-024-2735-y
    In recent years, underwater image enhancement techniques have received wide attention from researchers with the rise of marine resource exploitation. Because existing networks extract features insufficiently and their enhancement results suffer from incomplete defogging and inaccurate color bias correction, this paper proposes an underwater image enhancement method based on a global dense two-branch cascade network and spatial domain grayscale transformation. The global dense two-branch cascade network can, on the one hand, amplify the global dimensional interaction features while reducing information loss and, on the other hand, extract spatial features by obtaining spatial information at different scales to achieve richer feature extraction; the spatial domain grayscale transformation operation can improve the contrast while color correcting the image, which improves the visual quality of the image. After training is completed, end-to-end inference can be performed on underwater images. The experimental results show that the proposed model performs best on the EUVP dataset; compared with the second-best method, it obtains improvements of 3.371, 0.06, 0.716, 0.024, and 1.727 in PSNR, SSIM, UIQM, UCIQE, and CCF, respectively. Compared with other representative methods, the proposed network achieves significant visual enhancement in dealing with severe color bias, low light, and detail loss in underwater images.
    References | Related Articles | Metrics
    Effect of Stratum Distribution on Deep Circular Excavation with Dewatering Above a Multi-Aquifer System by Hydro-Mechanical Coupled Numerical Analysis
    Yan Yueheng, Li Mingguang, Xu Zhonghua, Pan Chunhui
    2026, 31 (2):  475-485.  doi: 10.1007/s12204-024-2745-9
    Abstract ( 39 )   PDF (1457KB) ( 10 )  
    Dewatering in multilayered aquifers is closely related to stratigraphic configuration. However, previous studies have concentrated on the excavation response to dewatering in particular strata and lack a systematic study of the influence of stratum configuration. This study developed a hydro-mechanical coupled numerical model based on the deep excavation of the Shanghai Tower. The model and input parameters were validated by field measurements. In addition, a parametric study was conducted to investigate the effect of the thickness, permeability and depth of the aquitard on excavation performance. The analysis indicated that settlements in the excavation were insensitive to the thickness and permeability of aquitards, while ground surface settlement increased significantly with increasing confined aquifer thickness. Based on the numerical analysis, a fitted relationship was established to predict the maximum ground settlement of deep excavations in Shanghai, considering the excavation depth, groundwater drawdown and distribution of multilayered aquifers. The applicability of the fitted equation was verified by field measurements.
    References | Related Articles | Metrics
    Multi-Objective Approach for Optimizing Production Parameters of Low-Permeability Oil Well to Enhance Energy Efficiency
    Liu Peijin, Ding Haojian, Yan Dongyang, Sun Haofeng, Huang Tao, Li Jie
    2026, 31 (2):  486-498.  doi: 10.1007/s12204-024-2736-x
    Abstract ( 88 )   PDF (954KB) ( 18 )  
    To address the high energy consumption and low energy efficiency in the exploitation of low-permeability oil wells, which result from insufficient traceability and poor matching of production parameters, this paper proposes a multi-objective approach for optimizing the production parameters of low-permeability oil wells to enhance their energy efficiency. First, a sub-model of daily liquid production yield and a sub-model of unit production energy consumption cost were established for a single low-permeability oil well, and the Gaussian mixture model method was employed to compensate for the errors in the sub-model of unit production energy consumption cost, in order to account for the influence of uncertain factors during oil well exploitation and to improve the precision of the model. Second, a multi-objective optimization model was established, taking into account the decision variables and constraints of the model, to maximize the daily liquid production yield while minimizing the unit production energy consumption cost. Subsequently, the non-dominated sorting genetic algorithm was employed to solve the multi-objective optimization model and obtain the production parameters. Finally, the solution set with obvious features was taken as the production parameters and applied in actual production verification on low-permeability oil wells in an oil production plant of the ChangQing Oilfield. The results showed an increase in oil well production yield and a significant energy-saving effect, thereby verifying the effectiveness of the proposed model and optimization algorithm.
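    To illustrate the non-dominated sorting idea named in this abstract, the sketch below ranks candidate solutions into Pareto fronts for the two stated objectives (maximize yield, minimize cost). It shows only the sorting step, not a full genetic algorithm, and it is not the authors' implementation: the function names and the candidate values are hypothetical.

```python
def dominates(a, b):
    """Return True if candidate a dominates candidate b.

    Each candidate is a (daily_yield, unit_cost) pair; yield is maximized
    and cost is minimized. a dominates b when it is no worse in both
    objectives and strictly better in at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated_fronts(points):
    """Split candidate indices into successive Pareto fronts (front 0 is best)."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # A candidate belongs to the current front if nothing left dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


if __name__ == "__main__":
    # Hypothetical (yield, cost) values, purely for illustration.
    candidates = [(10, 5), (12, 6), (9, 6), (12, 5), (8, 4)]
    print(non_dominated_fronts(candidates))
```

    Candidates in the first front represent trade-offs in which yield cannot be raised without also raising cost; choosing one of them as the operating point is a separate decision step.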
    References | Related Articles | Metrics
    Knowledge-Data Fusion Model for Multivariate Load Short-Term Forecasting of Integrated Energy System
    Wu Lizhen, Zhao Yifan, Qin Wenbin, Chen Wei
    2026, 31 (2):  499-514.  doi: 10.1007/s12204-024-2740-1
    Abstract ( 61 )   PDF (1330KB) ( 17 )  
    The short-term forecasting of multiple loads is crucial for the optimization and scheduling of integrated energy systems (IES). However, the loads within an IES exhibit diversified and strongly coupled characteristics, which seriously affects forecast accuracy. Moreover, deep learning forecasting methods alone cannot analyze the factors that affect the forecast results, which is not conducive to guiding the optimization and scheduling of integrated energy systems. Therefore, a multivariate load forecasting model based on a knowledge-guided multi-task spatial-temporal synchronous graph convolutional network is proposed. First, user clusters are classified according to the energy-use characteristics of different buildings. Then, a domain knowledge base is built by combining the dimensionless trends of different groups with expert experience. At the same time, the input features are filtered using the improved maximum information coefficient method to construct spatial-temporal graph data, forming more refined and efficient input sample data. Finally, the knowledge-data fusion model for multivariate load forecasting is constructed to predict local fluctuations of the multivariate load series and reconstruct the load ratio. The IES data set of the Arizona State University Tempe Campus is taken as a test case. The results show that the proposed method is interpretable and has higher forecast accuracy and better generalization ability.
    References | Related Articles | Metrics
    Regional Integrated Energy System Resilience Enhancement Strategy at the Integrated Stage of Disaster Response and Post-Disaster Recovery
    You Minghao, Gu Jie, Liu Shuqi
    2026, 31 (2):  515-527.  doi: 10.1007/s12204-024-2742-z
    Abstract ( 81 )   PDF (681KB) ( 10 )  
    The deployment of integrated energy systems is conducive to improving energy efficiency and achieving the transformation of the global energy system. However, the recent occurrence of extreme natural disasters poses a great challenge to the safe and stable operation of integrated energy systems. Therefore, the resilience of the integrated energy system, namely the ability to anticipate, withstand, respond to and recover to a normal state, urgently needs to be enhanced. In view of the strong correlation between disaster response and post-disaster recovery, this paper proposes a master-slave optimization model for the resilience enhancement of the integrated energy system in the integrated stage of the two. The master model develops the optimal fault repair plan, and the sub-model determines the optimal energy supply recovery scheme. Based on the master-slave model, which adopts the repair and operation states of the components as coupling variables, a coordinated optimization framework is constructed. Then, the master-slave model is merged into a two-stage robust optimization model for iterative solution in order to develop the optimal fault repair strategy and energy supply recovery scheme of the integrated energy system, enhancing its resilience in the integrated stage of disaster response and post-disaster recovery.
    References | Related Articles | Metrics
    Investigation of Oil-Air Two-Phase Flow Inside Angular Contact Ball Bearing with Textured Cage
    Wang Baomin, Fang Wenbo, Yan Ruixiang, Qian Sikai
    2026, 31 (2):  528-536.  doi: 10.1007/s12204-024-2737-9
    Abstract ( 48 )   PDF (2141KB) ( 13 )  
    Surface texture technology is a method to improve the tribological properties of friction pairs. In this study, a cylindrical texture is designed in the cage pocket, and then the volume of fluid model and the multi-reference frame method are used to investigate the oil volume fraction inside the bearing cavity, the pressure and oil distribution on the ball surface, and the oil distribution on the inner/outer raceway. The results show that the cylindrical texture in the cage pocket helps to increase the oil volume fraction inside the bearing cavity, improve the pressure distribution on the ball surface, and increase the oil content on the ball surface. The cage pocket texture helps the ball to carry more lubrication oil during high-speed rotation, which increases the oil content of the outer raceway and improves the oil-air lubrication effect of the ball. This study proposes a new texture arrangement in the cage pocket of angular contact ball bearings and introduces the mixed mesh method to divide the fluid domain of the bearing. A comparative study shows that the cage pocket texture helps to improve the oil-air lubrication efficiency.
    References | Related Articles | Metrics