J Shanghai Jiaotong Univ Sci

News

Selected Papers from the National Conference on Man-Machine Speech Communication (2023–2025)

2024

Anomalous Sound Detection Using Time-Frequency Feature and Mixbatch

https://doi.org/10.1007/s12204-025-2812-x

Improving Speaker Verification Back-End with Graph Neural Networks

https://doi.org/10.1007/s12204-025-2806-8

Integrating Time-Frequency Domain Shallow and Deep Features for Speech-EEG Match-Mismatch of Auditory Attention Decoding

https://doi.org/10.1007/s12204-025-2800-1

Dual-Path Spectrogram Refinement Network for Robust Speaker Verification

https://doi.org/10.1007/s12204-025-2810-z

MHAN: Bottleneck Fusion Model Based on Hybrid Attention Network for Multimodal Emotion Recognition

https://doi.org/10.1007/s12204-025-2820-x

Speaker Extraction with Verification of Present and Absent Target Speakers

https://doi.org/10.1007/s12204-025-2798-4

2023

Wav2vec-AD: Acoustic Unit Discovery Module-Integrated, Self-Supervised Contrastive Pre-training Approach for Speech Recognition

https://doi.org/10.1007/s12204-024-2738-8

Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios

https://doi.org/10.1007/s12204-024-2739-7

Unraveling Predictive Mechanism in Speech Perception and Production: Insights from EEG Analyses of Brain Network Dynamics

https://doi.org/10.1007/s12204-024-2729-9

Multi-Frame Cross-Channel Attention and Speaker Diarization Based Speaker-Attributed Automatic Speech Recognition System for Multi-Channel Multi-Party Meeting Transcription

https://doi.org/10.1007/s12204-024-2715-2

EC-BERT: A BERT Language Model with Error Correction for Mandarin Chinese Speech Recognition

https://doi.org/10.1007/s12204-024-2725-0

Exploring Generation of Pronunciation Lexicon for Low-Resource Language Automatic Speech Recognition Based on Generic Phone Recognizer

https://doi.org/10.1007/s12204-024-2730-3

Improving ECAPA-TDNN Performance with Coordinate Attention

https://doi.org/10.1007/s12204-024-2726-z

DSNet: Disentangled Siamese Network with Neutral Calibration for Speech Emotion Recognition

https://doi.org/10.1007/s12204-024-2724-1

Pubdate: 2025-09-26