Anomalous Sound Detection Using Time-Frequency Feature and Mixbatch
基于时频特征和混合批处理的异常声音检测
https://doi.org/10.1007/s12204-025-2812-x
Improving Speaker Verification Back-End with Graph Neural Networks
使用图神经网络提高说话人验证后端性能
https://doi.org/10.1007/s12204-025-2806-8
Integrating Time-Frequency Domain Shallow and Deep Features for Speech-EEG Match-Mismatch of Auditory Attention Decoding
基于语音时频域和脑电深浅层特征的语音-脑电匹配失配任务的听觉注意解码
https://doi.org/10.1007/s12204-025-2800-1
Dual-Path Spectrogram Refinement Network for Robust Speaker Verification
鲁棒性说话人确认的双路谱图细化网络
https://doi.org/10.1007/s12204-025-2810-z
MHAN: Bottleneck Fusion Model Based on Hybrid Attention Network for Multimodal Emotion Recognition
MHAN:基于混合注意力网络的多模态情感识别瓶颈融合模型
https://doi.org/10.1007/s12204-025-2820-x
Speaker Extraction with Verification of Present and Absent Target Speakers
结合目标说话人存在与否验证的说话人提取
https://doi.org/10.1007/s12204-025-2798-4
2023
Wav2vec-AD: Acoustic Unit Discovery Module-Integrated, Self-Supervised Contrastive Pre-training Approach for Speech Recognition
Wav2vec-AD: 用于语音识别的声学单元发现模块集成式自监督对比预训练方法
https://doi.org/10.1007/s12204-024-2738-8
Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios
会议场景下多目标说话人的语音提取
https://doi.org/10.1007/s12204-024-2739-7
Unraveling Predictive Mechanism in Speech Perception and Production: Insights from EEG Analyses of Brain Network Dynamics
揭示语音感知和产生的预测机制: 来自脑网络动力学的EEG探究
https://doi.org/10.1007/s12204-024-2729-9
Multi-Frame Cross-Channel Attention and Speaker Diarization Based Speaker-Attributed Automatic Speech Recognition System for Multi-Channel Multi-Party Meeting Transcription
基于多帧跨通道注意力和说话人日志的多通道多方会议转录说话人相关自动语音识别系统
https://doi.org/10.1007/s12204-024-2715-2
EC-BERT: A BERT Language Model with Error Correction for Mandarin Chinese Speech Recognition
EC-BERT: 面向中文普通话语音识别BERT纠错语言模型
https://doi.org/10.1007/s12204-024-2725-0
Exploring Generation of Pronunciation Lexicon for Low-Resource Language Automatic Speech Recognition Based on Generic Phone Recognizer
基于通用音素识别器的低资源语言发音词典生成探索
https://doi.org/10.1007/s12204-024-2730-3
Improving ECAPA-TDNN Performance with Coordinate Attention
基于坐标注意力的ECAPA-TDNN模型性能研究
https://doi.org/10.1007/s12204-024-2726-z
DSNet: Disentangled Siamese Network with Neutral Calibration for Speech Emotion Recognition
DSNet: 用于语音情感识别的带有中性校准的解耦孪生网络
https://doi.org/10.1007/s12204-024-2724-1