J Shanghai Jiaotong Univ Sci ›› 2026, Vol. 31 ›› Issue (2): 273-281. doi: 10.1007/s12204-024-2729-9

Special Issue: Human-Machine Speech Communication

• Automation & Computer Technologies •

Unraveling Predictive Mechanism in Speech Perception and Production: Insights from EEG Analyses of Brain Network Dynamics

Zhao Bin (赵彬)1, Dang Jianwu (党建武)3, Li Aijun (李爱军)1,2

  1. Key Laboratory of Linguistics, Chinese Academy of Social Sciences, Beijing 100732, China; 2. Corpus and Computational Linguistics Center, Institute of Linguistics, Chinese Academy of Social Sciences, Beijing 100732, China; 3. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China
  • Received: 2023-12-19 Accepted: 2024-01-05 Online: 2026-04-01 Published: 2024-04-22

Abstract: How neural networks coordinate to support speech perception and speech production is a frontier research topic in both contemporary neuroscience and artificial intelligence. Although artificial neural networks have successfully incorporated hierarchical and predictive attributes of biological neural networks (BNNs), substantial disparities persist, particularly in real-time feedback and nonlinear regulation. To better understand how BNNs manifest these attributes, the present study used electroencephalography (EEG) to examine the spatiotemporal brain network dynamics involved in listening to and orally reading identical sentences. These two tasks engage distinct sensorimotor modalities while sharing high-level semantic and syntactic representations. According to a hierarchical feedforward model, low-level auditory and visual inputs would be progressively transformed into abstract representations of sentence meaning, so that brain network patterns converge in higher cognitive regions. However, our findings challenge this view by revealing an early resemblance of network activation in the prefrontal and parietal areas across both tasks. This implies a top-down predictive mechanism operating alongside the bottom-up progression. This bidirectional interaction could potentially be implemented through frequency-specific synchronization and desynchronization between functionally specific cortical regions, laying the foundation for a speech chain system with common neural substrates.

Key words: speech perception and production, electroencephalography (EEG) techniques, brain network dynamics, predictive coding, frequency multiplexing

