Journal of Shanghai Jiao Tong University (Natural Science Edition) ›› 2017, Vol. 51 ›› Issue (9): 1111-1116. doi: 10.16183/j.cnki.jsjtu.2017.09.014

• Ordnance Industry •

 A Reconstruction Algorithm for Speech Compressive Sensing Using Structural Features

JIA Xiaoli, JIANG Xiaobo, JIANG Sanxin, LIU Peilin

  1.  Shanghai Key Laboratory of Navigation and Location-Based Services, Shanghai Jiao Tong University
  • Online: 2017-09-20 Published: 2017-09-20
  • Supported by:
     

Abstract:  Speech signals are not sufficiently sparse in the transform domain, which makes them difficult to reconstruct from compressive samples. To address this problem, a reconstruction algorithm that exploits structural features in the frequency domain is proposed. Two hidden variables, amplitude and state, are introduced for each modified discrete cosine transform (MDCT) coefficient of a single speech frame. The probability density function of the amplitude of the MDCT coefficient is represented by a Gaussian mixture model. The continuity of the states along the frequency axis is modeled by a first-order Markov chain, and the continuity of the amplitudes along the frequency axis is modeled by a Gauss-Markov process. The joint posterior distribution of the coefficients, amplitudes, and states is represented by a factor graph, on which the posterior mean of each coefficient is computed iteratively by Turbo message passing; the original speech signal is then reconstructed from these estimates. The MDCT coefficients of a speech segment were compressively sampled and then reconstructed with the proposed algorithm and with several state-of-the-art algorithms for comparison. The proposed algorithm achieved the highest reconstruction accuracy at every frame length and compression ratio tested, and the energy distribution of its reconstructed signal in the spectrogram was the closest to that of the original speech. These results indicate that exploiting the continuity of the frequency-domain coefficients through Turbo message passing yields high reconstruction accuracy in compressive sensing.

Key words:  speech signal, compressive sensing, Gaussian mixture model, Markov chain, message passing

CLC Number:
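
The structured prior described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a two-state (inactive/active) first-order Markov chain for the states, a stationary first-order Gauss-Markov process for the amplitudes, and a random Gaussian measurement matrix. The function names and all parameter values (p01, p10, alpha, sigma2, m) are hypothetical choices made for illustration, and the Turbo message-passing reconstruction itself is not shown.

```python
import numpy as np


def sample_structured_coefficients(n, p01=0.05, p10=0.2, alpha=0.1,
                                   sigma2=1.0, rng=None):
    """Draw n coefficients x[k] = s[k] * theta[k] along the frequency axis.

    s     : binary state, first-order Markov chain (assumed two-state)
    theta : amplitude, stationary Gauss-Markov (AR(1)) process
    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = p01 / (p01 + p10)            # stationary probability of "active"
    a = 1.0 - alpha                    # AR(1) correlation coefficient
    innov_std = np.sqrt(sigma2 * (1.0 - a ** 2))  # keeps Var(theta) = sigma2

    s = np.zeros(n, dtype=int)
    theta = np.zeros(n)
    s[0] = rng.random() < lam
    theta[0] = rng.normal(0.0, np.sqrt(sigma2))
    for k in range(1, n):
        # State transition: stay active with prob 1 - p10, switch on with p01
        p_active = 1.0 - p10 if s[k - 1] == 1 else p01
        s[k] = rng.random() < p_active
        # Amplitude evolves smoothly from its neighbour on the frequency axis
        theta[k] = a * theta[k - 1] + rng.normal(0.0, innov_std)
    return s * theta, s, theta


def compress(x, m, rng=None):
    """Take m random Gaussian measurements y = A @ x (assumed sensing matrix)."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, x.size))
    return A @ x, A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, s, theta = sample_structured_coefficients(256, rng=rng)
    y, A = compress(x, 64, rng=rng)
    print("active coefficients:", int(s.sum()), "of", x.size)
    print("measurements:", y.shape[0])
```

Because active states tend to follow active states, the nonzero coefficients appear in clusters along the frequency axis with slowly varying amplitudes. This clustering is the kind of structure that a reconstruction algorithm can exploit beyond plain sparsity.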