Journal of Shanghai Jiao Tong University ›› 2023, Vol. 57 ›› Issue (5): 613-623. doi: 10.16183/j.cnki.jsjtu.2022.032

Special Topic: Journal of Shanghai Jiao Tong University 2023 Special Topic on Electronic Information and Electrical Engineering



A Single Image Deraining Algorithm Based on Swin Transformer

GAO Tao1, WEN Yuanbo1, CHEN Ting1, ZHANG Jing2

  1. School of Information Engineering, Chang’an University, Xi’an 710064, China
    2. College of Engineering and Computer Science, Australian National University, Canberra 2600, ACT, Australia
  • Received: 2022-02-14  Revised: 2022-03-20  Accepted: 2022-04-28  Online: 2023-05-28  Published: 2023-06-02
  • Contact: WEN Yuanbo, E-mail: wyb@chd.edu.cn
  • About the author: GAO Tao (1981-), professor and doctoral supervisor; his research focuses on digital image processing and pattern recognition.
  • Funding: National Key Research and Development Program of China (2019YFE0108300); National Natural Science Foundation of China (52172379, 62001058); Key Research and Development Program of Shaanxi Province (2019GY-039); Fundamental Research Funds for the Central Universities (300102242901, 300102112601)


Abstract:

Single image deraining aims to recover a rain-free image from its rain-degraded counterpart. Most existing deep-learning-based deraining methods fail to exploit the global information of the rainy image effectively, so the restored images lose part of their detail and structural information. To address this issue, this paper proposes a single image deraining algorithm based on Swin Transformer. The network consists of a shallow feature extraction module and a deep feature extraction network. The former applies context information aggregation to the input to adapt to the diverse distribution of rain streaks and extracts shallow features of the rainy image. The latter uses Swin Transformer to capture global information and long-range dependencies between pixels, combined with residual convolution and dense connections to strengthen feature learning; the derained image is finally produced by a global residual convolution. In addition, a comprehensive loss function that simultaneously constrains the similarity of image edges and regions is proposed to further improve the quality of the derained image. Extensive experiments show that, compared with the state-of-the-art deraining methods MSPFN and MPRNet, the proposed algorithm improves the average peak signal-to-noise ratio of derained images by 0.19 dB and 2.17 dB and the average structural similarity by 3.433% and 1.412%, while reducing the number of model parameters by 84.59% and 34.53% and the average forward-propagation time by 21.25% and 26.67%.
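
To make the pipeline described above concrete, the following PyTorch sketch assembles the stages named in the abstract: dilated-convolution context aggregation for shallow features, a body of window self-attention blocks wrapped in residual connections, a global residual that reconstructs the clean image, and a composite loss coupling a region (pixel) term with an edge (gradient) term. All names (DerainNet, edge_region_loss), channel widths, the window size, the block count, the shift-free attention, and the loss weighting are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of the deraining pipeline described in the abstract.
# Layer counts, channel widths, window size, and loss weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShallowFeatureExtractor(nn.Module):
    """Context aggregation over the input via parallel dilated convolutions (assumed design)."""
    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 3)
        )
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([F.relu(b(x)) for b in self.branches], dim=1))


class WindowAttentionBlock(nn.Module):
    """Simplified, shift-free window self-attention standing in for a Swin Transformer block."""
    def __init__(self, ch=32, window=8, heads=4):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(ch)
        self.norm2 = nn.LayerNorm(ch)
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(ch, 2 * ch), nn.GELU(), nn.Linear(2 * ch, ch))

    def forward(self, x):                         # x: (B, C, H, W); H, W divisible by window
        B, C, H, W = x.shape
        w = self.window
        # Partition the feature map into non-overlapping w x w windows of tokens.
        t = x.reshape(B, C, H // w, w, W // w, w).permute(0, 2, 4, 3, 5, 1)
        t = t.reshape(-1, w * w, C)
        y = self.norm1(t)
        t = t + self.attn(y, y, y, need_weights=False)[0]   # attention within each window
        t = t + self.mlp(self.norm2(t))
        # Reverse the window partition back to (B, C, H, W).
        t = t.reshape(B, H // w, W // w, w, w, C).permute(0, 5, 1, 3, 2, 4)
        return t.reshape(B, C, H, W)


class DerainNet(nn.Module):
    """Shallow features -> attention body with residual convolution -> global residual output."""
    def __init__(self, ch=32, depth=4):
        super().__init__()
        self.shallow = ShallowFeatureExtractor(ch=ch)
        self.body = nn.ModuleList(WindowAttentionBlock(ch) for _ in range(depth))
        self.conv_body = nn.Conv2d(ch, ch, 3, padding=1)
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, rainy):
        feat = self.shallow(rainy)
        x = feat
        for blk in self.body:
            x = x + blk(x)                        # residual connection around each block
        x = self.conv_body(x) + feat              # residual convolution over the deep features
        return rainy + self.tail(x)               # global residual: reconstruct the clean image


def edge_region_loss(pred, target, alpha=0.05):
    """Composite loss: an L1 region (pixel) term plus an L1 term on image gradients (edges)."""
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]
    region = F.l1_loss(pred, target)
    edge = F.l1_loss(dx(pred), dx(target)) + F.l1_loss(dy(pred), dy(target))
    return region + alpha * edge


if __name__ == "__main__":
    net = DerainNet()
    rainy, clean = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    out = net(rainy)                              # spatial size must be divisible by the window
    print(out.shape, edge_region_loss(out, clean).item())
```

Restricting self-attention to fixed windows keeps the attention cost roughly linear in the number of pixels, which is what makes a Transformer body practical at full image resolution; the global residual means the tail only has to predict the rain component to be removed rather than the whole clean image.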

Key words: computer vision, single image deraining, Swin Transformer, residual network, self-attention mechanism, dilated convolution

CLC number: