A Single Image Deraining Algorithm Based on Swin Transformer

GAO Tao, WEN Yuanbo, CHEN Ting, ZHANG Jing

Journal of Shanghai Jiao Tong University, 2023, Vol. 57, Issue 5: 613-623

DOI: https://doi.org/10.16183/j.cnki.jsjtu.2022.032

Electronic Information and Electrical Engineering


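1. School of Information Engineering, Chang’an University, Xi’an 710064, China
2. College of Engineering and Computer Science, Australian National University, Canberra 2600, ACT, Australia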

GAO Tao (1981-), professor and doctoral supervisor, currently working mainly on digital image processing and pattern recognition.

Received date: 2022-02-14

Revised date: 2022-03-20

Accepted date: 2022-04-28

Online published: 2022-08-23

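Funding

National Key Research and Development Program of China (2019YFE0108300); National Natural Science Foundation of China (52172379); National Natural Science Foundation of China (62001058); Key Research and Development Program of Shaanxi Province (2019GY-039); Fundamental Research Funds for the Central Universities (300102242901); Fundamental Research Funds for the Central Universities (300102112601)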



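Cite this article

GAO Tao, WEN Yuanbo, CHEN Ting, ZHANG Jing. A single image deraining algorithm based on Swin Transformer[J]. Journal of Shanghai Jiao Tong University, 2023, 57(5): 613-623. DOI: 10.16183/j.cnki.jsjtu.2022.032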

Abstract

Single image deraining aims to recover a rain-free image from a rainy one. Most existing deep-learning-based deraining methods do not make effective use of the global information in the rainy image, so the derained results lose much detailed and structural information. To address this issue, this paper proposes a single image deraining algorithm based on Swin Transformer. The network consists mainly of a shallow feature extraction module and a deep feature extraction network. The former uses a context information aggregation module to adapt to the diverse distributions of rain streaks and extracts the shallow features of the rainy image. The latter uses Swin Transformer to capture global information and long-range dependencies between pixels, combined with residual convolution and dense connections to strengthen feature learning. Finally, the derained image is output through a global residual convolution. In addition, this paper proposes a comprehensive loss function that simultaneously constrains the similarity of image edges and image regions to further improve the quality of the derained image. Experimental results show that, compared with two state-of-the-art methods, MSPFN and MPRNet, the proposed method increases the average peak signal-to-noise ratio (PSNR) of the derained images by 0.19 dB and 2.17 dB and the average structural similarity (SSIM) by 3.433% and 1.412%, respectively. At the same time, it reduces the number of model parameters by 84.59% and 34.53% and the average forward propagation time by 21.25% and 26.67%, respectively.
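The deep feature extraction network relies on Swin Transformer, whose central idea is to compute self-attention only among tokens that share a local, non-overlapping window, so the cost grows linearly with image size for a fixed window size. The pure-Python sketch below illustrates just that windowing step under simplifying assumptions: a single attention head, a window size `win` that divides the feature-map height and width, and caller-supplied projection matrices `wq`, `wk` and `wv`. All function names are hypothetical, and the shifted windows, attention masks, relative position bias and multi-head splitting of the original Swin Transformer [19] are omitted; it is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def window_partition(feat: List[List[Vector]], win: int) -> List[Matrix]:
    """Split an H x W grid of C-dim tokens into non-overlapping win x win windows.

    Each window is returned as a (win*win) x C token matrix in row-major order.
    """
    h, w = len(feat), len(feat[0])
    assert h % win == 0 and w % win == 0, "H and W must be multiples of win"
    windows = []
    for wy in range(0, h, win):
        for wx in range(0, w, win):
            windows.append([feat[y][x] for y in range(wy, wy + win)
                            for x in range(wx, wx + win)])
    return windows


def window_reverse(windows: List[Matrix], win: int, h: int, w: int) -> List[List[Vector]]:
    """Inverse of window_partition: put each window's tokens back on the H x W grid."""
    feat: List[List[Vector]] = [[[] for _ in range(w)] for _ in range(h)]
    idx = 0
    for wy in range(0, h, win):
        for wx in range(0, w, win):
            for t, token in enumerate(windows[idx]):
                feat[wy + t // win][wx + t % win] = token
            idx += 1
    return feat


def window_attention(tokens: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention among the tokens of ONE window."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(qi * kj for qi, kj in zip(q[i], k[j])) for j in range(len(k))]
              for i in range(len(q))]
    return matmul([softmax(r) for r in scores], v)


def windowed_self_attention(feat, win, wq, wk, wv):
    """Apply window_attention independently to every window, then reassemble the grid."""
    h, w = len(feat), len(feat[0])
    outputs = [window_attention(t, wq, wk, wv) for t in window_partition(feat, win)]
    return window_reverse(outputs, win, h, w)
```

Because each window is processed independently, perturbing one pixel of a 4 × 4 grid split into 2 × 2 windows changes only the outputs of that pixel's window. Swin Transformer exchanges information across window borders by alternating regular and shifted window partitions between consecutive blocks, which is how stacked blocks can capture the long-range dependencies the abstract refers to.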

Key words: computer vision; single image deraining; Swin Transformer; residual network; self-attention mechanism; dilated convolution
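The abstract says only that the comprehensive loss constrains edge similarity and region similarity at the same time. Since the reference list cites SSIM [26] and a Laplacian operator [27], one plausible reading, which is an assumption not confirmed by this page, is a region term 1 - SSIM plus a weighted L1 distance between Laplacian edge maps. The sketch below implements that guess for single-channel images with values in [0, 1]. The weight `edge_weight` and all function names are hypothetical; SSIM is computed over one global window instead of the 11 × 11 Gaussian windows of [26], and a plain 4-neighbour Laplacian kernel stands in for the optimally isotropic operator of [27].

```python
from typing import List

Image = List[List[float]]


def laplacian(img: Image) -> Image:
    """4-neighbour Laplacian (kernel [[0,1,0],[1,-4,1],[0,1,0]]) with replicate padding."""
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> float:
        # Clamp coordinates so border pixels are replicated outward.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1) - 4 * img[y][x]
             for x in range(w)] for y in range(h)]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean L1 distance between two equally sized images."""
    n = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def global_ssim(a: Image, b: Image, data_range: float = 1.0) -> float:
    """SSIM evaluated over the whole image as one window (simplified; no Gaussian windowing)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    xs = [p for row in a for p in row]
    ys = [q for row in b for q in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((p - mx) * (p - mx) for p in xs) / n
    vy = sum((q - my) * (q - my) for q in ys) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def combined_loss(pred: Image, target: Image, edge_weight: float = 0.5) -> float:
    """Hypothetical edge + region loss: (1 - SSIM) + edge_weight * L1(Laplacian maps)."""
    region_term = 1.0 - global_ssim(pred, target)
    edge_term = mean_abs_diff(laplacian(pred), laplacian(target))
    return region_term + edge_weight * edge_term
```

Identical inputs give a loss of 0, and any difference in either local structure or edges raises it; in a real training pipeline the same terms would be written with a tensor library so that they can be differentiated.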

References

[1]	CHEN Shuman, CHEN Wei, YIN Zhong. Research status and prospect of single image rain removal algorithm[J]. Application Research of Computers, 2022, 39(1): 9-17.
[2]	DENG S, WEI M, WANG J, et al. Detail-recovery image deraining via context aggregation networks [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020: 14560-14569.
[3]	HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2961-2969.
[4]	WANG Chunbo, ZHANG Weidong, ZHANG Wenyuan, et al. Vision-based vehicles detection in complex traffic scenes[J]. Journal of Shanghai Jiao Tong University, 2000, 34(12): 1680-1682.
[5]	YANG W, TAN R T, WANG S, et al. Single image deraining: From model-based to data-driven and beyond[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 4059-4077.
[6]	ZHENG X, LIAO Y, GUO W, et al. Single-image-based rain and snow removal using multi-guided filter[C]//International Conference on Neural Information Processing. Daegu, South Korea: APNNS, 2013: 258-265.
[7]	KANG L, LIN C, FU Y. Automatic single-image-based rain streaks removal via image decomposition[J]. IEEE Transactions on Image Processing, 2011, 21(4): 1742-1755.
[8]	LUO Y, XU Y, JI H. Removing rain from a single image via discriminative sparse coding[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 3397-3405.
[9]	LI Y, TAN R T, GUO X, et al. Rain streak removal using layer priors[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2736-2744.
[10]	FU X, HUANG J, DING X, et al. Clearing the skies: A deep network architecture for single-image rain removal[J]. IEEE Transactions on Image Processing, 2017, 26(6): 2944-2956.
[11]	WEI W, MENG D, ZHAO Q, et al. Semi-supervised transfer learning for image rain removal[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019: 3877-3886.
[12]	ZHANG H, PATEL V M. Density-aware single image deraining using a multi-stream dense network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 695-704.
[13]	YASARLA R, PATEL V M. Uncertainty guided multi-scale residual learning-using a cycle spinning CNN for single image de-raining[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019: 8405-8414.
[14]	LI X, WU J, LIN Z, et al. Recurrent squeeze-and-excitation context aggregation net for single image deraining[C]//Proceedings of the European Conference on Computer Vision. Munich, Germany: Springer, 2018: 254-269.
[15]	REN D, ZUO W, HU Q, et al. Progressive image deraining networks: A better and simpler baseline[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019: 3937-3946.
[16]	JIANG K, WANG Z, YI P, et al. Multi-scale progressive fusion network for single image deraining[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020: 8346-8355.
[17]	ZAMIR S W, ARORA A, KHAN S, et al. Multi-stage progressive image restoration[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA: IEEE, 2021: 14821-14831.
[18]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 2017: 5998-6008.
[19]	LIU Z, LIN Y, CAO Y, et al. Swin Transformer: Hierarchical vision Transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 10012-10022.
[20]	XIAO T, DOLLÁR P, SINGH M, et al. Early convolutions help transformers see better[C]//Thirty-Fifth Conference on Neural Information Processing Systems. Montreal, Canada: NIPS, 2021: 34.
[21]	YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[C]//International Conference on Learning Representations. Caribe Hilton, San Juan, Puerto Rico: OpenReview.net, 2016: 1-13.
[22]	WANG P, CHEN P, YUAN Y, et al. Understanding convolution for semantic segmentation[C]//2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe, NV, USA: IEEE, 2018: 1451-1460.
[23]	LIANG J, CAO J, SUN G, et al. SwinIR: Image restoration using Swin Transformer[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 1833-1844.
[24]	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770-778.
[25]	HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 4700-4708.
[26]	WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: From error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[27]	KAMGAR-PARSI B, ROSENFELD A. Optimally isotropic Laplacian operator[J]. IEEE Transactions on Image Processing, 1999, 8(10): 1467-1472.
[28]	ZHANG H, SINDAGI V, PATEL V M. Image de-raining using a conditional generative adversarial network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 30(11): 3943-3956.
[29]	YANG W, TAN R T, FENG J, et al. Deep joint rain detection and removal from a single image[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 1357-1366.
[30]	FU X, LIANG B, HUANG Y, et al. Lightweight pyramid networks for image deraining[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 31(6): 1794-1807.
[31]	HUYNH-THU Q, GHANBARI M. Scope of validity of PSNR in image/video quality assessment[J]. Electronics Letters, 2008, 44(13): 800-801.
[32]	MITTAL A, SOUNDARARAJAN R, BOVIK A C. Making a “completely blind” image quality analyzer[J]. IEEE Signal Processing Letters, 2012, 20(3): 209-212.
[33]	LIU L, LIU B, HUANG H, et al. No-reference image quality assessment based on spatial and spectral entropies[J]. Signal Processing: Image Communication, 2014, 29(8): 856-863.
