J Shanghai Jiaotong Univ Sci ›› 2026, Vol. 31 ›› Issue (2): 348-358. doi: 10.1007/s12204-024-2721-4

• Automation & Computer Technologies •

High Resolution Remote Sensing Image Segmentation Method with Improved DeepLabv3+


TAO Hongjie1,2, LI Zhaofei1,2, QI Fei3, CHEN Jingjue3, ZHOU Hao1,2

  1. School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 644000, Sichuan, China; 2. Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science and Engineering, Yibin 644000, Sichuan, China; 3. Chengdu Tianxun Microsatellite Technology Co., Ltd., Chengdu 610200, China
  • Received: 2023-06-29  Accepted: 2023-08-27  Online: 2026-04-01  Published: 2024-04-22

Abstract: Classical semantic segmentation networks perform poorly on high-resolution remote sensing images: segmentation quality degrades in complex scenes, and their large parameter counts make training costly. To address these problems, this study proposes an efficient segmentation method for high-resolution remote sensing images based on an improved DeepLabv3+, designed around three goals: fewer network parameters, lower computation volume, and better performance. First, the computationally heavy Xception backbone of the original DeepLabv3+ is replaced with the lighter MobileNetV2 network for feature extraction, which reduces the number of network parameters while maintaining effective feature extraction. Second, a lightweight convolutional block attention module (CBAM) is added after the deep features produced by the feature extraction module, strengthening the network's feature extraction capability while further reducing the number of parameters. Last, coordinate attention is introduced after the shallow features obtained from the feature extraction module, so that the network attends to informative features in the image and disregards irrelevant background information. Experimental results demonstrate the effectiveness of the proposed method: in the segmentation task on the high-resolution image dataset, it achieves a mean intersection over union (mIoU) of 75.33%, surpassing mainstream semantic segmentation networks such as SegNet, PSPNet, and U-Net by 12.49%, 3.16%, and 1.62%, respectively. Furthermore, the model remains compact, with only 6.02 × 10⁶ parameters and a computation volume of 26.45 GFLOPs. This balance between computational efficiency and segmentation accuracy makes the model highly valuable for edge computing applications.
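The CBAM step summarized above can be sketched in a few lines. The following is a minimal NumPy illustration of the module's two stages as described in the original CBAM design: channel attention computed from average- and max-pooled descriptors passed through a shared MLP, followed by spatial attention from a 7×7 convolution over stacked channel-wise average and max maps. The weights here are randomly initialized purely for demonstration (in the paper's network they are learned end to end), and the function name `cbam`, the reduction ratio `r`, and the toy feature-map size are illustrative choices, not details from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam(x, r=4, rng=None):
    """Apply CBAM-style channel then spatial attention to x of shape (C, H, W).

    Weights are randomly initialized for illustration only; in a real
    network they would be trainable parameters.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c, h, w = x.shape

    # Channel attention: shared MLP over avg- and max-pooled channel descriptors.
    w1 = rng.standard_normal((c // r, c)) * 0.1   # squeeze (reduction ratio r)
    w2 = rng.standard_normal((c, c // r)) * 0.1   # excite back to C channels
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # shared weights, ReLU
    avg_c = x.mean(axis=(1, 2))                   # (C,)
    max_c = x.max(axis=(1, 2))                    # (C,)
    m_c = sigmoid(mlp(avg_c) + mlp(max_c))        # (C,) channel weights in (0, 1)
    x = x * m_c[:, None, None]

    # Spatial attention: 7x7 conv over stacked channel-wise avg and max maps.
    avg_s = x.mean(axis=0)                        # (H, W)
    max_s = x.max(axis=0)                         # (H, W)
    stacked = np.stack([avg_s, max_s])            # (2, H, W)
    k = rng.standard_normal((2, 7, 7)) * 0.1      # conv kernel (illustrative)
    pad = np.pad(stacked, ((0, 0), (3, 3), (3, 3)))
    m_s = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            m_s[i, j] = np.sum(pad[:, i:i + 7, j:j + 7] * k)
    m_s = sigmoid(m_s)                            # (H, W) spatial weights in (0, 1)
    return x * m_s[None, :, :]

feat = np.random.default_rng(1).standard_normal((8, 16, 16))
out = cbam(feat)
print(out.shape)  # same shape as the input feature map: (8, 16, 16)
```

Because both attention maps lie in (0, 1), the module only rescales the feature map (suppressing uninformative channels and locations) and never changes its shape, which is what lets it be dropped after the backbone's deep features without altering the rest of the DeepLabv3+ decoder.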

Key words: remote sensing image, DeepLabv3+, MobileNetV2, attention mechanism, semantic segmentation

