J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (5): 889-898. doi: 10.1007/s12204-023-2688-6



Multi-Label Image Classification Model Based on Multiscale Fusion and Adaptive Label Correlation

YE Jihua, JIANG Lu, XIAO Shunjie, ZONG Yi, JIANG Aiwen

  1. School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China
  Received: 2023-07-10; Accepted: 2023-07-31; Published online: 2023-12-21; Issue date: 2025-09-26

Abstract: Current research on multi-label image classification focuses mainly on exploiting the correlations between labels to improve classification accuracy. However, existing methods compute label correlation from dataset statistics; such correlation is global and dataset-dependent, so it does not suit every individual sample. In addition, the features of small objects are easily lost during image feature extraction, which lowers the classification accuracy for small objects. To address these problems, this paper proposes a multi-label image classification model based on multiscale fusion and adaptive label correlation. First, feature maps of multiple scales are fused to enhance the feature information of small objects. Next, guided by label semantics, the fused feature map is decomposed into a feature vector for each category. The self-attention mechanism of a graph attention module then adaptively mines the correlations between the categories in each image, yielding feature vectors that carry category-correlation information for the final classification. An attention regularization loss is also proposed. The model achieves a mean average precision (mAP) of 95.6% on VOC 2007 and 83.6% on MS COCO 2014, and outperforms existing state-of-the-art methods on most metrics.

Keywords: image classification, label correlation, graph attention network, small object, multiscale fusion
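The abstract does not specify the fusion or decomposition operators. The following is a minimal pure-Python sketch under assumed choices: nearest-neighbour upsampling plus elementwise summation for the multiscale fusion, and a softmax attention over spatial positions, driven by hypothetical label embeddings, for the semantic-guided decomposition into per-category vectors. The function names (`fuse_multiscale`, `decompose_by_label`) are illustrative only and are not the authors' code.

```python
import math
from typing import List

# A feature map is indexed as fmap[channel][row][col].
Map = List[List[List[float]]]


def upsample_nearest(fmap: Map, out_h: int, out_w: int) -> Map:
    """Nearest-neighbour upsampling of a [C][H][W] map to [C][out_h][out_w]."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[ch[r * h // out_h][c * w // out_w] for c in range(out_w)]
             for r in range(out_h)]
            for ch in fmap]


def fuse_multiscale(maps: List[Map]) -> Map:
    """Resize every map to the largest resolution and sum them elementwise.

    Assumes all maps already share the same channel count (a real network
    would typically project them with a 1x1 convolution first).
    """
    out_h = max(len(m[0]) for m in maps)
    out_w = max(len(m[0][0]) for m in maps)
    resized = [upsample_nearest(m, out_h, out_w) for m in maps]
    return [[[sum(m[k][r][c] for m in resized) for c in range(out_w)]
             for r in range(out_h)]
            for k in range(len(maps[0]))]


def decompose_by_label(fmap: Map, label_embs: List[List[float]]) -> List[List[float]]:
    """Pool a fused [C][H][W] map into one C-dim vector per label.

    Each (hypothetical) label embedding scores every spatial position by a dot
    product; a softmax over positions turns the scores into pooling weights.
    """
    channels = len(fmap)
    positions = [[fmap[k][r][c] for k in range(channels)]
                 for r in range(len(fmap[0]))
                 for c in range(len(fmap[0][0]))]
    vectors = []
    for emb in label_embs:
        scores = [sum(e * x for e, x in zip(emb, pos)) for pos in positions]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        vectors.append([sum(wt * pos[k] for wt, pos in zip(weights, positions))
                        for k in range(channels)])
    return vectors


# Toy example: a fine 2x4x4 map (all zeros) and a coarse 2x2x2 map.
fine = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
coarse = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
fused = fuse_multiscale([fine, coarse])
label_vecs = decompose_by_label(fused, [[1.0, 0.0], [0.0, 1.0]])
```

In the toy example, the coarse map's values survive at the fine resolution after fusion, and a label whose embedding aligns with channel 0 pools mostly from the high-activation positions, while a label with no matching evidence falls back to a uniform average.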

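The image-specific label correlation step can be sketched as self-attention over the per-category vectors: each category is a node of a fully connected graph, and the row-normalized attention matrix acts as an adjacency matrix computed for the current image rather than from dataset statistics. This is a simplified, assumed version (single head, no learned projections, residual update); the paper's graph attention module and its attention regularization loss are not reproduced here, and `label_self_attention` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    """Numerically stable softmax over a list of scores."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_self_attention(vecs: List[Vec]) -> Tuple[List[Vec], List[Vec]]:
    """Single-head scaled dot-product attention over category vectors.

    Returns (attention, updated_vectors): attention[i][j] is how strongly
    category i attends to category j in this image; every row sums to 1.
    """
    d = len(vecs[0])
    scale = 1.0 / math.sqrt(d)
    attn = [softmax([scale * sum(a * b for a, b in zip(q, k)) for k in vecs])
            for q in vecs]
    updated = []
    for i, q in enumerate(vecs):
        # Aggregate neighbours' features weighted by attention.
        msg = [sum(attn[i][j] * vecs[j][t] for j in range(len(vecs)))
               for t in range(d)]
        # Residual connection: keep the node's own features.
        updated.append([q[t] + msg[t] for t in range(d)])
    return attn, updated


# Toy example: categories 0 and 1 have identical vectors, category 2 differs.
toy_vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
attn, updated = label_self_attention(toy_vecs)
```

Because the attention weights depend on the vectors extracted from each image, two categories that co-occur with similar evidence in one image are linked strongly there, without forcing that link on images where it does not hold.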
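The reported results are mean average precision (mAP): average precision (AP) is computed for each class over the test images ranked by predicted score, then averaged over classes. Below is a minimal sketch of the non-interpolated AP variant commonly used in multi-label classification work. The abstract does not state which AP variant was used (the VOC 2007 devkit, for instance, defines an 11-point interpolated AP), so this is an illustration rather than the authors' evaluation script.

```python
from typing import List


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Non-interpolated AP for one class: mean precision at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix: List[List[float]],
                           label_matrix: List[List[int]]) -> float:
    """mAP over classes; rows are images, columns are classes."""
    aps = []
    for c in range(len(score_matrix[0])):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):  # skip classes with no positive images
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps)


# Toy example: 3 images, 2 classes.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6]]
toy_labels = [[1, 0], [0, 1], [1, 1]]
m_ap = mean_average_precision(toy_scores, toy_labels)  # (5/6 + 1) / 2 = 11/12
```

Class 0 ranks one negative above its second positive (AP = 5/6), while class 1 ranks both positives first (AP = 1), giving an mAP of 11/12 for this toy data.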