Journal of Shanghai Jiao Tong University

  • About the author: Liu Ziang (刘子昂, b. 2001), master's student; research interests: 3D model retrieval and composed retrieval

Multimodal Feature Fusion for Ship CAD Model Retrieval Technology

  1. College of Mechanical Engineering, Donghua University, Shanghai 201620, China

Abstract:

This paper proposes a multimodal feature fusion method for ship CAD (Computer-Aided Design) model retrieval. It addresses the limited expressive power of any single modality in conventional ship CAD model retrieval: 3D geometric features struggle to capture semantic information, text descriptions cannot express precise geometric structure, and image features are strongly affected by changes in viewpoint and illumination. Following the Context-I2W network, the method maps a reference image to pseudo-word tokens and fuses the BOM (Bill of Materials) information and mesh geometric features of the CAD model. A multimodal feature fusion framework based on a WR (Weighted Residual) matrix aligns image, text, and 3D geometric features in the semantic space. A composed query mechanism then performs retrieval by computing the similarity between the composed embedding and the features of candidate models. Experiments on a dataset of 204 multimodal ship samples show that the method achieves an average mAP (Mean Average Precision) of 83.5% across three typical component retrieval tasks (hull structure, outfitting parts, and piping arrangement), an improvement of 16.7% over existing zero-shot methods, with an area under the ROC (Receiver Operating Characteristic) curve of 0.818. The method thus achieves strong retrieval performance without requiring labeled data.

Key words: Multimodal Retrieval, Contextual Feature Mapping, Cross-Modal Alignment, Zero-Shot Learning, CLIP Model

CLC number:
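
The retrieval pipeline summarized in the abstract (fuse the modalities into a composed embedding, rank candidate models by similarity, score the ranking with average precision) can be illustrated with a minimal NumPy sketch. The abstract does not specify the form of the WR matrix, so the scalar-weighted residual sum used here is only an assumption, as are the function names `compose_query`, `retrieve`, and `average_precision` and the weights `w_img` and `w_geo`. None of this should be read as the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def compose_query(img_emb, text_emb, geo_emb, w_img=0.5, w_geo=0.5):
    """Hypothetical weighted-residual fusion (an assumption, not the paper's
    WR matrix): the text embedding is the base, and the image and geometry
    embeddings are added to it as weighted residuals."""
    fused = text_emb + w_img * img_emb + w_geo * geo_emb
    return l2_normalize(fused)


def retrieve(query, candidates):
    """Return candidate indices sorted by descending cosine similarity,
    together with the raw similarity scores."""
    scores = l2_normalize(candidates) @ l2_normalize(query)
    return np.argsort(-scores), scores


def average_precision(ranked_relevance):
    """Average precision for one query, given 0/1 relevance flags listed
    in ranked order. mAP is the mean of this value over all queries."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    # Random stand-ins for encoder outputs; real embeddings would come from
    # image, text (BOM), and mesh encoders that share one embedding space.
    img, txt, geo = rng.normal(size=(3, dim))
    gallery = rng.normal(size=(10, dim))
    order, scores = retrieve(compose_query(img, txt, geo), gallery)
    print("ranking:", order)
```

Averaging `average_precision` over every query in an evaluation set gives the mAP figure of the kind reported in the abstract. In practice the scalar weights would be replaced by whatever learned weighting the WR matrix defines.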