Multimodal Feature Fusion for Ship CAD Model Retrieval Technology

Liu Ziang (刘子昂), Lü Chaofan (吕超凡), Zhang Dan (张丹), Bao Jinsong (鲍劲松)

doi:10.16183/j.cnki.jsjtu.2025.146

Journal of Shanghai Jiao Tong University




College of Mechanical Engineering, Donghua University, Shanghai 201620, China

Liu Ziang (born 2001), master's student; research interests: 3D model retrieval and composed retrieval

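Bao Jinsong, professor and doctoral supervisor; research interests: intelligent manufacturing systems, human-machine collaboration and robotics, virtual reality, etc.; E-mail: bao@dhu.edu.cn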

Published online: 2025-10-07


Multimodal Feature Fusion for Ship CAD Model Retrieval Technology


College of Mechanical Engineering, Donghua University, Shanghai 201620, China

Online published: 2025-10-07


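Abstract

This paper proposes a multimodal feature fusion method for retrieving ship CAD (Computer-Aided Design) models. It targets the limited expressive power of a single modality in traditional ship CAD model retrieval: 3D geometric features struggle to capture semantic information, text descriptions cannot express precise geometric structure, and image features are strongly affected by changes in viewing angle and illumination. Following the Context-I2W network, the method maps a reference image to pseudo-word tokens and fuses the BOM (Bill of Materials) information and mesh geometric features of the CAD model. It designs a multimodal feature fusion framework based on a WR (Weighted Residual) matrix that aligns image, text, and 3D geometric features in the semantic space. It also builds a composed query mechanism that performs retrieval by computing the similarity between the composed embedding and the features of candidate models. Experiments on a dataset of 204 multimodal ship samples show that the method reaches an average mAP (Mean Average Precision) of 83.5% on retrieval tasks for three typical component types (hull structure, outfitting parts, and piping arrangement), an improvement of 16.7% over existing zero-shot methods, with an area under the ROC (Receiver Operating Characteristic) curve of 0.818, achieving strong retrieval performance without labeled data.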
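The composed query step can be illustrated with a minimal sketch. The abstract does not define the WR matrix, so the fusion below is a plain weighted sum of per-modality embeddings, swapped in only for illustration. The function names, the scalar weights, and the toy vectors are all assumptions and not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so that dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def fuse_query(image_emb, text_emb, geom_emb, weights):
    """Hypothetical fusion: image embedding plus weighted text and geometry terms.

    This is a simple weighted sum, not the paper's WR-matrix formulation.
    """
    w_text, w_geom = weights
    fused = image_emb + w_text * text_emb + w_geom * geom_emb
    return l2_normalize(fused)

def retrieve(query_emb, candidate_embs, top_k=3):
    """Rank candidates by cosine similarity to the composed query embedding."""
    scores = l2_normalize(candidate_embs) @ l2_normalize(query_emb)
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Toy 3-dimensional embeddings standing in for real encoder outputs.
image_emb = np.array([1.0, 0.0, 0.0])
text_emb = np.array([0.0, 1.0, 0.0])
geom_emb = np.array([0.0, 0.0, 0.0])
candidates = np.array([[0.0, 1.0, 0.0],
                       [1.0, 0.4, 0.0],
                       [0.0, 0.0, 1.0]])

query = fuse_query(image_emb, text_emb, geom_emb, weights=(0.5, 0.5))
ranking, ranking_scores = retrieve(query, candidates)
```

In this toy setup the second candidate ranks first because it shares both the image direction and part of the text direction with the composed query.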

Keywords: multimodal retrieval; contextual feature mapping; cross-modal alignment; zero-shot learning; CLIP model

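Cite this article

Liu Ziang, Lü Chaofan, Zhang Dan, Bao Jinsong. Multimodal Feature Fusion for Ship CAD Model Retrieval Technology[J]. Journal of Shanghai Jiao Tong University, 0: 1. DOI: 10.16183/j.cnki.jsjtu.2025.146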

Abstract

This paper proposes a multimodal feature fusion method for ship CAD (Computer-Aided Design) model retrieval. It addresses the limited expressive power of any single modality in traditional ship CAD model retrieval: 3D geometric features struggle to capture semantic information, text descriptions cannot express precise geometric structure, and image features are significantly affected by changes in viewing angle and illumination. Following the Context-I2W network, the method maps reference images to pseudo-word tokens and fuses the BOM (Bill of Materials) information and mesh geometric features of CAD models. A multimodal feature fusion framework based on the WR (Weighted Residual) matrix aligns image, text, and 3D geometric features in the semantic space, and a composed query mechanism performs retrieval by computing the similarity between the composed embedding and candidate model features. Experiments on a dataset of 204 multimodal ship samples show that the method achieves an average mAP (Mean Average Precision) of 83.5% on retrieval tasks for three typical component types (hull structure, outfitting parts, and piping arrangement), a 16.7% improvement over existing zero-shot methods, and an area under the ROC (Receiver Operating Characteristic) curve of 0.818, delivering strong retrieval performance without labeled data.
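For readers unfamiliar with the reported metric, the following sketch shows one common way to compute mAP from ranked retrieval results. It assumes binary relevance labels and that each ranked list contains every relevant item for its query. The abstract does not state the exact evaluation protocol, so the function names and the toy data are illustrative only.

```python
def average_precision(ranked_relevance):
    """AP for one query: mean of precision@k over the ranks k that hold a relevant item."""
    hits = 0
    precisions = []
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / k)
    # A query with no relevant item in its list contributes 0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(all_ranked_relevance):
    """mAP: arithmetic mean of per-query AP values."""
    aps = [average_precision(r) for r in all_ranked_relevance]
    return sum(aps) / len(aps) if aps else 0.0

# Two toy queries: 1 marks a relevant model at that rank, 0 an irrelevant one.
toy_results = [[1, 0, 1, 0], [0, 1]]
toy_map = mean_average_precision(toy_results)
```

For the first toy query the relevant items sit at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2; the second query has its only relevant item at rank 2, giving AP = 1/2.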

Keywords: Multimodal Retrieval; Contextual Feature Mapping; Cross-Modal Alignment; Zero-Shot Learning; CLIP Model
