J Shanghai Jiaotong Univ Sci ›› 2025, Vol. 30 ›› Issue (2): 375-384. doi: 10.1007/s12204-023-2623-x

• Automation & Computer Science •

Cooperative Iteration Matching Method for Aligning Samples from Heterogeneous Industrial Datasets

面向工业异构数据匹配的联合迭代匹配方法

李晗1,史国宏2,刘钊3,朱平1   

  1. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 2. Pan Asia Technical Automotive Center Co., Ltd., Shanghai 201201, China; 3. School of Design, Shanghai Jiao Tong University, Shanghai 200240, China
  1. 上海交通大学 机械与动力工程学院,上海 200240;2. 泛亚汽车技术中心有限公司,上海 201201;3. 上海交通大学 设计学院,上海 200240
  • Accepted: 2022-06-08 Online: 2025-03-21 Published: 2025-03-21

Abstract: Industrial data mining usually deals with data from different sources. These heterogeneous datasets describe the same object from different views. However, samples may be lost from some of the datasets, so that the remaining samples no longer correspond one-to-one. Such mismatches, caused by missing samples, make the industrial data unusable for subsequent machine learning. To align the mismatched samples, this article presents a cooperative iteration matching method (CIMM) based on a modified dynamic time warping (DTW). The proposed method treats the sequentially accumulated industrial data as time series and aligns the mismatched samples by DTW. In addition, dynamic constraints are applied to the warping distance of the DTW process to make the alignment more efficient. A series of models is then trained iteratively on the accumulated samples. Several groups of numerical experiments with different missing patterns and missing locations are designed and analyzed to demonstrate the effectiveness and applicability of the proposed method.
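As a rough illustration of the alignment idea described in the abstract, the sketch below implements textbook DTW with a fixed warping band (a Sakoe-Chiba window). It is not the article's modified DTW or its dynamic constraints, which the abstract does not specify. The function name `dtw_align`, the `window` parameter, and the absolute-difference local cost are assumptions made only for this example.

```python
import math


def dtw_align(a, b, window=None):
    """Align two numeric sequences with classic DTW (illustrative only).

    window is a hypothetical fixed band: cells with |i - j| > window are
    skipped. None means unconstrained. Returns (total_cost, path), where
    path is a list of (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide so the end cell is reachable.
    window = max(window, abs(n - m))

    inf = math.inf
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Fill the accumulated-cost table inside the band only.
    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance

    # Backtrack from the end cell to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path


# Toy case: the second sequence has lost the sample with value 3.0.
full = [1.0, 2.0, 3.0, 4.0, 5.0]
lossy = [1.0, 2.0, 4.0, 5.0]
total, path = dtw_align(full, lossy, window=1)
print(total, path)
```

In this toy case the unmatched sample in `full` is absorbed by a neighbouring sample of `lossy`, while the remaining samples pair up exactly. Narrowing the band reduces the number of cells evaluated, which is the general motivation for constraining the warping distance; how the article adapts the constraint dynamically is not shown here.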

Key words: dynamic time warping, mismatched samples, sample alignment, industrial data, data missing

摘要: 工业数据挖掘通常处理不同来源的数据,这些异构数据集以不同的角度描述同一个物理对象。然而,其中一些数据集的个别样本可能缺失,进而使得其余样本不能逐一正确匹配。样本缺失导致的数据集匹配错误使得这些工业数据无法用于进一步的机器学习。因此,为了对齐错误匹配的数据,本文提出一种基于改进动态时间规整的联合迭代匹配方法。本文将按顺序积累的工业数据视作时间序列,进而基于动态时间规整对错配样本进行匹配对齐。在动态时间规整过程中,对规整距离施加动态约束以提高对齐效率,随后用迭代过程中累积的样本迭代训练模型。最后通过设计和分析几组不同缺失模式和缺失位置的数值实验,证明了所提方法的有效性和适用性。

关键词: 动态时间规整,错配样本,样本对齐,工业数据,数据缺失
