Automation & Computer Science

Cooperative Iteration Matching Method for Aligning Samples from Heterogeneous Industrial Datasets

  • 1. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 2. Pan Asia Technical Automotive Center Co., Ltd., Shanghai 201201, China; 3. School of Design, Shanghai Jiao Tong University, Shanghai 200240, China

Accepted date: 2022-06-08

  Online published: 2025-03-21

Abstract

Industrial data mining usually draws on data from different sources. These heterogeneous datasets describe the same object from different views. However, samples may be lost from some of the datasets, so the remaining samples no longer correspond one-to-one. The mismatch caused by missing samples makes the industrial data unusable for further machine learning. To align the mismatched samples, this article presents a cooperative iteration matching method (CIMM) based on a modified dynamic time warping (DTW). The proposed method treats the sequentially accumulated industrial data as time series and aligns the mismatched samples with DTW. In addition, dynamic constraints are applied to the warping distance of the DTW process to make the alignment more efficient. A series of models is then trained iteratively with the accumulated samples. Several groups of numerical experiments on different missing patterns and missing locations are designed and analyzed to demonstrate the effectiveness and applicability of the proposed method.
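To make the alignment idea concrete, the sketch below shows textbook DTW between a complete sequence and one with a lost sample. It is not the modified DTW of the article: the fixed band (a Sakoe-Chiba-style `window`) is only a stand-in for the dynamic constraints mentioned in the abstract, and the function name `dtw_align`, its parameters, and the toy data are assumptions made for illustration.

```python
import math


def dtw_align(a, b, window=None):
    """Textbook DTW between two 1-D sequences with an optional fixed band.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching a[i] with b[j]. The fixed band is an illustrative stand-in,
    not the dynamic constraint proposed in the article.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = math.inf
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # match both samples
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
            )
    # Backtrack from the end cell to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


# Toy data: the second view has lost the sample with value 3.0.
full = [1.0, 2.0, 3.0, 4.0, 5.0]
partial = [1.0, 2.0, 4.0, 5.0]
total_cost, path = dtw_align(full, partial, window=2)
print(total_cost, path)
```

In the recovered path, the index of `full` whose partner in `partial` is reused marks where a sample is likely missing. A narrower band reduces the number of cells evaluated, which is the general motivation for constraining the warping distance.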

Cite this article

Li Han, Shi Guohong, Liu Zhao, Zhu Ping. Cooperative Iteration Matching Method for Aligning Samples from Heterogeneous Industrial Datasets[J]. Journal of Shanghai Jiaotong University (Science), 2025, 30(2): 375-384. DOI: 10.1007/s12204-023-2623-x

