Data Splitting Method of Distance Metric Learning Based on Gaussian Mixed Model
Received date: 2020-03-24
Online published: 2021-03-03
To address the instability and bias that arise when models are repeatedly trained on limited samples, this paper proposes a data splitting method for distance metric learning based on the Gaussian mixture model, which mitigates this problem through a more reasonable partition of the dataset. Distance metric learning relies on the strong feature extraction capability of deep neural networks to embed the original data into a new metric space. Then, based on the deep features, a Gaussian mixture model is used to cluster the samples and estimate their distribution in this metric space. Finally, according to the characteristics of the estimated distribution, stratified sampling is applied to partition the data reasonably. Experiments show that the proposed method better captures the characteristics of the data distribution and yields a more reasonable data split, thereby improving the accuracy and generalization of the model.
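The following is a minimal sketch of the splitting pipeline the abstract describes, assuming the deep embeddings have already been produced by a metric-learning network; the `embeddings` and `labels` arrays, the number of mixture components, and the split ratio are illustrative placeholders, not the authors' actual settings.

```python
# Sketch: cluster deep embeddings with a GMM, then split by stratified sampling.
# Assumes `embeddings` come from a trained metric-learning network; here they are
# simulated. n_components is a placeholder (the paper selects it by model selection).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))   # stand-in for learned deep features
labels = rng.integers(0, 10, size=1000)    # stand-in for task labels

# 1. Estimate the sample distribution in the embedding space with a GMM.
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
clusters = gmm.fit_predict(embeddings)

# 2. Stratify the train/test split by GMM cluster so both subsets follow
#    the estimated distribution of the embedding space.
idx = np.arange(len(embeddings))
train_idx, test_idx = train_test_split(
    idx, test_size=0.2, stratify=clusters, random_state=0
)

X_train, X_test = embeddings[train_idx], embeddings[test_idx]
y_train, y_test = labels[train_idx], labels[test_idx]
print(f"train: {len(train_idx)} samples, test: {len(test_idx)} samples")
```

Stratifying on the GMM cluster assignments (rather than only on class labels) is what keeps the training and test subsets consistent with the estimated distribution of the embedding space.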
ZHENG Dezhong, YANG Yuanyuan, XIE Zhe, NI Yangfan, LI Wentao. Data Splitting Method of Distance Metric Learning Based on Gaussian Mixed Model[J]. Journal of Shanghai Jiaotong University, 2021, 55(2): 131-140. DOI: 10.16183/j.cnki.jsjtu.2020.082