Journal of Shanghai Jiao Tong University ›› 2021, Vol. 55 ›› Issue (2): 131-140. doi: 10.16183/j.cnki.jsjtu.2020.082
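Special topics: Journal of Shanghai Jiao Tong University, 2021 Issue 12 topic compilation; Journal of Shanghai Jiao Tong University, 2021 "Automation Technology, Computer Technology" topic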
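ZHENG Dezhong1,2, YANG Yuanyuan1, XIE Zhe1,2, NI Yangfan1,2, LI Wentao3
Received: 2020-03-24
Online: 2021-02-01
Published: 2021-03-03
Corresponding author: LI Wentao
E-mail: liwentao98@126.com
About the author: ZHENG Dezhong (1990-), male, from Wuhan, Hubei Province, Ph.D. candidate. His main research interests are applications of machine learning and deep learning in medical imaging.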
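Abstract:
With limited samples, models trained repeatedly on the same data tend to be unstable and biased. To address this, a data splitting method of distance metric learning based on the Gaussian mixture model is proposed, which tackles the problem by dividing the dataset more reasonably. Distance metric learning relies on the strong feature-extraction ability of deep neural networks to embed the features extracted from the raw data into a new metric space. A Gaussian mixture model is then applied to these deep features in the new metric space for cluster analysis and estimation of the sample distribution. Finally, stratified sampling based on the characteristics of the sample distribution is used to split the data. The study shows that the method captures the characteristics of the data distribution better and yields a more reasonable data split, which in turn improves model accuracy and generalization.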
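The last two stages described in the abstract (Gaussian mixture clustering in the embedded space, then stratified sampling) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the metric-learning network has already produced an embedding array, uses synthetic 2-D blobs as a stand-in for those embeddings, and approximates the stratified sampling step by a proportional split over hard cluster assignments. The function name `gmm_stratified_split` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split


def gmm_stratified_split(embeddings, n_components=3, test_size=0.25, seed=0):
    """Split sample indices so that each GMM cluster is represented proportionally.

    `embeddings` is assumed to be an (n_samples, n_features) array already
    produced by a metric-learning network; training that network is outside
    the scope of this sketch.
    """
    # Fit a Gaussian mixture in the embedded space and assign each sample
    # to its most likely component.
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    cluster_ids = gmm.fit_predict(embeddings)

    # Stratify on the cluster assignment so train and test follow the same
    # estimated sample distribution.
    indices = np.arange(len(embeddings))
    train_idx, test_idx = train_test_split(
        indices, test_size=test_size, stratify=cluster_ids, random_state=seed
    )
    return train_idx, test_idx, cluster_ids


# Synthetic stand-in for learned embeddings: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
embeddings = np.vstack([c + rng.normal(size=(40, 2)) for c in centers])

train_idx, test_idx, cluster_ids = gmm_stratified_split(embeddings)
```

In practice the number of mixture components would have to be chosen for the data at hand, for example with an information criterion; the value 3 here only matches the synthetic data.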
Cite this article: ZHENG Dezhong, YANG Yuanyuan, XIE Zhe, NI Yangfan, LI Wentao. Data Splitting Method of Distance Metric Learning Based on Gaussian Mixed Model[J]. Journal of Shanghai Jiao Tong University, 2021, 55(2): 131-140.