Journal of Shanghai Jiao Tong University ›› 2021, Vol. 55 ›› Issue (2): 131-140. doi: 10.16183/j.cnki.jsjtu.2020.082

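Special topics: Journal of Shanghai Jiao Tong University, Collected Special Topics of the 12 Issues of 2021; Journal of Shanghai Jiao Tong University, 2021 Special Topic on "Automation Technology, Computer Technology"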

Data Splitting Method of Distance Metric Learning Based on Gaussian Mixed Model

ZHENG Dezhong1,2, YANG Yuanyuan1, XIE Zhe1,2, NI Yangfan1,2, LI Wentao3

  1. Laboratory for Medical Imaging Informatics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200080, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Fudan University Shanghai Cancer Center, Shanghai 200032, China
  • Received: 2020-03-24 Online: 2021-02-01 Published: 2021-03-03
  • Contact: LI Wentao E-mail: liwentao98@126.com
  • About the author: ZHENG Dezhong (1990-), male, from Wuhan, Hubei Province, Ph.D. candidate; main research interest: applications of machine learning and deep learning in medical imaging.
  • Funding:
    Solution for a New Service Model of Cross-Domain Collaborative Medical Imaging (2017YFC0112900); Development of an Evaluation Database and Service Platform for Artificial Intelligence Medical Software (2019YFC0118803)

Abstract:

To address the instability and bias that tend to arise when a model is trained multiple times on a limited number of samples, this paper proposes a data splitting method of distance metric learning based on the Gaussian mixture model, which addresses the problem by dividing the dataset more reasonably. Distance metric learning relies on the strong feature extraction capability of deep neural networks to embed the features extracted from the original data into a new metric space. Then, in this new metric space, the Gaussian mixture model is applied to the deep features to perform cluster analysis and estimate the sample distribution. Finally, stratified sampling is carried out according to the characteristics of the sample distribution to divide the data reasonably. The results show that the proposed method captures the characteristics of the data distribution better and yields a more reasonable data division, thereby improving the accuracy and generalization of the model.

Key words: artificial intelligence training, dataset division, deep neural networks, Gaussian mixture model
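
The final step described in the abstract, stratified sampling according to the estimated sample distribution, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes each sample has already been assigned a cluster label (for instance, the most likely component of a Gaussian mixture model fitted on the learned embeddings, which is not shown here). The function name `stratified_split`, the `test_fraction` parameter, and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(cluster_labels, test_fraction=0.2, seed=0):
    """Split sample indices so each cluster keeps roughly the same train/test ratio.

    cluster_labels[i] is the cluster assigned to sample i. Here it is assumed
    to come from a mixture model fitted on embeddings (an assumption, not
    something this sketch computes).
    """
    rng = random.Random(seed)

    # Group sample indices by their cluster label.
    by_cluster = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        by_cluster[label].append(index)

    train, test = [], []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        rng.shuffle(members)
        # Take the rounded share for testing, but always leave at least
        # one sample of every cluster in the training set.
        n_test = min(round(len(members) * test_fraction), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Toy labels: three clusters of sizes 10, 5, and 5 (illustrative only).
labels = [0] * 10 + [1] * 5 + [2] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```

Because the split is drawn cluster by cluster, every region of the embedding space that the mixture model identifies is represented in both subsets in proportion to its size, which is the property the abstract attributes to the proposed data division.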

CLC Number: