Journal of Shanghai Jiao Tong University, 2021, Vol. 55, Issue (5): 557-565. doi: 10.16183/j.cnki.jsjtu.2019.264

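Special topics: Journal of Shanghai Jiao Tong University, Compilation of Special Topics from the 12 Issues of 2021; Journal of Shanghai Jiao Tong University, 2021 Special Topic "Automation Technology, Computer Technology"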


Imbalanced Learning Based on a Latent-Variable Posterior Generative Adversarial Network

HE Xinlin1, QI Zongfeng2, LI Jianxun1

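  1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  2. State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471003, Henan, China
  • Received: 2019-09-16  Online: 2021-05-28  Published: 2021-06-01
  • Corresponding author: LI Jianxun  E-mail: lijx@sjtu.edu.cn
  • About the author: HE Xinlin (1992-), male, born in Changde, Hunan Province, master's student; main research interest: data mining.
  • Funding:
    National Key Research and Development Program of China (2020YFC1512203); Fund of the State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System (CEMEE) (2019K0302A); Civil Aircraft Special Project (MJ-2017-S-38); National Natural Science Foundation of China (61673265)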

Unbalanced Learning of Generative Adversarial Network Based on Latent Posterior

HE Xinlin1, QI Zongfeng2, LI Jianxun1

  1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  2. State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471003, Henan, China
  • Received: 2019-09-16  Online: 2021-05-28  Published: 2021-06-01
  • Contact: LI Jianxun  E-mail: lijx@sjtu.edu.cn

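Abstract (translated from the Chinese):

To address the inability of existing oversampling methods for imbalanced classification to make full use of the probability density distribution of the data, an oversampling algorithm based on a latent-variable posterior generative adversarial network (LGOS) is proposed. The method uses a variational auto-encoder to obtain the approximate posterior distribution of the latent variable, so that the generator can effectively estimate the true probability distribution of the data. Sampling in the latent space overcomes the randomness of the sampling process in generative adversarial networks, and a marginal distribution adaptation loss and a conditional distribution adaptation loss are introduced to improve the quality of the generated data. In addition, the generated samples are treated as source-domain samples within a transfer learning framework, and an improved instance-based transfer learning classification algorithm (TrWSBoost) is proposed. It introduces a weight scaling factor that effectively addresses the problem of source-domain sample weights converging too quickly and being learned insufficiently. Experimental results show that the proposed method clearly outperforms existing methods on all of the classification metrics considered.

Keywords: imbalanced classification, generative adversarial network, latent variable, transfer learning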

Abstract:

To address the problem that existing oversampling methods for unbalanced classification cannot fully utilize the probability density distribution of the data, a method named latent posterior based generative adversarial network for oversampling (LGOS) is proposed. The method uses a variational auto-encoder to obtain the approximate posterior distribution of the latent variable, so that the generation network can effectively estimate the true probability distribution of the data. Sampling in the latent space overcomes the randomness of sampling in a generative adversarial network. A marginal distribution adaptive loss and a conditional distribution adaptive loss are introduced to improve the quality of the generated data. In addition, the generated samples are placed in a transfer learning framework as source domain samples, and a classification algorithm, transfer learning for boosting with weight scaling (TrWSBoost), is proposed. It introduces a weight scaling factor, which effectively solves the problem that the weights of source domain samples converge too fast and are therefore learned insufficiently. The experimental results show that the proposed method outperforms existing oversampling methods on common metrics.

Key words: unbalanced classification, generative adversarial network, latent variable, transfer learning
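
The abstract's idea of drawing latent codes from an approximate posterior rather than from an arbitrary noise prior can be illustrated with a minimal sketch. This is not the authors' LGOS implementation: it assumes a diagonal Gaussian posterior (the usual variational auto-encoder choice), and the names `sample_latent`, `oversample_minority`, and `toy_generator`, as well as the values of `mu` and `log_var`, are hypothetical stand-ins for a trained encoder and generator.

```python
import numpy as np


def sample_latent(mu, log_var, n_samples, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) using z = mu + sigma * eps."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.exp(0.5 * np.asarray(log_var, dtype=float))
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + sigma * eps


def oversample_minority(mu, log_var, n_needed, generator, rng):
    """Sample latent codes near the posterior and decode them into synthetic samples."""
    z = sample_latent(mu, log_var, n_needed, rng)
    return generator(z)


rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained generator: a fixed linear map
# from a 2-dimensional latent space to a 4-dimensional feature space.
W = rng.standard_normal((2, 4))


def toy_generator(z):
    return z @ W


# Assumed posterior parameters for the minority class (illustrative values only).
mu = np.array([0.5, -1.0])
log_var = np.array([-2.0, -1.0])

# Generate 90 synthetic minority samples, e.g. to balance a 100-vs-10 dataset.
synthetic = oversample_minority(mu, log_var, 90, toy_generator, rng)
print(synthetic.shape)  # (90, 4)
```

In a real pipeline the posterior parameters would come from an encoder network evaluated on minority-class samples, and the synthetic samples would then be passed to the classifier as additional training data.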

CLC number: