Journal of Shanghai Jiao Tong University, 2021, Vol. 55, Issue (5): 557-565. doi: 10.16183/j.cnki.jsjtu.2019.264
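Special topics: Journal of Shanghai Jiao Tong University 2021 Issue 12 special-topic compilation; Journal of Shanghai Jiao Tong University 2021 special topic on "Automation Technology and Computer Technology"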
Received: 2019-09-16
Online: 2021-05-28
Published: 2021-06-01
Corresponding author: LI Jianxun, E-mail: lijx@sjtu.edu.cn
About the author: HE Xinlin (1992-), male, from Changde, Hunan Province, master's student; main research interest: data mining.
HE Xinlin¹, QI Zongfeng², LI Jianxun¹
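Abstract:
Existing oversampling methods for imbalanced classification cannot fully exploit the probability density distribution of the data. To address this, an oversampling algorithm based on a latent-posterior generative adversarial network (LGOS) is proposed. The method uses a variational autoencoder to obtain an approximate posterior distribution of the latent variables, which lets the generator effectively estimate the true probability distribution of the data. Sampling in the latent space overcomes the randomness of the generative adversarial network's sampling process, and a marginal distribution adaptation loss and a conditional distribution adaptation loss are introduced to improve the quality of the generated data. In addition, the generated samples are treated as source-domain samples in a transfer learning framework, and an improved instance-based transfer learning classification algorithm (TrWSBoost) is proposed. TrWSBoost introduces a weight scaling factor, which addresses the problem that source-domain sample weights converge too quickly and are therefore learned insufficiently. Experimental results show that the proposed method clearly outperforms existing methods on the classification metrics.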
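The abstract names three ingredients of LGOS: sampling from a VAE's approximate latent posterior, a marginal distribution adaptation loss, and a conditional distribution adaptation loss. The exact formulas are not in this excerpt, so the NumPy sketch below rests on assumptions throughout: it uses the standard VAE reparameterization trick and Gaussian KL term, and stands in a Gaussian-kernel maximum mean discrepancy (MMD) for the marginal loss and a sum of per-class MMDs for the conditional loss, in the style of joint distribution adaptation. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; the MMD-based losses are an assumed stand-in,
# not the formulas used in the paper.

def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def gaussian_kl(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) per sample, summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a_i - b_j||^2)."""
    sq = (np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * sq)

def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD: a marginal-distribution mismatch."""
    return (rbf_kernel(x, x, gamma).mean() + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def conditional_mmd2(x, y_x, g, y_g, gamma=1.0):
    """Sum over classes of MMD^2 between real (x) and generated (g) samples."""
    total = 0.0
    for c in np.unique(y_x):
        xc, gc = x[y_x == c], g[y_g == c]
        if len(xc) and len(gc):
            total += mmd2(xc, gc, gamma)
    return total

rng = np.random.default_rng(0)
mu, log_var = np.zeros((4, 2)), np.zeros((4, 2))
z = reparameterize(mu, log_var, rng)     # latent codes drawn from q(z|x)
print(z.shape, gaussian_kl(mu, log_var))
```

`mmd2(x, x)` is exactly zero and grows as generated samples drift away from the real ones, so adding both terms to a generator loss would push it to match the overall distribution and each class's distribution at the same time.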
HE Xinlin, QI Zongfeng, LI Jianxun. Unbalanced Learning of Generative Adversarial Network Based on Latent Posterior[J]. Journal of Shanghai Jiao Tong University, 2021, 55(5): 557-565.
Table 2
Metrics of the decision-tree classifier with data oversampling

| Metric | Dataset | Original data | ROS | SMOTE | Border | MWMOTE | ADASYN | LGOS |
|---|---|---|---|---|---|---|---|---|
| Recall | phoneme | 0.7566 | 0.7396 | 0.7953 | 0.7976 | 0.8023 | 0.8046 | 0.8433 |
| | satimage | 0.9146 | 0.9365 | 0.9414 | 0.9268 | 0.9512 | 0.9524 | 0.9634 |
| | pen | 0.9583 | 0.9819 | 0.9814 | 0.9814 | 0.9856 | 0.9625 | 0.9861 |
| | wine | 0.6000 | 0.5627 | 0.6511 | 0.6188 | 0.6533 | 0.6583 | 0.6944 |
| | letter | 0.9016 | 0.8759 | 0.8983 | 0.8769 | 0.9037 | 0.8586 | 0.9118 |
| | avila | 0.9357 | 0.9394 | 0.9564 | 0.9784 | 0.9422 | 0.9697 | 0.9816 |
| F-measure | phoneme | 0.7479 | 0.7394 | 0.7528 | 0.7602 | 0.7627 | 0.7564 | 0.7586 |
| | satimage | 0.9146 | 0.9411 | 0.9374 | 0.9319 | 0.9414 | 0.9398 | 0.9461 |
| | pen | 0.9430 | 0.9718 | 0.9586 | 0.9676 | 0.9755 | 0.9563 | 0.9681 |
| | wine | 0.6084 | 0.5885 | 0.5759 | 0.5605 | 0.5945 | 0.5722 | 0.5966 |
| | letter | 0.8721 | 0.8898 | 0.8646 | 0.8571 | 0.8772 | 0.8519 | 0.8936 |
| | avila | 0.9400 | 0.9483 | 0.9411 | 0.9576 | 0.9280 | 0.9538 | 0.9511 |
| G-mean | phoneme | 0.8241 | 0.8157 | 0.8356 | 0.8398 | 0.8422 | 0.8394 | 0.8486 |
| | satimage | 0.9521 | 0.9650 | 0.9669 | 0.9596 | 0.9718 | 0.9721 | 0.9778 |
| | pen | 0.9749 | 0.9888 | 0.9871 | 0.9881 | 0.9908 | 0.9783 | 0.9902 |
| | wine | 0.7510 | 0.7286 | 0.7661 | 0.7483 | 0.7720 | 0.7681 | 0.7897 |
| | letter | 0.9432 | 0.9324 | 0.9409 | 0.9301 | 0.9446 | 0.9207 | 0.9500 |
| | avila | 0.9642 | 0.9668 | 0.9735 | 0.9853 | 0.9656 | 0.9810 | 0.9859 |
| AUC | phoneme | 0.8271 | 0.8197 | 0.8366 | 0.8410 | 0.8432 | 0.8402 | 0.8486 |
| | satimage | 0.9529 | 0.9655 | 0.9673 | 0.9602 | 0.9720 | 0.9724 | 0.9779 |
| | pen | 0.9751 | 0.9888 | 0.9871 | 0.9881 | 0.9909 | 0.9785 | 0.9902 |
| | wine | 0.7700 | 0.7533 | 0.7765 | 0.7621 | 0.7829 | 0.7775 | 0.7963 |
| | letter | 0.9442 | 0.9342 | 0.9419 | 0.9317 | 0.9456 | 0.9230 | 0.9508 |
| | avila | 0.9646 | 0.9672 | 0.9737 | 0.9854 | 0.9659 | 0.9811 | 0.9860 |
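The tables report Recall, F-measure, G-mean and AUC with the minority class as the positive class. The excerpt does not state the formulas, so the sketch below assumes the usual definitions: recall = TP/(TP+FN), F-measure as the harmonic mean of precision and recall, and G-mean = sqrt(recall × specificity). It also assumes AUC is computed from hard class labels; the ROC curve then has a single operating point and AUC = (recall + specificity)/2. The reported numbers are consistent with that assumption: in Table 2, phoneme with the original data has recall 0.7566 and G-mean 0.8241, which imply specificity 0.8241²/0.7566 ≈ 0.8976 and an AUC of (0.7566 + 0.8976)/2 ≈ 0.8271, the value in the table. The function names are hypothetical.

```python
import math

def imbalance_metrics(y_true, y_pred, positive=1):
    """Recall, F-measure, G-mean and hard-label AUC, minority class = positive."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": math.sqrt(recall * specificity),
        # With hard labels the ROC curve has one interior point,
        # so the area reduces to the mean of recall and specificity.
        "auc": (recall + specificity) / 2,
    }

def implied_auc(recall, g_mean):
    """Recover specificity from G-mean, then return the hard-label AUC."""
    specificity = g_mean ** 2 / recall
    return (recall + specificity) / 2

m = imbalance_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                      [1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
print(m)
print(round(implied_auc(0.7566, 0.8241), 4))  # 0.8271
```

`implied_auc` offers a quick consistency check on any row of Tables 2 and 3 from its Recall and G-mean entries alone.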
Table 3
Metrics of the transfer-learning classifier with data oversampling

| Metric | Dataset | ROS | SMOTE | Border | MWMOTE | ADASYN | TrAdaboost | LGOS |
|---|---|---|---|---|---|---|---|---|
| Recall | phoneme | 0.8266 | 0.8333 | 0.8466 | 0.8400 | 0.8500 | 0.8433 | 0.8633 |
| | satimage | 0.9512 | 0.9390 | 0.9390 | 0.9512 | 0.9634 | 0.9512 | 0.9756 |
| | pen | 1.0000 | 1.0000 | 0.9907 | 0.9953 | 1.0000 | 0.9953 | 1.0000 |
| | wine | 0.5166 | 0.6166 | 0.6277 | 0.6388 | 0.5944 | 0.6944 | 0.7722 |
| | letter | 0.9152 | 0.9186 | 0.9152 | 0.9220 | 0.9322 | 0.9220 | 0.9491 |
| | avila | 0.9862 | 0.9862 | 0.9862 | 0.9954 | 0.9862 | 0.9954 | 1.0000 |
| F-measure | phoneme | 0.8378 | 0.8361 | 0.8396 | 0.8400 | 0.8388 | 0.8281 | 0.8477 |
| | satimage | 0.9512 | 0.9565 | 0.9506 | 0.9512 | 0.9634 | 0.9512 | 0.9696 |
| | pen | 0.9953 | 0.9976 | 0.9930 | 0.9976 | 0.9953 | 0.9976 | 1.0000 |
| | wine | 0.5942 | 0.6646 | 0.6420 | 0.6301 | 0.6114 | 0.5868 | 0.6698 |
| | letter | 0.9540 | 0.9559 | 0.9523 | 0.9560 | 0.9649 | 0.9560 | 0.9705 |
| | avila | 0.9930 | 0.9907 | 0.9930 | 0.9954 | 0.9907 | 0.9954 | 1.0000 |
| G-mean | phoneme | 0.8832 | 0.8843 | 0.8895 | 0.8879 | 0.8901 | 0.8835 | 0.8976 |
| | satimage | 0.9728 | 0.9678 | 0.9672 | 0.9728 | 0.9797 | 0.9728 | 0.9858 |
| | pen | 0.9994 | 0.9997 | 0.9951 | 0.9976 | 0.9994 | 0.9976 | 1.0000 |
| | wine | 0.7058 | 0.7700 | 0.7711 | 0.7739 | 0.7490 | 0.7870 | 0.8402 |
| | letter | 0.9565 | 0.9583 | 0.9564 | 0.9599 | 0.9655 | 0.9599 | 0.9739 |
| | avila | 0.9930 | 0.9928 | 0.9930 | 0.9974 | 0.9928 | 0.9974 | 1.0000 |
| AUC | phoneme | 0.8851 | 0.8859 | 0.8906 | 0.8892 | 0.8910 | 0.8845 | 0.8983 |
| | satimage | 0.9731 | 0.9682 | 0.9676 | 0.9731 | 0.9798 | 0.9731 | 0.9859 |
| | pen | 0.9994 | 0.9997 | 0.9951 | 0.9976 | 0.9994 | 0.9976 | 1.0000 |
| | wine | 0.7404 | 0.7891 | 0.7875 | 0.7881 | 0.7690 | 0.7932 | 0.8432 |
| | letter | 0.9574 | 0.9591 | 0.9573 | 0.9607 | 0.9661 | 0.9607 | 0.9743 |
| | avila | 0.9931 | 0.9928 | 0.9931 | 0.9974 | 0.9928 | 0.9974 | 1.0000 |
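TrWSBoost is described as an instance-based transfer learner, in the TrAdaBoost family that Table 3 also uses as a baseline, with a weight scaling factor that keeps source-domain weights from converging too quickly. The excerpt does not give its update rule, so the sketch below shows one reweighting round of standard TrAdaBoost: misclassified source samples are down-weighted by β = 1/(1 + √(2 ln n / N)), where n is the number of source samples and N the number of rounds, and misclassified target samples are up-weighted by β_t⁻¹ with β_t = ε_t/(1 − ε_t). The `scale` exponent on the source factor is a hypothetical illustration of where such a factor could act, not the paper's formula. In the LGOS setting the generated samples would be the source set and the real training data the target set.

```python
import math
import numpy as np

def tradaboost_round(w_src, w_tgt, miss_src, miss_tgt, n_rounds, scale=1.0):
    """One TrAdaBoost-style weight update.

    w_src, w_tgt: current weights of source (e.g. generated) and target samples.
    miss_src, miss_tgt: 1 where the current weak learner errs, else 0.
    n_rounds: total number of boosting rounds N.
    scale: HYPOTHETICAL factor in (0, 1]; values below 1 shrink misclassified
           source weights more slowly. scale=1 is plain TrAdaBoost.
    """
    w_src = np.asarray(w_src, dtype=float)
    w_tgt = np.asarray(w_tgt, dtype=float)
    miss_src = np.asarray(miss_src, dtype=float)
    miss_tgt = np.asarray(miss_tgt, dtype=float)

    # Fixed source factor: beta = 1 / (1 + sqrt(2 ln n / N)).
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(w_src)) / n_rounds))
    # Weighted error measured on the target samples only.
    eps = float(np.sum(w_tgt * miss_tgt) / np.sum(w_tgt))
    # TrAdaBoost requires 0 < eps < 0.5; clipping is a choice of this sketch.
    eps = min(max(eps, 1e-10), 0.4999)
    beta_t = eps / (1.0 - eps)

    new_src = w_src * beta ** (scale * miss_src)  # shrink misclassified source
    new_tgt = w_tgt * beta_t ** (-miss_tgt)       # grow misclassified target
    total = new_src.sum() + new_tgt.sum()
    return new_src / total, new_tgt / total, beta_t

w = np.full(4, 0.125)
src, tgt, beta_t = tradaboost_round(w, w, [1, 0, 0, 0], [1, 0, 0, 0], n_rounds=10)
print(src, tgt, beta_t)
```

With `scale=0.5` a misclassified source sample keeps √β of its weight per round instead of β, so the generated samples fade out over more rounds instead of being discarded early.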
[1] FOTOUHI S, ASADI S, KATTAN M W. A comprehensive data level analysis for cancer diagnosis on imbalanced data[J]. Journal of Biomedical Informatics, 2019, 90: 103089.
[2] NAMVAR A, SIAMI M, RABHI F, et al. Credit risk prediction in an imbalanced social lending environment[J]. International Journal of Computational Intelligence Systems, 2018, 11(1): 925-935. doi: 10.2991/ijcis.11.1.70
[3] SOLEYMANI R, GRANGER E, FUMERA G. Progressive boosting for class imbalance and its application to face re-identification[J]. Expert Systems With Applications, 2018, 101: 271-291. doi: 10.1016/j.eswa.2018.01.023
[4] LEE T, LEE K B, KIM C O. Performance of machine learning algorithms for class-imbalanced process fault detection problems[J]. IEEE Transactions on Semiconductor Manufacturing, 2016, 29(4): 436-445. doi: 10.1109/TSM.2016.2602226
[5] CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: Synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357. doi: 10.1613/jair.953
[6] HAN H, WANG W Y, MAO B H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning[C]//International Conference on Intelligent Computing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005: 878-887.
[7] HE H B, BAI Y, GARCIA E A, et al. ADASYN: Adaptive synthetic sampling approach for imbalanced learning[C]//2008 IEEE International Joint Conference on Neural Networks. Piscataway, NJ, USA: IEEE, 2008: 1322-1328.
[8] BARUA S, ISLAM M M, YAO X, et al. MWMOTE: Majority weighted minority oversampling technique for imbalanced data set learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(2): 405-425. doi: 10.1109/TKDE.2012.232
[9] DOUZAS G, BACAO F. Effective data generation for imbalanced learning using conditional generative adversarial networks[J]. Expert Systems With Applications, 2018, 91: 464-471. doi: 10.1016/j.eswa.2017.09.030
[10] HE H B, GARCIA E A. Learning from imbalanced data[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263-1284. doi: 10.1109/TKDE.2008.239
[11] SUN Y M, KAMEL M S, WONG A K C, et al. Cost-sensitive boosting for classification of imbalanced data[J]. Pattern Recognition, 2007, 40(12): 3358-3378. doi: 10.1016/j.patcog.2007.04.009
[12] CHAWLA N V, LAZAREVIC A, HALL L O, et al. SMOTEBoost: Improving prediction of the minority class in boosting[C]//European Conference on Principles of Data Mining and Knowledge Discovery. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003: 107-119.
[13] CHEN S, HE H B, GARCIA E A. RAMOBoost: Ranked minority oversampling in boosting[J]. IEEE Transactions on Neural Networks, 2010, 21(10): 1624-1642. doi: 10.1109/TNN.2010.2066988
[14] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//2017 IEEE International Conference on Computer Vision. Piscataway, NJ, USA: IEEE, 2017: 2242-2251.
[15] ZHANG H, XU T, LI H S, et al. StackGAN: Realistic image synthesis with stacked generative adversarial networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962. doi: 10.1109/TPAMI.34
[16] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. Cambridge, MA, USA: MIT Press, 2014: 2672-2680.
[17] PAN S J, TSANG I W, KWOK J T, et al. Domain adaptation via transfer component analysis[J]. IEEE Transactions on Neural Networks, 2011, 22(2): 199-210. doi: 10.1109/TNN.2010.2091281
[18] LONG M S, WANG J M, DING G G, et al. Transfer feature learning with joint distribution adaptation[C]//2013 IEEE International Conference on Computer Vision. Piscataway, NJ, USA: IEEE, 2013: 2200-2207.
[19] LONG M S, ZHU H, WANG J M, et al. Deep transfer learning with joint adaptation networks[C]//ICML'17: Proceedings of the 34th International Conference on Machine Learning-Volume 70. New York, NY, USA: ACM, 2017: 2208-2217.
[20] DAI W Y, YANG Q, XUE G R, et al. Boosting for transfer learning[C]//Proceedings of the 24th International Conference on Machine Learning-ICML '07. New York: ACM Press, 2007: 193-200.
[21] WANG Shengtao. Research on transfer-sampling based method for class-imbalance learning[D]. Nanjing: Southeast University, 2017. (in Chinese)
[22] YAO Susu, WANG Baoliang, HOU Yonghong. Ensemble transfer learning algorithm for absolute imbalanced data classification[J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(7): 1145-1153. (in Chinese)