A Review of Semi-Supervised Learning Theories and Recent Advances

doi:10.16183/j.cnki.jsjtu.2018.10.017

Abstract

Abstract: Semi-supervised learning, which has emerged from the beginning of this century, is a new type of learning method between traditional supervised learning and unsupervised learning. The main idea of semi-supervised learning is to introduce unlabeled samples into model training process to avoid performance (or model) degeneration due to insufficiency of labeled samples. Semi-supervised learning has been applied sucessfully in many fields. This paper reviews the development process and main theories of semi-supervised learning, as well as its recent advances and importance in solving real-world problems demonstrated by typical application examples.

Key words: machine learning, semi-supervised learning, graph Laplacian

CLC Number:

TP 181

TU Enmei,YANG Jie. A Review of Semi-Supervised Learning Theories and Recent Advances[J]. Journal of Shanghai Jiao Tong University, 2018, 52(10): 1280-1291.

References

［1］ZHU X. Semi-supervised learning literature survey［J］. Computer Science, University of Wisconsin-Madison, 2006, 2(3): 4. ［2］CHAPELLE O, SCHOLKOPF B, ZIEN A. Semi-supervised learning［J］. IEEE Transactions on Neural Networks, 2009, 20(3): 542. ［3］CASTELLI Vittorio, COVER Thomas M. The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter［J］. IEEE Transactions on Information Theory, 1996, 42(6): 2102-2117. ［4］SHAHSHAHANI B M, LANDGREBE D A. The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon［J］. IEEE Transactions on Geoscience and Remote Sensing, 1994, 32(5): 1087-1095. ［5］RATSABY J, VENKATESH S S. Learning from a mixture of labeled and unlabeled examples with parametric side information［C］//Proceedings of the Eighth Annual Conference on Computational Learning Theory. ACM, 1995: 412-417. ［6］NIGAM K, MCCALLUM A, THRUN S, et al. Learning to classify text from labeled and unlabeled documents［C］//The Fifteenth National Conference on Artificial Intelligence, 1998: 792-799. ［7］MCCALLUMZY A K, NIGAMY K. Employing EM and pool-based active learning for text classification［C］//Proc. International Conference on Machine Learning (ICML), 1998: 359-367. ［8］DE SA V R. Learning classification with unlabeled data［C］//Advances in Neural Information Processing Systems, 1994: 112-119. ［9］BENNETT K P, DEMIRIZ A. Semi-supervised support vector machines［C］//Advances in Neural Information Processing Systems, 1999: 368-374. ［10］JOACHIMS T. SVM light: Support vector machine［J］. SVM-Light Support Vector Machine, 1999, 19(4): 1-12. ［11］VAPNIK V. Statistical learning theory［M］. Wiley, New York, 1998. ［12］JOACHIMS T. Transductive inference for text classification using support vector machines［J］. International Conference on Machine Learning, 1999, 200-209. ［13］DE BIE T, CRISTIANINI N. Semi-supervised learning using semi-definite programming［J］. Semi-supervised learning. MIT Press, Cambridge-Massachusetts, 2006. ［14］DE BIE T, CRISTIANINI N. Convex methods for transduction［C］//Advances in Neural Information Processing Systems, 2004: 73-80. ［15］XU L, SCHUURMANS D. Unsupervised and semi-supervised multi-class support vector machines［C］//The Twentieth National Conference on Artificial Intelligence, 2005: 13-23. ［16］XU Z, JIN R, ZHU J, et al. Efficient convex relaxation for transductive support vector machine［C］//Advances in Neural Information Processing Systems, 2008: 1641-1648. ［17］FUNG G, MANGASARIAN O L. Semi-supervised support vector machines for unlabeled data classification［J］. Optimization methods and software, 2001, 15(1): 29-44. ［18］COLLOBERT R, SINZ F, WESTON J, et al. Large scale transductive SVMs［J］. Journal of Machine Learning Research, 2006, 7: 1687-1712. ［19］WANG J, SHEN X, PAN W. On transductive support vector machines［J］. Contemporary Mathematics, 2007, 443: 7-20. ［20］CHAPELLE O, CHI M, ZIEN A. A continuation method for semi-supervised SVMs［C］//Proceedings of the 23rd international conference on Machine learning. ACM, 2006: 185-192. ［21］CHAPELLE O, ZIEN A. Semi-supervised classification by low density separation［C］//Tenth International Workshop on Artificial Intelligence and Statistics, 2005: 57-64. ［22］SINDHWANI V, KEERTHI S S. Large scale semi-supervised linear SVMs［C］//Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2006: 477-484. ［23］BLUM A, MITCHELL T. Combining labeled and unlabeled data with co-training［C］//Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM, 1998: 92-100. ［24］NIGAM K, GHANI R. Analyzing the effectiveness and applicability of co-training［C］//Proceedings of the Ninth International Conference on Information and Knowledge Management. ACM, 2000: 86-93. ［25］BALCAN M, BLUM A, YANG K. Co-training and expansion: Towards bridging theory and practice［C］//Advances in Neural Information Processing Systems, 2005: 89-96. ［26］SARKAR A. Applying co-training methods to statistical parsing［C］//Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, Association for Computational Linguistics, 2001: 1-8. ［27］CLARK S, CURRAN J R, OSBORNE M. Bootstrapping POS taggers using unlabelled data［C］//Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL, Association for Computational Linguistics, 2003: 49-55. ［28］NG V, CARDIE C. Weakly supervised natural language learning without redundant views［C］//Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003. ［29］MIHALCEA R. Co-training and self-training for word sense disambiguation［C］//Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL, 2004. ［30］WAN X. Co-training for cross-lingual sentiment classification［C］//Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Association for Computational Linguistics, 2009: 235-243. ［31］NIGAM K, MCCALLUM A K, THRUN S, et al. Text classification from labeled and unlabeled documents using EM［J］. Machine Learning, 2000, 39(2): 103-134. ［32］COZMAN F G, COHEN I, CIRELO M C. Semi-supervised learning of mixture models［C］//Proceedings of the 20th International Conference on Machine Learning, 2003: 99-106. ［33］LU Z, LEEN T K. Semi-supervised learning with penalized probabilistic clustering［C］//Advances in Neural Information Processing Systems, 2005: 849-856. ［34］FUJINO A, UEDA N, SAITO K. A hybrid generative/discriminative approach to semi-supervised classifier design［C］//Twentieth National Conference on Artificial Intelligence, 2005: 764-769. ［35］ROSENBERG C, HEBERT M, SCHNEIDERMAN H. Semi-supervised self-training of object detection models［C］//Proceedings of the Seventh IEEE Workshops on Application of Computer Vision, 2005. ［36］CULP M, MICHAILIDIS G, An iterative algorithm for extending learners to a semi-supervised setting ［J］. Journal of Computational and Graphical Statistics, 2008, 17(3): 545-571. ［37］LEE D H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks［C］//Workshop on Challenges in Representation Learning, 2013: 2-10. ［38］BLUM A, CHAWLA S. Learning from labeled and unlabeled data using graph mincuts［J］. Proceedings of the Eighteenth International Conference on Machine Learning, 2001. ［39］ZHU X, GHAHRAMANI Z, LAFFERTY J D. Semi-supervised learning using Gaussian fields and harmonic functions［C］//The International Conference on Machine Learning, 2003: 912-919. ［40］ZHOU D, BOUSQUET O, LAL T N, et al. Learning with local and global consistency［C］//Advances in Neural Information Processing Systems, 2004: 321-328. ［41］SINDHWANI V, NIYOGI P, BELKIN M. Beyond the point cloud: From transductive to semi-supervised learning［C］//Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005: 824-831. ［42］BELKIN M, NIYOGI P, SINDHWANI V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples［J］. Journal of Machine Learning Research, 2006, 7: 2399-2434. ［43］BELKIN M, NIYOGI P. Semi-supervised learning on Riemannian manifolds［J］. Machine Learning, 2004, 56(1): 209-239. ［44］ZHU X, GHAHRAMANI Z. Learning from labeled and unlabeled data with label propagation［J］. Tech. Rep., Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002. ［45］ZHANG K, KWOK J T, PARVIN B. Prototype vector machine for large scale semi-supervised learning［C］//The International Conference on Machine Learning. ACM, 2009: 1233-1240. ［46］WANG F, ZHANG C. Label propagation through linear neighborhoods［C］//The International Conference on Machine Learning. ACM, 2006: 985-992. ［47］FERGUS R, WEISS Y, TORRALBA A. Semi-supervised learning in gigantic image collections［C］//Advances in Neural Information Processing Systems, 2009: 522-530. ［48］LIU W, HE J, CHANG S F. Large graph construction for scalable semi-supervised learning［C］//Proceedings of the 27th International Conference on Machine Learning, 2010: 679-686. ［49］JEBARA T, WANG J, CHANG S F. Graph construction and b-matching for semi-supervised learning［C］//Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009: 441-448. ［50］ZHOU D, HOFMANN T, SCHOLKOPF B. Semi-supervised learning on directed graphs［C］//Advances in Neural Information Processing Systems, 2005: 1633-1640. ［51］ARGYRIOU A, HERBSTER M, PONTIL M. Combining graph Laplacians for semi-supervised learning［C］//Advances in Neural Information Processing Systems, 2006: 67-74. ［52］YANG X, FU H, ZHA H, et al. Semi-supervised nonlinear dimensionality reduction［C］//The International Conference on Machine Learning. ACM, 2006: 1065-1072. ［53］KAPOOR A, AHN H, QI Y, et al. Hyperparameter and kernel learning for graph based semi-supervised classification［C］//Advances in Neural Information Processing Systems, 2006: 627-634. ［54］CHUNG F R K, GRAHAM F C. Spectral graph theory［M］. American Mathematical Soc., 1997. ［55］BELKIN M, NIYOGI P. Towards a theoretical foundation for Laplacian-based manifold methods［C］//Annual Conference on Learning Theory, Springer, 2005: 486-500. ［56］LAFON S S. Diffusion maps and geometric harmonics［D］. USA: Yale University, 2004. ［57］NIYOGI P. Manifold regularization and semi-supervised learning: Some theoretical analyses［J］. The Journal of Machine Learning Research, 2013, 14(1): 1229-1250. ［58］KIM K I, STEINKE F, HEIN M. Semi-supervised regression using Hessian energy with an application to semi-supervised dimensionality reduction［C］//Advances in Neural Information Processing Systems, 2009, 22: 979-987. ［59］YU K, ZHANG T, GONG Y. Nonlinear learning using local coordinate coding［C］//Advances in Neural Information Processing Systems, 2009: 2223-2231. ［60］GOLDBERG A B, ZHU X, SINGH A, et al. Multi-manifold semi-supervised learning［C］//Artificial Intelligence and Statistics, 2009: 169-176. ［61］KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks［C］//Advances In Neural Information Processing Systems, 2012: 1097-1105. ［62］COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch［J］. Journal of Machine Learning Research, 2011, 12: 2493-2537. ［63］HINTON G, DENG L, YU D, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups［J］. IEEE Signal Processing Magazine, 2012, 29(6): 82-97. ［64］WESTON J, RATLE F, MOBAHI H, et al. Deep learning via semi-supervised embedding［M］. Neural Networks: Tricks of the Trade. Springer, Berlin, Heidelberg, 2012: 639-655. ［65］KINGMA D P, MOHAMED S, REZENDE D J, et al. Semi-supervised learning with deep generative models［C］//Advances in Neural Information Processing Systems, 2014: 3581-3589. ［66］JOHNSON R, ZHANG T. Semi-supervised convolutional neural networks for text categorization via region embedding［C］//Advances In Neural Information Processing Systems, 2015: 919-927. ［67］JOHNSON R, ZHANG T. Supervised and semi-supervised text categorization using LSTM for region embeddings［C］//International Conference on Machine Learning, 2016: 526-534. ［68］RASMUS A, BERGLUND M, HONKALA M, et al. Semi-supervised learning with ladder networks［C］//Advances in Neural Information Processing Systems, 2015: 3546-3554. ［69］DAI A M, LE Q V. Semi-supervised sequence learning［C］//Advances in Neural Information Processing Systems, 2015: 3079-3087. ［70］HONG S, NOH H, HAN B. Decoupled deep neural network for semi-supervised semantic segmentation［C］//Advances in Neural Information Processing Systems, 2015: 1495-1503. ［71］KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks［C］//International Conference on Learning Representations, 2017. ［72］SAJJADI M, JAVANMARDI M, TASDIZEN T. Regularization with stochastic transformations and perturbations for deep semi-supervised learning［C］//Advances in Neural Information Processing Systems, 2016: 1163-1171. ［73］YANG Z, COHEN W W, SALAKHUTDINOV R. Revisiting semi-supervised learning with graph embeddings［C］//The International Conference on Machine Learning, 2016. ［74］LIANG J, JACOBS P, SUN J, et al. Semi-supervised embedding in attributed networks with outliers［C］//Proceedings of the 2018 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2018: 153-161. ［75］SAJJADI M, JAVANMARDI M, TASDIZEN T. Mutual exclusivity loss for semi-supervised deep learning［C］//2016 IEEE International Conference on Image Processing, 2016: 1908-1912. ［76］HOFFER E, AILON N. Semi-supervised deep learning by metric embedding［C］//International Conference on Learning Representations, 2017. ［77］THULASIDASAN S, BILMES J. Semi-supervised phone classification using deep neural networks and stochastic graph-based entropic regularization［J］. Machine Learning in Speech and Language Processing, 2016. ［78］RANZATO M A, SZUMMER M. Semi-supervised learning of compact document representations with deep networks［C］//Proceedings of the 25th International Conference on Machine Learning. ACM, 2008: 792-799. ［79］BAUR C, ALBARQOUNI S, NAVAB N. Semi-supervised deep learning for fully convolutional networks［C］//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2017: 311-319. ［80］HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets［J］. Neural Computation, 2006, 18(7): 1527-1554. ［81］BENGIO Y, LAMBLIN P, POPOVICI D, et al. Greedy layer-wise training of deep networks［C］//Advances in neural information processing systems, 2007: 153-160. ［82］ERHAN D, BENGIO Y, COURVILLE A, et al. Why does unsupervised pre-training help deep learning?［J］. Journal of Machine Learning Research, 2010, 11: 625-660. ［83］GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets［C］//Advances in Neural Information Processing Systems, 2014: 2672-2680. ［84］DENTON E L, CHINTALA S, FERGUS R. Deep generative image models using a Laplacian pyramid of adversarial networks［C］//Advances In Neural Information Processing Systems, 2015: 1486-1494. ［85］SPRINGENBERG J T. Unsupervised and semi-supervised learning with categorical generative adversarial networks［C］//International Conference on Learning Representations, 2015. ［86］SUTSKEVER I, JOZEFOWICZ R, GREGOR K, et al. Towards principled unsupervised learning［J］. Computer Science, 2015, 45(1): 125-163. ［87］SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training gans［C］//Advances in Neural Information Processing Systems, 2016: 2234-2242. ［88］ODENA A. Semi-supervised learning with generative adversarial networks［C］//International Conference on Machine Learning, 2016. ［89］DAI Z, YANG Z, YANG F, et al. Good semi-supervised learning that requires a bad gan［C］//Advances in Neural Information Processing Systems, 2017: 6510-6520. ［90］TU E, YANG J, FANG J, et al. An experimental comparison of semi-supervised learning algorithms for multispectral image classification［J］. Photogrammetric Engineering & Remote Sensing, 2013, 79(4): 347-357. ［91］TU E, YANG J, KASABOV N, et al. Posterior distribution learning (pdl): A novel supervised learning framework using unlabeled samples to improve classification performance［J］. Neurocomputing, 2015, 157: 173-186. ［92］SUYKENS J A, DE BRABANTER J, LUKAS L, et al. Weighted least squares support vector machines: Robustness and sparse approximation［J］. Neurocomputing, 2002, 48(1-4): 85-105. ［93］TU E, ZHANG Y, ZHU L, et al. A graph-based semi-supervised k nearest-neighbor method for nonlinear manifold distributed data classification［J］. Information Sciences, 2016, 367: 673-688. ［94］GONG C, LIU T, TAO D, et al. Deformed graph Laplacian for semisupervised learning［J］. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(10): 2261-2274. ［95］GONG C, TAO D, FU K, et al. Fick’s law assisted propagation for semisupervised learning［J］. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(9): 2148-2162. ［96］GONG C, TAO D, LIU W, et al. Label propagation via teaching-to-learn and learning-to-teach［J］. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(6): 1452-1465. ［97］ZHU X, GOLDBERG A B. Introduction to semi-supervised learning［J］. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2009, 3(1): 1-130. ［98］BEN-DAVID S, LU T, PAL D. Does unlabeled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning［C］//Annual Conference on Learning Theory, 2008: 33-44. ［99］WASSERMAN L, LAFFERTY J D. Statistical analysis of semi-supervised regression［C］//Advances in Neural Information Processing Systems, 2008: 801-808. ［100］NADLER B, SREBRO N, ZHOU X. Statistical analysis of semi-supervised learning: The limit of infinite unlabelled data［C］//Advances in Neural Information Processing Systems, 2009: 1330-1338.

[1]	YAN Mingxuan¹ (颜铭萱), MIAO Yutong^2，3 (苗雨桐), SHENG Shuqian¹ (盛淑茜), GAN Xiaoying¹ (甘小莺), HE Ben² (何奔), SHEN Lan^2，3* (沈兰). Ensemble Learning-Based Mortality Prediction After Acute Myocardial Infarction [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(1): 153-165.
[2]	LIU Yuesheng (刘月笙), HE Ning^∗(贺宁), HE Lile (贺利乐), ZHANG Yiwen (张译文), XI Kun (习坤), ZHANG Mengrui (张梦芮). Self-Tuning of MPC Controller for Mobile Robot Path Tracking Based on Machine Learning [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(6): 1028-1036.
[3]	SUN Qianyang, ZHOU Li, DING Shifeng, LIU Renwei, DING Yi. An Artificial Neural Network-Based Method for Prediction of Ice Resistance of Polar Ships [J]. Journal of Shanghai Jiao Tong University, 2024, 58(2): 156-165.
[4]	BAO Zhujie, LI Zhen, WANG Feiliang, PANG Bo, YANG Jian. Prediction of Slip and Torsion Performance of Right-Angle Fasteners Based on Machine Learning Methods [J]. Journal of Shanghai Jiao Tong University, 2024, 58(2): 242-252.
[5]	HE Wen, GAO Bin, WANG Qiangqiang, FENG Shaokong, YE Guanlin. A Comprehensive Geophysical Prospection Method Based on Gaussian Mixture Clustering and its Application in Karst Exploration [J]. Journal of Shanghai Jiao Tong University, 2024, 58(11): 1724-1734.
[6]	XU Changjie, LI Xinyu. Lateral Deformation Prediction of Deep Foundation Retaining Structures Based on Artificial Neural Network [J]. Journal of Shanghai Jiao Tong University, 2024, 58(11): 1735-1744.
[7]	LI Mingai1,2,3∗ (李明爱), XU Dongqin1 (许东芹). Transfer Learning in Motor Imagery Brain Computer Interface: A Review [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(1): 37-59.
[8]	YAO Leyul (姚乐宇)，HE Fan1,3 (何凡), PENG Haixia2* (彭海霞), WANG Xiaofeng2 (王晓峰)，ZHOU Lu2(周璐), HUANG Xiaolin1,3* (黄晓霖). Improving Colonoscopy Polyp Detection Rate Using Semi-Supervised Learning [J]. J Shanghai Jiaotong Univ Sci, 2023, 28(4): 441-.
[9]	SU Hongjia, LUO Yucheng, LIU Fei. Review of Equipment Effectiveness Evaluation and Supporting Technologies [J]. Air & Space Defense, 2023, 6(3): 29-38.
[10]	SUN Jie, LI Zihao, ZHANG Shuyu. Application of Machine Learning in Chemical Synthesis and Characterization [J]. Journal of Shanghai Jiao Tong University, 2023, 57(10): 1231-1244.
[11]	GAN Ezhong, LIU Yan, WANG Hairong, WANG Chengguang. Research on Comprehensive Tradeoff of Supportability Based on Machine Learning Performance Measurement Theory [J]. Air & Space Defense, 2023, 6(1): 38-44.
[12]	DU Jian1∗ (杜剑), ZHAO Xu2 (赵旭), GUO Liming2 (郭力铭), WANG Jun2 (王军). Machine Learning-Based Approach to Liner Shipping Schedule Design [J]. J Shanghai Jiaotong Univ Sci, 2022, 27(3): 411-423.
[13]	LIANG Liang1 (梁良), SHI Ying1∗ (石英), MOU Junmin2∗ (牟军敏). Submarine Multi-Model Switching Control Under Full Working Condition Based on Machine Learning [J]. J Shanghai Jiaotong Univ Sci, 2022, 27(3): 402-410.
[14]	JIA Dengqiang* (贾灯强), LUO Xinzhe (罗鑫喆), DING Wangbin (丁王斌),HUANG Liqin (黄立勤), ZHUANG Xiahai (庄吓海). SeRN: A Two-Stage Framework of Registration for Semi-Supervised Learning for Medical Images [J]. J Shanghai Jiaotong Univ Sci, 2022, 27(2): 176-189.
[15]	JIA Dao, CHEN Lei, ZHU Zhipeng, YU Yao, CHI Dejian. Application of Machine Learning in Fuze- Warhead System Design [J]. Air & Space Defense, 2022, 5(2): 27-31.