Journal of Shanghai Jiao Tong University ›› 2018, Vol. 52 ›› Issue (10): 1280-1291. doi: 10.16183/j.cnki.jsjtu.2018.10.017
TU Enmei, YANG Jie
Corresponding author:
YANG Jie, male, professor and doctoral supervisor. Tel.: 021-34204033; E-mail: jieyang@sjtu.edu.cn.
About the author:
TU Enmei (b. 1983), male, from Huoqiu County, Anhui Province, is an assistant professor whose research focuses on semi-supervised machine learning.
Abstract: Semi-supervised learning is a machine learning paradigm that lies between traditional supervised and unsupervised learning. Its core idea is that, when labeled samples are scarce, introducing unlabeled samples into model training can avoid the performance (or model) degradation that traditional supervised learning suffers when training samples are insufficient (i.e., learning is inadequate). Semi-supervised learning has been applied successfully in many fields. This paper reviews the development and main theories of semi-supervised learning, introduces the latest advances in semi-supervised learning research, and finally analyzes, through application examples, the important role semi-supervised learning plays in solving practical problems.
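The core idea in the abstract — leveraging unlabeled samples when labels are scarce — can be illustrated with a minimal self-training sketch, one of the classical wrapper-style semi-supervised methods such surveys cover. The nearest-centroid base learner, the distance-margin confidence rule, and the synthetic two-cluster data below are illustrative assumptions, not the paper's own method.

```python
import numpy as np

def self_train(X, y, n_iter=30, batch=5):
    """Nearest-centroid self-training (illustrative sketch).

    y holds class labels 0/1 for labeled points and -1 for unlabeled ones;
    each round, the most confidently classified unlabeled points receive
    pseudo-labels and join the labeled set.
    """
    y = y.copy()
    for _ in range(n_iter):
        # Re-fit the two class centroids on all currently labeled points.
        centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
        unlabeled = np.where(y == -1)[0]
        if unlabeled.size == 0:
            break
        # Distance of each unlabeled point to each centroid.
        d = np.linalg.norm(X[unlabeled, None, :] - centroids[None, :, :], axis=2)
        margin = np.abs(d[:, 0] - d[:, 1])   # confidence = distance margin
        top = np.argsort(-margin)[:batch]    # most confident points first
        y[unlabeled[top]] = np.argmin(d[top], axis=1)
    return y

# Two well-separated Gaussian clusters with one labeled sample per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y_true = np.repeat([0, 1], 50)
y = np.full(100, -1)
y[0], y[50] = 0, 1
y_pred = self_train(X, y)
acc = float((y_pred == y_true).mean())
```

With only two labeled points, the pseudo-labeling loop recovers nearly the whole cluster structure — the effect the abstract describes, though real gains depend on how well the unlabeled data's geometry matches the model's assumptions.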
TU Enmei,YANG Jie. A Review of Semi-Supervised Learning Theories and Recent Advances[J]. Journal of Shanghai Jiaotong University, 2018, 52(10): 1280-1291.