Journal of Shanghai Jiao Tong University (Natural Science Edition), 2018, Vol. 52, Issue 10: 1382-1387. doi: 10.16183/j.cnki.jsjtu.2018.10.029


An Ensemble-Based Intrusion Detection Algorithm

HUANG Jinchao1, MA Yinghua1, QI Kaiyue2, LI Yichen1, XIA Yuanyi3

  1. School of Cyber Security, Shanghai Jiao Tong University, Shanghai 200240, China; 2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 3. State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210024, China
  • Corresponding author: QI Kaiyue, male, lecturer, E-mail: tommy-qi@sjtu.edu.cn.
  • Author profile: HUANG Jinchao (1992-), female, from Baoding, Hebei Province, Ph.D. candidate; her research focuses on big data.
  • Funding: Science and Technology Project of State Grid Corporation of China (SGCC) (SGRIXTKJ[2017]133)

Abstract: As a key research direction in machine learning, ensemble learning achieves higher detection accuracy than a single classifier and is widely used in anomaly-based intrusion detection. However, existing ensemble-based intrusion detection algorithms lose some edge information and some global information when they divide the original problem, and the final model fusion is a time-consuming, complex parameter-tuning process. To address these problems, this paper proposes an improved ensemble-based intrusion detection algorithm. First, the original problem is converted into multiple binary classification problems, and the probabilities predicted by the binary classifiers are added to the original features as prior knowledge. A multi-class model is then trained on the augmented features to produce the final result. Each binary classification problem is learned with the gradient boosting decision tree (GBDT) and logistic regression (LR) fusion model proposed by Facebook. Experiments and analysis on the KDD CUP’99 dataset verify the effectiveness of the proposed algorithm.

Key words: ensemble learning, intrusion detection, loss of information, gradient boosting decision tree (GBDT), logistic regression (LR)
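The abstract above outlines a two-stage pipeline: binary subproblems each learned by a GBDT+LR model, their predicted probabilities appended to the original features, then a multi-class model trained on the augmented features. The following is a minimal, stdlib-only Python sketch of that flow under assumptions that are not taken from the paper: the decomposition is one-vs-rest; the GBDT uses depth-1 trees (stumps) with Newton-step leaf values; LR is fit by plain batch gradient descent; and, because the abstract does not name the final multi-class learner, stage 2 simply reuses a one-vs-rest set of the same GBDT+LR learners. The names `GBDTLR`, `OneVsRest`, `StackedIDS` and `fit_stump`, all hyperparameters, and the toy records are hypothetical.

```python
# Hypothetical sketch, not the authors' code: one-vs-rest GBDT+LR experts whose
# predicted probabilities are appended to the features of a second-stage model.
import math


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, resid, hess):
    """Depth-1 tree chosen by second-order gain; returns (j, t, left_val, right_val)."""
    G, H = sum(resid), sum(hess)
    best = (-1.0, 0, math.inf, G / (H + 1e-9), 0.0)  # fallback: one leaf for all rows
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:  # keeps both sides non-empty
            gl = sum(r for row, r in zip(X, resid) if row[j] <= t)
            hl = sum(h for row, h in zip(X, hess) if row[j] <= t)
            gain = gl * gl / (hl + 1e-9) + (G - gl) ** 2 / (H - hl + 1e-9)
            if gain > best[0]:  # leaf value = sum(resid) / sum(hess), a Newton step
                best = (gain, j, t, gl / (hl + 1e-9), (G - gl) / (H - hl + 1e-9))
    return best[1:]


class GBDTLR:
    """Binary learner: boosted stumps give one-hot leaf indicators, LR is fit on them."""

    def __init__(self, n_trees=10, shrink=0.3, epochs=300, step=0.5):
        self.n_trees, self.shrink, self.epochs, self.step = n_trees, shrink, epochs, step

    def leaves(self, row):
        # Active one-hot column per tree: 2*i for its left leaf, 2*i + 1 for its right.
        return [2 * i + (row[j] > t) for i, (j, t) in enumerate(self.trees)]

    def fit(self, X, y):
        p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        F = [math.log(p0 / (1 - p0))] * len(X)  # start from the prior log-odds
        self.trees = []
        for _ in range(self.n_trees):  # stage 1: gradient boosting on log-loss
            p = [sigmoid(f) for f in F]
            j, t, vl, vr = fit_stump(X, [yi - pi for yi, pi in zip(y, p)],
                                     [pi * (1 - pi) for pi in p])
            self.trees.append((j, t))  # only the split is kept; LR uses leaf ids
            F = [f + self.shrink * (vl if row[j] <= t else vr) for f, row in zip(F, X)]
        Z = [self.leaves(row) for row in X]
        self.w, self.b = [0.0] * (2 * self.n_trees), 0.0
        for _ in range(self.epochs):  # stage 2: batch gradient descent for LR
            gw, gb = [0.0] * len(self.w), 0.0
            for z, yi in zip(Z, y):
                err = sigmoid(self.b + sum(self.w[k] for k in z)) - yi
                gb += err
                for k in z:
                    gw[k] += err
            self.w = [w - self.step * g / len(Z) for w, g in zip(self.w, gw)]
            self.b -= self.step * gb / len(Z)
        return self

    def predict_proba(self, row):
        return sigmoid(self.b + sum(self.w[k] for k in self.leaves(row)))


class OneVsRest:
    """One binary GBDT+LR expert per class; predicts the most probable class."""

    def fit(self, X, labels):
        self.classes = sorted(set(labels))
        self.experts = [GBDTLR().fit(X, [int(lab == c) for lab in labels]) for c in self.classes]
        return self

    def proba(self, row):
        return [e.predict_proba(row) for e in self.experts]

    def predict(self, row):
        p = self.proba(row)
        return self.classes[p.index(max(p))]


class StackedIDS:
    """Stage-1 probabilities are appended to each row; stage 2 learns on the result."""

    def fit(self, X, labels):
        self.stage1 = OneVsRest().fit(X, labels)
        augmented = [row + self.stage1.proba(row) for row in X]
        self.stage2 = OneVsRest().fit(augmented, labels)  # assumed multi-class stage
        return self

    def predict(self, row):
        return self.stage2.predict(row + self.stage1.proba(row))


# Made-up toy rows [duration, failed_logins, bytes_scaled]; not KDD CUP'99 records.
X = [[0, 0, 0.9], [1, 0, 0.8], [0, 0, 0.85], [2, 0, 0.95],
     [0, 5, 0.1], [0, 6, 0.2], [1, 4, 0.1], [0, 7, 0.15],
     [9, 0, 0.1], [8, 0, 0.2], [9, 1, 0.15], [7, 0, 0.1]]
y = ["normal"] * 4 + ["r2l"] * 4 + ["dos"] * 4
model = StackedIDS().fit(X, y)
print(model.predict([8, 0, 0.1]))  # → dos
```

Keeping only each tree's split and feeding leaf indices to LR reflects the GBDT+LR idea of using the trees as a learned feature transform. For brevity, stage 2 is trained on in-sample stage-1 probabilities; out-of-fold probabilities (for example from k-fold cross-validation) are the usual safeguard against such stacked features being overconfident on real data.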
