An Ensemble-Based Intrusion Detection Algorithm

HUANG Jinchao¹; MA Yinghua¹; QI Kaiyue²; LI Yichen¹; XIA Yuanyi³

doi:10.16183/j.cnki.jsjtu.2018.10.029

Journal of Shanghai Jiao Tong University (Chinese edition), 2018, Vol. 52, Issue 10: 1382-1387


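1. School of Cyber Security, Shanghai Jiao Tong University, Shanghai 200240, China; 2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 3. State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210024, China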

HUANG Jinchao (b. 1992), female, from Baoding, Hebei Province, is a doctoral student whose main research area is big data.

Online publication date: 2025-07-02

Funding

Science and Technology Project of the State Grid Corporation of China (SGCC) (SGRIXTKJ[2017]133)


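Abstract

Ensemble learning, a key research direction in machine learning, achieves higher detection accuracy than a single classifier and is widely used in anomaly-based intrusion detection. However, existing ensemble-based intrusion detection algorithms lose some edge information and some global information when they partition the original problem, and the final model fusion is a time-consuming and complex parameter-tuning process. To address this, an improved ensemble-based intrusion detection algorithm is proposed. The original problem is converted into multiple binary classification problems, the probability predictions of the resulting classifiers are added to the original features as prior knowledge, and a multi-class model is then trained on the augmented features. The binary classification problems are learned with the fusion model of gradient boosting decision tree (GBDT) and logistic regression (LR) proposed by Facebook. Experiments and analysis on the KDD CUP'99 dataset verify the effectiveness of the proposed algorithm.

Keywords: ensemble learning; intrusion detection; information loss; gradient boosting decision tree (GBDT); logistic regression (LR)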
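The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class-vs-rest decomposition, the out-of-fold scheme for the training probabilities, the choice of a gradient boosting model as the final multi-class learner, all hyperparameters, and the helper names `GbdtLr`, `class_vs_rest_probs` and `out_of_fold_probs` are assumptions made for illustration. Synthetic data from scikit-learn stands in for KDD CUP'99.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import OneHotEncoder


class GbdtLr:
    """Binary GBDT+LR: one-hot encoded GBDT leaf indices feed a logistic regression.

    Simplification (assumption): GBDT and LR are fitted on the same samples.
    """

    def __init__(self, n_trees=20, seed=0):
        self.gbdt = GradientBoostingClassifier(
            n_estimators=n_trees, max_depth=3, random_state=seed)
        self.encoder = OneHotEncoder(handle_unknown="ignore")
        self.lr = LogisticRegression(max_iter=1000)

    def _leaves(self, X):
        # For a binary problem apply() has shape (n_samples, n_trees, 1).
        return self.gbdt.apply(X)[:, :, 0]

    def fit(self, X, y):
        self.gbdt.fit(X, y)
        self.lr.fit(self.encoder.fit_transform(self._leaves(X)), y)
        return self

    def predict_proba(self, X):
        # Probability of the positive class only.
        encoded = self.encoder.transform(self._leaves(X))
        return self.lr.predict_proba(encoded)[:, 1]


def class_vs_rest_probs(X_fit, y_fit, X_out, classes):
    """Fit one binary model per class (assumed split) and score X_out."""
    columns = [
        GbdtLr().fit(X_fit, (y_fit == c).astype(int)).predict_proba(X_out)
        for c in classes
    ]
    return np.column_stack(columns)


def out_of_fold_probs(X, y, classes, n_splits=3):
    """Training-set probabilities from models that never saw the scored rows."""
    probs = np.zeros((len(X), len(classes)))
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for fit_idx, out_idx in folds.split(X, y):
        probs[out_idx] = class_vs_rest_probs(
            X[fit_idx], y[fit_idx], X[out_idx], classes)
    return probs


# Synthetic stand-in for an intrusion dataset (assumption, not KDD CUP'99).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
classes = np.unique(y_train)

# Append the binary probability predictions to the original features.
train_probs = out_of_fold_probs(X_train, y_train, classes)
test_probs = class_vs_rest_probs(X_train, y_train, X_test, classes)
X_train_aug = np.hstack([X_train, train_probs])
X_test_aug = np.hstack([X_test, test_probs])

# Final multi-class model trained on the augmented features (model choice assumed).
final_model = GradientBoostingClassifier(n_estimators=50, random_state=0)
final_model.fit(X_train_aug, y_train)
y_pred = final_model.predict(X_test_aug)
print("augmented feature count:", X_train_aug.shape[1])
print("test accuracy:", (y_pred == y_test).mean())
```

Out-of-fold prediction is used here so that the probabilities appended to the training rows come from models that did not see those rows. The abstract does not say how this step is handled, so treat it as one reasonable design choice, not a description of the paper.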

Cite this article

HUANG Jinchao, MA Yinghua, QI Kaiyue, LI Yichen, XIA Yuanyi. An Ensemble-Based Intrusion Detection Algorithm[J]. Journal of Shanghai Jiao Tong University, 2018, 52(10): 1382-1387. DOI: 10.16183/j.cnki.jsjtu.2018.10.029


