Concept Drift Data Stream Classification Algorithm Based on Incremental Weighting

Journal of Shanghai Jiao Tong University, 2026, Vol. 60, Issue (1): 112-122. doi: 10.16183/j.cnki.jsjtu.2024.198

WU Yonghua¹, MEI Ying²,³, LU Chengbo²,³

¹ School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
² School of Mathematics and Computer, Lishui University, Lishui 323000, Zhejiang, China
³ Zhejiang Detu Network Co., Ltd., Lishui 323000, Zhejiang, China

Received: 2024-05-29 Revised: 2024-08-26 Accepted: 2024-09-04 Online: 2026-01-28 Published: 2026-01-27
Contact: LU Chengbo, E-mail: lu.chengbo@aliyun.com
About the author: WU Yonghua (born 1998), master's student, works on data mining.
Funding: National Natural Science Foundation of China (12171217)

Supplementary material: 2024-198-附录.pdf.pdf (212 KB)

Abstract

Concept drift is one of the most common phenomena in data stream mining: the knowledge patterns underlying a data stream change dynamically over time, which degrades the accuracy of previously trained classifiers. To address this problem, we propose a concept drift data stream classification algorithm based on incremental weighting, abbreviated as SCIW. The algorithm employs a heuristic weight-updating strategy combined with an adaptive method based on accuracy differences, and it improves the Poisson distribution-based resampling strategy. SCIW adapts to different types of concept drift and effectively mitigates the decline in classifier accuracy. Experimental results on 14 synthetic datasets and 6 real-world datasets show that SCIW and adaptive random forests (ARF) perform well in terms of accuracy and clearly outperform the other compared algorithms. In addition, SCIW clearly outperforms ARF in time and memory consumption: its overall average time consumption is approximately 83% of that of ARF, and its overall average memory consumption is approximately 13% of that of ARF.

Key words: data stream, concept drift, classification algorithm, ensemble learning, adaptive
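The abstract names three ingredients (Poisson-based resampling, weight updating, and accuracy-based adaptation) but does not give their formulas. As a rough illustration of the general idea only, the sketch below combines the classic Poisson(1) online bagging scheme of Oza and Russell with a simple multiplicative penalty on the weight of any member that misclassifies the incoming instance. It is not the SCIW algorithm: the class names `CentroidLearner` and `WeightedOnlineBagging`, the `decay` parameter, and the toy centroid base learner are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class CentroidLearner:
    """Toy incremental base learner (hypothetical): one running mean per class."""

    def __init__(self):
        self.sums = {}
        self.counts = defaultdict(int)

    def learn(self, x, y, times=1):
        if times <= 0:
            return  # Poisson draw of 0: this member skips the instance
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v * times
        self.counts[y] += times

    def predict(self, x):
        if not self.sums:
            return None  # nothing learned yet

        def sq_dist(label):
            c = self.counts[label]
            return sum((v - s / c) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=sq_dist)


class WeightedOnlineBagging:
    """Online bagging with weighted voting (illustrative, not SCIW itself)."""

    def __init__(self, n_members=5, lam=1.0, decay=0.9, seed=0):
        self.rng = random.Random(seed)
        self.lam = lam
        self.decay = decay  # assumed penalty factor applied on a mistake
        self.members = [CentroidLearner() for _ in range(n_members)]
        self.weights = [1.0] * n_members

    def predict(self, x):
        votes = defaultdict(float)
        for member, w in zip(self.members, self.weights):
            label = member.predict(x)
            if label is not None:
                votes[label] += w
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        for i, member in enumerate(self.members):
            # Test-then-train: shrink the weight of a member that errs.
            pred = member.predict(x)
            if pred is not None and pred != y:
                self.weights[i] *= self.decay
            # Each member sees the instance k ~ Poisson(lam) times.
            member.learn(x, y, times=poisson(self.lam, self.rng))
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]  # keep the max at 1


# Usage on a small stationary two-class stream.
data_rng = random.Random(42)
model = WeightedOnlineBagging(seed=1)
for t in range(400):
    y = t % 2
    x = [(1.0 if y else -1.0) + data_rng.uniform(-0.3, 0.3)]
    model.learn(x, y)
```

Drawing the number of training repetitions from Poisson(1) approximates bootstrap sampling without storing the stream, which is why it suits one-pass learning. The multiplicative penalty is only one of many possible weight-update heuristics and stands in for the unspecified rule mentioned in the abstract.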

CLC number: TP181; TP183

Cite this article: WU Yonghua, MEI Ying, LU Chengbo. Concept Drift Data Stream Classification Algorithm Based on Incremental Weighting[J]. Journal of Shanghai Jiao Tong University, 2026, 60(1): 112-122.



Table 1

Dataset	Instances	Features	Drift type	Classes	Number of drifts
LED_A	100000	24	A	10	3
LED_G	100000	24	G	10	3
HYPER_F	100000	10	I & F	2	1
HYPER_S	100000	10	I & S	2	1
RBF_F	100000	10	I & F	4	1
RBF_S	100000	10	I & S	4	1
SEA_A	100000	3	A	2	3
SINE_A	100000	4	A	2	5
SINE_G	100000	4	G	2	5
SINE_R	100000	4	R	2	5
AGR_A	100000	9	A	2	3
AGR_G	100000	9	G	2	3
AGR_R	100000	9	R	2	4
RTG_A	100000	60	A	2	1
RTG_N	100000	60	N	2	0
USENET1	1500	99	U	2	U
USENET2	1500	99	U	2	U
electricity	45312	6	U	2	U
Power	29928	2	U	24	U
Weather	18159	8	U	2	U
GMSC	100000	11	U	2	U
