Journal of Shanghai Jiao Tong University, 2026, Vol. 60, Issue (1): 112-122. doi: 10.16183/j.cnki.jsjtu.2024.198
WU Yonghua, MEI Ying, LU Chengbo
Received: 2024-05-29
Revised: 2024-08-26
Accepted: 2024-09-04
Online: 2026-01-28
Published: 2026-01-27
Contact: LU Chengbo
E-mail: lu.chengbo@aliyun.com
About the author: WU Yonghua (born 1998), master's student, works on data mining.
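Abstract:
Concept drift is one of the most common phenomena in data stream mining: the knowledge patterns implicit in a data stream change dynamically over time, which degrades the accuracy of previously built classifiers. To address this problem, a concept drift data stream classification algorithm based on incremental weighting (SCIW) is proposed. The algorithm adopts a heuristic weight update strategy, combines it with an adaptive method based on accuracy differences, and improves the resampling strategy based on the Poisson distribution. SCIW adapts to different types of concept drift and effectively mitigates the decline in classifier accuracy. Experimental results on 14 synthetic datasets and 6 real-world datasets show that SCIW and the adaptive random forest (ARF) algorithm both perform well in accuracy and clearly outperform the other compared algorithms. SCIW is also clearly better than ARF in time and memory consumption: its overall average time consumption is about 83% of ARF's, and its overall average memory consumption is about 13% of ARF's.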
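The abstract names three ingredients (a weight update, an accuracy-based adaptive step, and Poisson resampling) but does not give their details. The sketch below is therefore not the SCIW algorithm. It only illustrates the generic technique those ingredients build on: online bagging, where each base learner trains on every instance k times with k drawn from Poisson(λ), combined with votes weighted by a running accuracy estimate. Everything named here is an assumption made for illustration: the class names `CentroidLearner` and `WeightedOnlineBagging`, the exponential-decay weight rule, the `decay` parameter, and the toy stream in `run_demo`.

```python
import math
import random
from collections import defaultdict


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) using Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class CentroidLearner:
    """Toy incremental learner (assumption): nearest running class mean."""

    def __init__(self):
        self.sums = {}
        self.counts = defaultdict(int)

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        if not self.sums:
            return None  # nothing learned yet

        def dist(y):
            n = self.counts[y]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[y]))

        return min(self.sums, key=dist)


class WeightedOnlineBagging:
    """Online bagging with accuracy-weighted voting (illustrative only)."""

    def __init__(self, n_learners=5, lam=1.0, decay=0.99, seed=0):
        self.rng = random.Random(seed)
        self.lam = lam
        self.decay = decay
        self.learners = [CentroidLearner() for _ in range(n_learners)]
        self.weights = [1.0] * n_learners  # running accuracy estimates

    def predict(self, x):
        votes = defaultdict(float)
        for learner, w in zip(self.learners, self.weights):
            label = learner.predict(x)
            if label is not None:
                votes[label] += w
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        for i, learner in enumerate(self.learners):
            # Test-then-train: score the learner before it sees the label.
            hit = 1.0 if learner.predict(x) == y else 0.0
            self.weights[i] = self.decay * self.weights[i] + (1 - self.decay) * hit
            # Poisson resampling: train on this instance k ~ Poisson(lam) times.
            for _ in range(poisson(self.lam, self.rng)):
                learner.learn(x, y)


def run_demo(n=2000, seed=1):
    """Prequential accuracy on a toy, drift-free two-class stream."""
    rng = random.Random(seed)
    model = WeightedOnlineBagging(seed=seed)
    correct = 0
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [4.0 * y + rng.gauss(0, 1), rng.gauss(0, 1)]
        correct += model.predict(x) == y
        model.learn(x, y)
    return correct / n, model
```

Calling `run_demo()` returns the prequential accuracy and the trained ensemble. The exponentially decayed weights let recent mistakes reduce a learner's vote, which is one simple way an ensemble can react to drift. How SCIW actually updates its weights and modifies the Poisson step is described only in the full paper.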
WU Yonghua, MEI Ying, LU Chengbo. Concept Drift Data Stream Classification Algorithm Based on Incremental Weighting[J]. Journal of Shanghai Jiao Tong University, 2026, 60(1): 112-122.
Table 1 Dataset description
| Dataset | Instances | Features | Drift type | Classes | Number of drifts |
|---|---|---|---|---|---|
| LED_A | 100000 | 24 | A | 10 | 3 |
| LED_G | 100000 | 24 | G | 10 | 3 |
| HYPER_F | 100000 | 10 | I & F | 2 | 1 |
| HYPER_S | 100000 | 10 | I & S | 2 | 1 |
| RBF_F | 100000 | 10 | I & F | 4 | 1 |
| RBF_S | 100000 | 10 | I & S | 4 | 1 |
| SEA_A | 100000 | 3 | A | 2 | 3 |
| SINE_A | 100000 | 4 | A | 2 | 5 |
| SINE_G | 100000 | 4 | G | 2 | 5 |
| SINE_R | 100000 | 4 | R | 2 | 5 |
| AGR_A | 100000 | 9 | A | 2 | 3 |
| AGR_G | 100000 | 9 | G | 2 | 3 |
| AGR_R | 100000 | 9 | R | 2 | 4 |
| RTG_A | 100000 | 60 | A | 2 | 1 |
| RTG_N | 100000 | 60 | N | 2 | 0 |
| USENET1 | 1500 | 99 | U | 2 | U |
| USENET2 | 1500 | 99 | U | 2 | U |
| electricity | 45312 | 6 | U | 2 | U |
| Power | 29928 | 2 | U | 24 | U |
| Weather | 18159 | 8 | U | 2 | U |
| GMSC | 100000 | 11 | U | 2 | U |