Concept Drift Data Stream Classification Algorithm Based on Incremental Weighting (Online First)

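  • 1. School of Computer Science and Technology, Zhejiang Sci-Tech University; 2. School of Mathematics and Computer, Lishui University; 3. Zhejiang Detu Network Co., Ltd.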

Online publication date: 2024-10-08

Funding

National Natural Science Foundation of China (12171217)

The Concept Drift Data Stream Classification Algorithm Based on Incremental Weighting

  • (1. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China; 2. School of Mathematics and Computer, Lishui University, Lishui 323000, Zhejiang, China; 3. Zhejiang Detu Network Co., Ltd., Lishui 323000, Zhejiang, China)

Online published: 2024-10-08

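Abstract

Concept drift is one of the most common phenomena in data stream mining: the knowledge patterns implicit in a data stream change dynamically over time, so the accuracy of previously built classifiers degrades. To address this problem, a concept drift data stream classification algorithm based on incremental weighting (The Concept Drift Data Stream Classification Algorithm Based on Incremental Weighting, SCIW) is proposed. The algorithm adopts a heuristic weight update strategy, combines it with an adaptive method based on accuracy differences, and improves the Poisson-distribution-based resampling strategy. SCIW can adapt to different types of concept drift and effectively mitigates the decline in classifier accuracy. Experimental results on 14 synthetic datasets and 6 real-world datasets show that SCIW and ARF (Adaptive Random Forest) perform best in terms of accuracy, clearly outperforming the other compared algorithms. SCIW also clearly outperforms ARF in time and memory consumption: its overall average time consumption is about 83% of ARF's, and its overall average memory consumption is about 13% of ARF's.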

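Cite this article

Wu Yonghua1, Mei Ying2, 3, Lu Chengbo2, 3 . Concept Drift Data Stream Classification Algorithm Based on Incremental Weighting (Online First)[J]. Journal of Shanghai Jiao Tong University, 0 : 0 . DOI: 10.16183/j.cnki.jsjtu.2024.198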

Abstract

Concept drift is one of the most common phenomena in data stream mining: the underlying knowledge patterns in the data stream change dynamically over time, which degrades the accuracy of previously established classifiers. To address this issue, we propose a Concept Drift Data Stream Classification Algorithm Based on Incremental Weighting (SCIW). The algorithm employs a heuristic weight updating strategy combined with an adaptive method based on accuracy differences, and it improves the Poisson distribution-based resampling strategy. SCIW can adapt to various types of concept drift and effectively mitigates the decline in classifier accuracy. Experimental results on 14 synthetic datasets and 6 real-world datasets show that SCIW and ARF (Adaptive Random Forest) achieve the best accuracy, clearly outperforming the other compared algorithms. SCIW also significantly outperforms ARF in time and memory consumption: its overall average time consumption is about 83% of ARF's, and its overall average memory consumption is about 13% of ARF's.
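The abstract names Poisson-based resampling and accuracy-driven weight updates but does not give their details. As a rough illustration only, the sketch below shows the standard online bagging idea (each base learner trains on an incoming instance k times, with k drawn from a Poisson distribution) together with a simple exponentially faded accuracy weight used for voting. It is not SCIW's improved resampling or its heuristic weight rule; `MajorityLearner`, `WeightedOnlineBagging`, `lam`, and `decay` are hypothetical names and parameters chosen for this example.

```python
import math
import random
from collections import Counter, defaultdict


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class MajorityLearner:
    """Toy incremental base learner (assumption): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class WeightedOnlineBagging:
    """Generic online bagging with accuracy-based vote weights (not the paper's SCIW)."""

    def __init__(self, n_learners=5, lam=1.0, decay=0.9, seed=0):
        self.rng = random.Random(seed)
        self.lam = lam      # Poisson rate for resampling
        self.decay = decay  # fading factor for the running accuracy (assumption)
        self.learners = [MajorityLearner() for _ in range(n_learners)]
        self.weights = [1.0] * n_learners

    def predict(self, x):
        # Weighted vote: each learner adds its current weight to its predicted label.
        votes = defaultdict(float)
        for learner, w in zip(self.learners, self.weights):
            label = learner.predict(x)
            if label is not None:
                votes[label] += w
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        for i, learner in enumerate(self.learners):
            # Test-then-train: score the learner before it sees the label.
            correct = 1.0 if learner.predict(x) == y else 0.0
            self.weights[i] = self.decay * self.weights[i] + (1 - self.decay) * correct
            # Poisson resampling: train on this instance k ~ Poisson(lam) times.
            for _ in range(poisson(self.lam, self.rng)):
                learner.learn(x, y)


if __name__ == "__main__":
    ensemble = WeightedOnlineBagging()
    for _ in range(50):
        ensemble.learn((0.0,), "a")
    print(ensemble.predict((0.0,)), ensemble.weights)
```

Because learners that misclassify recent instances lose weight quickly under the fading factor, an ensemble of this kind shifts its vote toward members that track the current concept; how SCIW actually combines weights with its accuracy-difference adaptation is described in the full paper rather than here.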