Journal of Shanghai Jiao Tong University (Natural Science Edition), 2014, Vol. 48, Issue 05: 647-652.

• Automation Technology, Computer Technology •

基于角度分布的高维数据流异常点检测算法

朴昌浩¹, 黄质¹, 苏岭², 禄盛¹
  

  (1. 重庆邮电大学 模式识别及应用研究所, 重庆 400065; 2. 重庆长安汽车股份有限公司, 重庆 400023)
     
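  • Received: 2013-08-30
  • Supported by: the National Natural Science Foundation of China (11247325) and Natural Science Foundation projects of the Chongqing Science and Technology Commission (CSTC2013yykfC60005, CSTC2011BB4145, CSTC2013jcsfjcssX0022, CSTC2013jcyjjq60002)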

High-Dimensional Data Stream Outlier Detection Algorithm Based on Angle Distribution

PIAO Changhao1,HUANG Zhi1,SU Ling2,LU Sheng1
  

  (1. Institute of Pattern Recognition and Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. Chongqing Changan Automobile Company Limited, Chongqing 400023, China)
  • Received: 2013-08-30

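Abstract (translated from Chinese):

To detect outliers in high-dimensional data streams effectively, a data stream outlier detection (DSOD) algorithm based on angle distribution is proposed. An angle-distribution-based method is used to accurately identify the normal points, border points, and outliers in a high-dimensional data set. A small-scale, stream-oriented calculation set is built from the normal set and the border set to reduce the algorithm's space and time overhead. An update mechanism for the normal set and the border set is established to handle concept drift in large data streams. Experimental results on real data sets show that the proposed DSOD algorithm is more efficient than the Simple VOA and ABOD algorithms and is suitable for outlier detection on large data streams.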
Document code: A
 
 

Keywords: angle distribution, data stream, high dimensionality, outlier detection

Abstract:

To improve outlier detection in high-dimensional data streams, a novel data stream outlier detection (DSOD) algorithm based on angle distribution is proposed. An angle-distribution-based method is employed to accurately identify normal points, border points, and outliers. To reduce the computational cost in both time and space, a small-scale calculation set for the data stream is established, composed of a normal set and a border set. To address concept drift, an update mechanism for the normal set and the border set is developed. Experimental results on real data sets demonstrate that DSOD is more efficient than Simple variance of angles (Simple VOA) and angle-based outlier detection (ABOD), and that it is suitable for outlier detection on large data streams.
 

Key words: angle distribution, data stream, high-dimensional, outlier detection
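
The angle-distribution idea underlying the methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' DSOD implementation: it is an unweighted variance-of-angles score in the spirit of Simple VOA, and the function names (`angle`, `angle_variance`, `score_all`) and the sample points are hypothetical, chosen only for illustration. The assumption is that a point far from the data sees all other points within a narrow cone, so the variance of its pairwise angles is small, whereas a point inside the data sees widely varying angles.

```python
import math
from itertools import combinations


def angle(p, a, b):
    """Angle in radians at p between the vectors p->a and p->b.

    Returns None when a or b coincides with p (angle undefined).
    """
    u = [ai - pi for ai, pi in zip(a, p)]
    v = [bi - pi for bi, pi in zip(b, p)]
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return None
    cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cos)))


def angle_variance(p, others):
    """Population variance of the angles at p over all pairs drawn from others."""
    angles = []
    for a, b in combinations(others, 2):
        t = angle(p, a, b)
        if t is not None:
            angles.append(t)
    if len(angles) < 2:
        return 0.0
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)


def score_all(points):
    """Score every point against all the others; a lower score is more outlying."""
    return [
        angle_variance(p, points[:i] + points[i + 1:])
        for i, p in enumerate(points)
    ]


# Hypothetical sample: five points near the origin and one distant point.
sample = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
sample_scores = score_all(sample)
most_outlying = sample_scores.index(min(sample_scores))
```

On this sample the distant point `(10, 10)` receives the smallest score. Scoring one point this way examines every pair of the remaining points, which is quadratic in the size of the reference set; that cost is the kind of overhead a small calculation set, such as the normal and border sets described in the abstract, is meant to reduce.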
