Journal of Shanghai Jiao Tong University ›› 2021, Vol. 55 ›› Issue (5): 598-606. doi: 10.16183/j.cnki.jsjtu.2020.011

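Collections: Journal of Shanghai Jiao Tong University, special-topic compilation for the 12 issues of 2021; Journal of Shanghai Jiao Tong University, 2021 special topic "Automation Technology, Computer Technology"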


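Collective Data Anomaly Detection Based on Reverse k-Nearest Neighbor Filtering

WU Jin’e1, WANG Ruoyu2, DUAN Qianqian1, LI Guoqiang1,2, JÜ Changjiang2

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China
  2. School of Software, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2020-01-08 Online: 2021-05-28 Published: 2021-06-01
  • Corresponding author: DUAN Qianqian E-mail: dqq1019@163.com
  • About the author: WU Jin’e (1995-), female, born in Wuhu, Anhui Province, master's student; currently engaged mainly in research on data anomaly detection.
  • Funding:
    Key Program of the National Natural Science Foundation of China (61732013); National Key Research and Development Program of China (SQ2019YFB170208)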



Abstract:

To address anomaly detection in group data without labels, this paper proposes detecting group anomalies in an unsupervised manner with the k-nearest neighbor (kNN) algorithm. To reduce the false negatives and false positives caused by mutual interference between abnormal and normal values, a reverse k-nearest neighbor (RkNN) method is proposed to filter the anomalous groups in reverse. The RkNN-based algorithm first uses a statistical distance as the similarity measure between groups. It then applies the kNN algorithm to compute an anomaly score for each group and obtain the initial anomalies. Finally, it filters the initial anomalies with the RkNN method. Experimental results show that the proposed algorithm effectively reduces false negatives and false positives, achieves a high anomaly detection rate, and is stable.

Keywords: anomaly detection, unsupervised, k-nearest neighbor (kNN), reverse k-nearest neighbor (RkNN), statistical distance
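The three stages named in the abstract (statistical distance between groups, kNN anomaly scoring, RkNN filtering) can be illustrated with a minimal Python sketch. The abstract does not give the concrete formulas, so everything specific here is an assumption made for illustration and is not the authors' implementation: the distance is taken to be the 1-D Wasserstein-1 distance between equal-sized samples, the anomaly score is the mean distance to the k nearest groups, the initial anomalies are the `n_initial` highest-scoring groups, and the filter keeps a candidate only if at most `max_reverse` groups list it among their own k nearest neighbors. The names `stat_distance`, `knn_lists`, and `detect` are hypothetical.

```python
from statistics import mean


def stat_distance(a, b):
    """Assumed metric: 1-D Wasserstein-1 distance between equal-sized samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-sized groups")
    return mean([abs(x - y) for x, y in zip(sorted(a), sorted(b))])


def knn_lists(groups, k):
    """For each group, return its k nearest groups as (distance, index) pairs."""
    result = []
    for i, g in enumerate(groups):
        dists = sorted(
            (stat_distance(g, h), j) for j, h in enumerate(groups) if j != i
        )
        result.append(dists[:k])
    return result


def detect(groups, k=2, n_initial=2, max_reverse=0):
    neighbors = knn_lists(groups, k)

    # kNN anomaly score (assumed form): mean distance to the k nearest groups.
    scores = [mean([d for d, _ in nb]) for nb in neighbors]

    # Initial anomalies (assumed rule): the n_initial highest-scoring groups.
    ranked = sorted(range(len(groups)), key=lambda i: scores[i], reverse=True)
    initial = ranked[:n_initial]

    # Reverse kNN count: how many groups list group j among their k nearest.
    reverse_count = [0] * len(groups)
    for nb in neighbors:
        for _, j in nb:
            reverse_count[j] += 1

    # RkNN filter (assumed rule): keep candidates that almost no one "points to".
    filtered = [i for i in initial if reverse_count[i] <= max_reverse]
    return scores, initial, filtered


# Toy data: groups 0-3 are close together, group 4 lies far away.
groups = [
    [-0.5, 0.5, 1.5],
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [20.0, 21.0, 22.0],
]
scores, initial, filtered = detect(groups, k=2, n_initial=2, max_reverse=0)
print(initial)   # [4, 0]
print(filtered)  # [4]
```

In this toy run, group 0 enters the initial list only because it sits at the edge of the normal cluster. Group 1 counts it among its nearest neighbors, so the reverse filter drops it, while group 4 has no reverse neighbors and is kept.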

CLC number: