Collective Data Anomaly Detection Based on Reverse k-Nearest Neighbor Filtering

1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China
2. School of Software, Shanghai Jiao Tong University, Shanghai 200240, China
WU Jin'e (born 1995), female, from Wuhu, Anhui Province, master's student; her current research focuses on data anomaly detection.

Received date: 2020-01-08

Online published: 2021-06-01

Funding

Key Program of the National Natural Science Foundation of China (61732013); National Key Research and Development Program of China (SQ2019YFB170208)

Cite this article

WU Jin'e, WANG Ruoyu, DUAN Qianqian, LI Guoqiang, JU Changjiang. Collective Data Anomaly Detection Based on Reverse k-Nearest Neighbor Filtering[J]. Journal of Shanghai Jiao Tong University, 2021, 55(5): 598-606. DOI: 10.16183/j.cnki.jsjtu.2020.011

Abstract

To address anomaly detection in collective (group) data without labels, this paper proposes detecting anomalous groups in an unsupervised setting with the k-nearest neighbor (kNN) algorithm. To reduce the false negatives and false positives caused by mutual interference between abnormal and normal values, a reverse k-nearest neighbor (RkNN) method is proposed to filter the detected anomalous groups in reverse. The proposed algorithm first uses a statistical distance as the similarity measure between groups of data. It then computes an anomaly score for each group with kNN and obtains an initial set of anomalies. Finally, it filters the initial anomalies with RkNN. Experimental results show that the proposed algorithm effectively reduces false negatives and false positives, and achieves a high anomaly detection rate with good stability.
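The three stages named in the abstract (a statistical distance between groups, kNN anomaly scores, RkNN filtering) can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not fix the details, so everything specific here is an assumption made for illustration: the Jensen-Shannon divergence as the statistical distance, the distance to the k-th nearest group as the anomaly score, a fixed number of top-scoring groups as the initial anomalies, and a filter that keeps an initial anomaly only if fewer than `min_reverse` of the unflagged groups list it among their own k nearest neighbors. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def histogram(group, bins):
    """Empirical distribution of a non-empty group over shared bin labels."""
    counts = Counter(group)
    return [counts.get(b, 0) / len(group) for b in bins]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def knn_lists(dist, k):
    """For each group i, the indices of its k nearest other groups."""
    n = len(dist)
    return [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
            for i in range(n)]


def detect(groups, bins, k, n_initial, min_reverse):
    """Return (scores, initial anomalies, anomalies kept after filtering)."""
    n = len(groups)
    probs = [histogram(g, bins) for g in groups]
    # Stage 1 (assumed choice): pairwise statistical distance between groups.
    dist = [[js_divergence(probs[i], probs[j]) for j in range(n)]
            for i in range(n)]
    neighbors = knn_lists(dist, k)
    # Stage 2 (assumed choice): score = distance to the k-th nearest group;
    # the n_initial highest-scoring groups form the initial anomalies.
    scores = [dist[i][neighbors[i][-1]] for i in range(n)]
    initial = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_initial]
    # Stage 3 (assumed rule): count how many unflagged groups have group i
    # among their k nearest neighbors; well-supported groups are filtered out.
    voters = [j for j in range(n) if j not in initial]
    reverse_count = {i: sum(i in neighbors[j] for j in voters) for i in initial}
    filtered = [i for i in initial if reverse_count[i] < min_reverse]
    return scores, sorted(initial), sorted(filtered)


# Toy data: five groups with similar category mixes and one made only of "c".
groups = [
    list("aaaabbbbcc"),
    list("aaaabbbccc"),
    list("aaabbbbccc"),
    list("aaaabbbbbc"),
    list("aaaaabbbcc"),
    list("cccccccccc"),
]
scores, initial, filtered = detect(groups, bins=["a", "b", "c"],
                                   k=2, n_initial=2, min_reverse=1)
print(initial, filtered)
```

In this toy run the all-"c" group (index 5) receives the highest score and survives the filter, because none of the unflagged groups counts it among its two nearest neighbors. The values of `k`, `n_initial`, and `min_reverse` are arbitrary illustration values, not settings reported in the paper.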
