Collective Data Anomaly Detection Based on Reverse k-Nearest Neighbor Filtering

1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China
2. School of Software, Shanghai Jiao Tong University, Shanghai 200240, China

Received date: 2020-01-08

Online published: 2021-06-01

Abstract

To address anomaly detection in collective (group) data that carry no labels, an unsupervised k-nearest neighbor (kNN) algorithm is proposed for detecting group data anomalies. Because mutual interference between abnormal and normal values causes false negatives and false positives, a reverse k-nearest neighbor (RkNN) method is further proposed to filter the detected abnormal groups. First, the proposed algorithm uses a statistical distance as the similarity measure between different groups of data. Next, the kNN algorithm computes an anomaly score for each group and yields an initial set of abnormal groups. Finally, the RkNN method filters this initial set. Experimental results show that the proposed algorithm not only effectively reduces false negatives and false positives but also achieves a high anomaly detection rate and good stability.
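The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the statistical distance, so the sketch assumes the Jensen-Shannon divergence between per-group histograms built on shared bins. It also assumes that a group's kNN anomaly score is its mean distance to its k nearest groups, that the `n_flag` highest-scoring groups form the initial anomalies, and that the RkNN filter drops a flagged group when more than `max_rknn` other groups list it among their k nearest neighbors (this assumed rule only targets false positives). All names (`detect_collective_anomalies`, `group_histogram`, `js_divergence`, `n_flag`, `max_rknn`, `bins`) are hypothetical.

```python
import math
from collections import Counter


def group_histogram(values, bins, lo, hi):
    """Normalized histogram of one group's values over shared, fixed-width bins."""
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)  # clamp to valid bins
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint support.

    Assumption: the abstract does not name the statistical distance; JS is used here.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing; otherwise y_i >= x_i / 2 > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def detect_collective_anomalies(groups, k=3, n_flag=2, max_rknn=1, bins=10):
    n = len(groups)
    lo = min(min(g) for g in groups)
    hi = max(max(g) for g in groups)
    hists = [group_histogram(g, bins, lo, hi) for g in groups]

    # Step 1: statistical distance between every pair of groups.
    dist = [[js_divergence(hists[i], hists[j]) for j in range(n)] for i in range(n)]

    # Step 2: kNN anomaly score = mean distance to the k nearest other groups.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    scores = [sum(dist[i][j] for j in neighbors[i]) / len(neighbors[i]) for i in range(n)]
    initial = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_flag]

    # Step 3: reverse kNN count = how many groups list i among their k nearest.
    counts = Counter(j for i in range(n) for j in neighbors[i])
    rknn = [counts[i] for i in range(n)]

    # Assumed filter rule: a flagged group that many others treat as a close
    # neighbor is probably normal, so it is dropped from the anomaly set.
    final = [i for i in initial if rknn[i] <= max_rknn]
    return {"scores": scores, "initial": initial, "rknn": rknn, "final": final}


if __name__ == "__main__":
    normal = [
        [0.5, 1.5, 2.5, 3.5, 4.5],
        [0.6, 1.4, 2.6, 3.4, 4.4],
        [0.5, 1.5, 1.6, 2.5, 3.5],
        [1.5, 2.5, 3.5, 4.5, 4.6],
        [0.5, 0.6, 1.5, 2.5, 3.5],
    ]
    anomalous = [9.2, 9.4, 9.6, 9.8, 9.5]
    result = detect_collective_anomalies(normal + [anomalous], k=3, n_flag=2)
    print("initial:", result["initial"], "final:", result["final"])
```

In this toy data the last group's values all fall in a bin no other group uses, so its Jensen-Shannon divergence to every other group is 1 bit, the maximum, and it receives the highest kNN score. Each normal group has four closer normal groups, so the anomalous group appears in no k = 3 neighbor list; its reverse count of 0 keeps it in the final set. Counting reverse neighbors lets a group flagged only for a locally high score be reconsidered: if several groups regard it as one of their nearest, the sketch treats it as normal.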

Cite this article

WU Jin’e, WANG Ruoyu, DUAN Qianqian, LI Guoqiang, JÜ Changjiang. Collective Data Anomaly Detection Based on Reverse k-Nearest Neighbor Filtering[J]. Journal of Shanghai Jiaotong University, 2021, 55(5): 598-606. DOI: 10.16183/j.cnki.jsjtu.2020.011
