Journal of Shanghai Jiaotong University


A Review of Clustering Algorithms Based on the Anchor Point Acceleration Mechanism

  

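  1. Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361102, Fujian, China;

    2. School of Information Engineering, Jimei University, Xiamen 361021, Fujian, China;

    3. School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China;

    4. Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, Fujian, China;

    5. National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361102, Fujian, China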

  • About the author: 吴沁停 (WU Qinting, born 2001), master's student, working on pattern recognition, data mining, and related topics.
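  • Funding:
    National Natural Science Foundation of China (42076058), Fujian Provincial Special Fund for Marine Fisheries (FJHYF-ZH-2023-05), Natural Science Foundation of Fujian Province (2020J01713, 2022J01061), Guangdong Basic and Applied Basic Research Foundation (2024A1515011682)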

A Review of Clustering Algorithms Based on the Anchor Point Acceleration Mechanism

  1. Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361102, Fujian, China;

    2. School of Information Engineering, Jimei University, Xiamen 361021, Fujian, China;

    3. School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China;

    4. Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, Fujian, China;

    5. National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361102, Fujian, China

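Abstract: With the arrival of the big data era, clustering algorithms have become central to data mining and machine learning. However, the exponential growth of data size and dimensionality keeps driving up the time and space complexity of traditional clustering methods, which limits their practical use. The anchor point acceleration mechanism emerged to meet these challenges: it substantially reduces the computational burden and thereby improves the effectiveness of traditional clustering algorithms on large-scale datasets. This paper comprehensively reviews clustering algorithms based on the anchor point acceleration mechanism and examines techniques such as anchor generation and similarity graph construction. It covers clustering methods that use fixed anchors, including spectral clustering, fuzzy spectral clustering, multi-view clustering, and deep clustering. It also studies clustering strategies that use dynamic anchors, including multi-view and incomplete multi-view clustering algorithms. From a synthesis of these cases, the paper identifies current limitations, addresses emerging challenges, and offers perspectives on future directions, providing a roadmap for future research and practical applications in the field. The aim is to give researchers and practitioners useful guidance and inspiration and to foster continued innovation in clustering algorithms suited to today's data environments.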

Key words: clustering; anchors; spectral clustering; fuzzy clustering; multi-view clustering; deep clustering

Abstract: With the advent of the big data era, clustering algorithms have become pivotal in data mining and machine learning. However, the exponential growth in data size and dimensionality has driven up the time and space complexity of traditional clustering methods, constraining their practical utility. To address these challenges, the anchor point acceleration mechanism has emerged as an effective way to substantially reduce the computational burden, thereby improving the effectiveness of conventional clustering algorithms on large-scale datasets. This paper provides a comprehensive review of clustering algorithms that leverage the anchor point acceleration mechanism. It examines techniques such as anchor point generation and similarity graph construction. The discussion covers clustering methods that use fixed anchor points, including spectral clustering, fuzzy spectral clustering, multi-view clustering, and deep clustering algorithms. It also investigates clustering strategies that employ dynamic anchor points, including multi-view and incomplete multi-view clustering algorithms. By synthesizing and analyzing this landscape, the paper identifies current limitations and addresses emerging challenges. It also offers insights into future directions, serving as a roadmap for future research and practical applications in the field. This review aims to provide valuable guidance and inspiration to researchers and practitioners alike, fostering continued innovation in clustering algorithms tailored to contemporary data environments.

Key words: clustering; anchors; spectral clustering; fuzzy clustering; multi-view clustering; deep clustering

CLC number:
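The abstract names anchor generation and similarity graph construction as the core of the acceleration mechanism. The Python sketch below illustrates one common form of that idea under stated assumptions; it is not the method of any particular surveyed algorithm. Assumptions: anchors are drawn by uniform random sampling (k-means centroids are the other usual choice), each point is linked only to its s nearest anchors with row-normalised Gaussian weights, and the n × n similarity matrix is taken to be W = ZΛ⁻¹Zᵀ, where Λ is diagonal and holds the column sums of Z. The helper names (`select_anchors`, `anchor_graph`, `anchor_degrees`, `similarity_matvec`) and the parameters `s` and `sigma` are hypothetical. Because W is only ever applied through Z, a product W·x costs O(ns + m) time, and storage stays O(ns) instead of O(n²).

```python
import math
import random


def select_anchors(points, m, seed=0):
    """Anchor generation by uniform random sampling of m input points.

    Assumption for illustration: k-means centroids are another common choice.
    """
    return random.Random(seed).sample(points, m)


def anchor_graph(points, anchors, s=3, sigma=1.0):
    """Build the sparse n x m anchor matrix Z, one {anchor_index: weight} dict per point.

    Each point keeps only its s nearest anchors; Gaussian weights are
    normalised so that every row of Z sums to 1.
    """
    Z = []
    for p in points:
        # s nearest anchors as (squared distance, anchor index) pairs, ascending.
        nearest = sorted((math.dist(p, a) ** 2, j) for j, a in enumerate(anchors))[:s]
        d_min = nearest[0][0]
        # Subtracting d_min rescales every weight in the row by the same factor,
        # so the normalised row is unchanged, but the nearest anchor always gets
        # weight 1 and the row total can never underflow to zero.
        w = {j: math.exp(-(d2 - d_min) / (2 * sigma ** 2)) for d2, j in nearest}
        total = sum(w.values())
        Z.append({j: wj / total for j, wj in w.items()})
    return Z


def anchor_degrees(Z, m):
    """Diagonal of Lambda: the column sums of Z (degree of each anchor)."""
    lam = [0.0] * m
    for row in Z:
        for j, z in row.items():
            lam[j] += z
    return lam


def similarity_matvec(Z, lam, x):
    """Return W @ x for W = Z Lambda^{-1} Z^T without forming the n x n matrix W."""
    zt_x = [0.0] * len(lam)  # Z^T x, length m
    for row, xi in zip(Z, x):
        for j, z in row.items():
            zt_x[j] += z * xi
    # Lambda^{-1} Z^T x; an anchor that no point selected has degree 0 and is skipped.
    scaled = [v / l if l > 0 else 0.0 for v, l in zip(zt_x, lam)]
    # Z (Lambda^{-1} Z^T x): one sparse row-times-vector product per point.
    return [sum(z * scaled[j] for j, z in row.items()) for row in Z]


if __name__ == "__main__":
    # Toy data: two small, well-separated grids of 2-D points.
    pts = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
    pts += [(5 + 0.1 * i, 5 + 0.1 * j) for i in range(5) for j in range(5)]
    anchors = select_anchors(pts, m=6, seed=1)
    Z = anchor_graph(pts, anchors, s=3, sigma=0.5)
    lam = anchor_degrees(Z, len(anchors))
    # Rows of Z sum to 1, so W is row-stochastic and W @ 1 is (numerically) 1.
    print(similarity_matvec(Z, lam, [1.0] * len(pts))[:3])
```

Since W = (ZΛ^{-1/2})(ZΛ^{-1/2})ᵀ, an anchor-based spectral method can typically obtain the spectral embedding from an m × m eigenproblem (for example via the SVD of ZΛ^{-1/2}) rather than an n × n one. That step is omitted here to keep the sketch dependency-free.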