上海交通大学学报(英文版) ›› 2017, Vol. 22 ›› Issue (2): 139-147.doi: 10.1007/s12204-017-1813-9

• • 上一篇    下一篇

Automatically Finding the Number of Clusters Based on Simulated Annealing

YANG Zhengwu (杨政武), HUO Hong (霍宏), FANG Tao*(方涛)   

  1. (Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China)
  • 出版日期:2017-03-31 发布日期:2017-04-04
  • 通讯作者: FANG Tao(方涛) E-mail:tfang@sjtu.edu.cn

Automatically Finding the Number of Clusters Based on Simulated Annealing

YANG Zhengwu (杨政武), HUO Hong (霍宏), FANG Tao*(方涛)   

  1. (Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China)
  • Online:2017-03-31 Published:2017-04-04
  • Contact: FANG Tao(方涛) E-mail:tfang@sjtu.edu.cn

摘要:

Based on simulated annealing (SA), automatically finding the number of clusters (AFNC) is proposed in this paper to determine the number of clusters and their initial centers. It is a simple and automatic method that combines local search with two widely-accepted global analysis techniques, namely careful-seeding (CS) and distance-histogram (DH). The procedure for finding a cluster is formulated as mountain-climbing, and the mountain is defined as the convergent domain of SA.When arriving at the peak of one mountain, AFNC has found one of the clusters in the dataset, and its initial center is the peak. Then, AFNC continues to climb up another mountain from a new starting point found by CS till the termination condition is satisfied. In the procedure of climbing-up mountain, the local dense region for searching the next state of SA is found by analyzing the distance histogram. Experimental results show that AFNC can achieve consistent performance for a wide range of datasets.

关键词: clusters, simulated annealing (SA), distance histogram, careful seeding

Abstract:

Based on simulated annealing (SA), automatically finding the number of clusters (AFNC) is proposed in this paper to determine the number of clusters and their initial centers. It is a simple and automatic method that combines local search with two widely-accepted global analysis techniques, namely careful-seeding (CS) and distance-histogram (DH). The procedure for finding a cluster is formulated as mountain-climbing, and the mountain is defined as the convergent domain of SA.When arriving at the peak of one mountain, AFNC has found one of the clusters in the dataset, and its initial center is the peak. Then, AFNC continues to climb up another mountain from a new starting point found by CS till the termination condition is satisfied. In the procedure of climbing-up mountain, the local dense region for searching the next state of SA is found by analyzing the distance histogram. Experimental results show that AFNC can achieve consistent performance for a wide range of datasets.

Key words: clusters, simulated annealing (SA), distance histogram, careful seeding

中图分类号: