Journal of shanghai Jiaotong University (Science) ›› 2017, Vol. 22 ›› Issue (2): 139-147.doi: 10.1007/s12204-017-1813-9

Previous Articles     Next Articles

Automatically Finding the Number of Clusters Based on Simulated Annealing

Automatically Finding the Number of Clusters Based on Simulated Annealing

YANG Zhengwu (杨政武), HUO Hong (霍宏), FANG Tao*(方涛)   

  1. (Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China)
  2. (Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China)
  • Online:2017-03-31 Published:2017-04-04
  • Contact: FANG Tao(方涛) E-mail:tfang@sjtu.edu.cn

Abstract:

Based on simulated annealing (SA), automatically finding the number of clusters (AFNC) is proposed in this paper to determine the number of clusters and their initial centers. It is a simple and automatic method that combines local search with two widely-accepted global analysis techniques, namely careful-seeding (CS) and distance-histogram (DH). The procedure for finding a cluster is formulated as mountain-climbing, and the mountain is defined as the convergent domain of SA.When arriving at the peak of one mountain, AFNC has found one of the clusters in the dataset, and its initial center is the peak. Then, AFNC continues to climb up another mountain from a new starting point found by CS till the termination condition is satisfied. In the procedure of climbing-up mountain, the local dense region for searching the next state of SA is found by analyzing the distance histogram. Experimental results show that AFNC can achieve consistent performance for a wide range of datasets.

Key words:

clusters| simulated annealing (SA)| distance histogram| careful seeding

摘要:

Based on simulated annealing (SA), automatically finding the number of clusters (AFNC) is proposed in this paper to determine the number of clusters and their initial centers. It is a simple and automatic method that combines local search with two widely-accepted global analysis techniques, namely careful-seeding (CS) and distance-histogram (DH). The procedure for finding a cluster is formulated as mountain-climbing, and the mountain is defined as the convergent domain of SA.When arriving at the peak of one mountain, AFNC has found one of the clusters in the dataset, and its initial center is the peak. Then, AFNC continues to climb up another mountain from a new starting point found by CS till the termination condition is satisfied. In the procedure of climbing-up mountain, the local dense region for searching the next state of SA is found by analyzing the distance histogram. Experimental results show that AFNC can achieve consistent performance for a wide range of datasets.

关键词:

clusters| simulated annealing (SA)| distance histogram| careful seeding

CLC Number: