Journal of Shanghai Jiao Tong University (Science), 2019, Vol. 24, Issue (3): 364-371. doi: 10.1007/s12204-019-2072-8


Research on Weibo Hotspot Finding Based on Self-Adaptive Incremental Clustering


SONG Huilin (宋慧琳), PENG Diyun (彭迪云), HUANG Xin *(黄欣), FENG Jun (冯俊)   

  (1. School of Management, Nanchang University, Nanchang 330031, China; 2. School of Economics and Management, Nanchang University, Nanchang 330031, China; 3. School of Electronics and Information Engineering, Tongji University, Shanghai 200082, China)
  • Online: 2019-06-01 Published: 2019-05-29
  • Contact: HUANG Xin *(黄欣) E-mail: 1610466@tongji.edu.cn

Abstract: Weibo, also known as micro-blog, has become the primary source and communication form of Internet hotspots, owing to its extremely low threshold for releasing information and its interactive mode of communication. However, Weibo posts are short texts, and their sparse semantic features, together with colloquial and diversified expressions, make clustering analysis more difficult. To address these problems, we use the Biterm topic model (BTM) to extract features from the corpus and the vector space model (VSM) to strengthen those features, which reduces the vector dimension and highlights the main features. We then propose an improved incremental clustering algorithm that incorporates Weibo features, along with a Weibo buzz calculation formula that describes the buzz of Weibo, so that hotspots can be discovered on a reasonable basis. The experimental results show that the incremental clustering algorithm presented in this paper can effectively improve the accuracy of clustering in different dimensions. Meanwhile, the Weibo buzz calculation formula reasonably describes the evolution of Weibo buzz from a qualitative point of view, which helps discover hotspots effectively.

Key words: incremental clustering, Weibo, hotspot finding
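
As a rough illustration of what incremental clustering means in this setting, the sketch below implements a generic single-pass scheme: each incoming feature vector joins the most similar existing cluster if the cosine similarity reaches a threshold, and otherwise starts a new cluster. This is not the improved algorithm proposed in the paper. The function names, the fixed threshold of 0.8, the running-mean centroid update, and the toy vectors are all assumptions made for illustration, and the BTM/VSM feature extraction and the buzz formula are not modeled.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(vectors, threshold=0.8):
    """Generic single-pass incremental clustering (illustrative, hypothetical helper).

    Each vector is compared with the centroid of every existing cluster.
    It joins the most similar cluster when the similarity is at least
    `threshold`; otherwise it becomes the seed of a new cluster.
    """
    clusters = []  # each cluster: {"centroid": [...], "members": [indices]}
    for i, v in enumerate(vectors):
        best, best_sim = None, -1.0
        for c in clusters:
            sim = cosine(v, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # Update the centroid as a running mean of its members.
            n = len(best["members"])
            best["centroid"] = [
                (m * n + x) / (n + 1) for m, x in zip(best["centroid"], v)
            ]
            best["members"].append(i)
        else:
            clusters.append({"centroid": list(v), "members": [i]})
    return clusters


# Toy feature vectors standing in for post representations (assumed data).
docs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]]
clusters = single_pass_cluster(docs, threshold=0.8)
print([c["members"] for c in clusters])
```

Because posts are processed one at a time and never revisited, new posts can be added without reclustering the whole corpus. The trade-off is that the result depends on arrival order and on the chosen threshold.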

