Weibo, also known as microblog, has become the primary source and channel of Internet hotspots because of
its extremely low threshold for publishing information and its interactive mode of communication. However,
Weibo posts are short texts: their sparse semantic features, together with colloquial and diverse expressions,
make clustering analysis difficult. To address these problems, we use the biterm topic model (BTM) to extract
features from the corpus and the vector space model (VSM) to strengthen those features, which reduces the
vector dimension and highlights the main features. We then propose an improved incremental clustering
algorithm that incorporates Weibo features, together with a formula for calculating Weibo buzz, so that
hotspots can be discovered on a reasonable basis. The experimental results show that the proposed
incremental clustering algorithm effectively improves clustering accuracy across different dimensions.
Meanwhile, the buzz formula reasonably describes, from a qualitative point of view, how Weibo buzz evolves,
which helps discover hotspots effectively.
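The idea of incremental clustering mentioned above can be illustrated with a generic single-pass scheme. The following is only a minimal sketch under stated assumptions, not the self-adaptive algorithm proposed in the paper: each post is assumed to be already represented as a numeric feature vector (for example, a topic distribution), and the function names, the cosine similarity measure, and the threshold value of 0.8 are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(vectors, threshold=0.8):
    """Generic single-pass clustering (illustrative; not the paper's algorithm).

    Each incoming vector joins the most similar existing cluster when the
    similarity to that cluster's centroid reaches `threshold`; otherwise it
    starts a new cluster. The threshold is an assumed, hypothetical value.
    """
    centroids = []  # running mean vector of each cluster
    sizes = []      # number of members in each cluster
    labels = []     # cluster index assigned to each input vector
    for v in vectors:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centroids):
            sim = cosine(v, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            # Update the running mean of the chosen cluster.
            n = sizes[best]
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], v)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(v))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


# Toy usage: two posts dominated by one topic, two dominated by another.
labels, centroids = incremental_cluster(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
)
print(labels)
```

Because posts are processed one at a time and only cluster centroids are kept, new posts can be assigned as they arrive without reclustering the whole corpus; the result does, however, depend on arrival order and on the chosen threshold.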
SONG Huilin (宋慧琳), PENG Diyun (彭迪云), HUANG Xin* (黄欣), FENG Jun (冯俊). Research on Weibo Hotspot Finding Based on Self-Adaptive Incremental Clustering[J]. Journal of Shanghai Jiaotong University (Science), 2019, 24(3): 364-371.
DOI: 10.1007/s12204-019-2072-8