Journal of Shanghai Jiao Tong University (Science), 2019, Vol. 24, Issue (3): 364-371. doi: 10.1007/s12204-019-2072-8
SONG Huilin (宋慧琳), PENG Diyun (彭迪云), HUANG Xin* (黄欣), FENG Jun (冯俊)
Online: 2019-06-01
Published: 2019-05-29
Contact: HUANG Xin* (黄欣)
E-mail: 1610466@tongji.edu.cn
Abstract: Weibo, also known as micro-blog, has an extremely low threshold for information release and an interactive communication mode, and it has become the primary source and communication channel of Internet hotspots. However, Weibo posts are short texts: their sparse semantic features, together with colloquial and diversified expressions, make clustering analysis difficult. To address these problems, we use the biterm topic model (BTM) to extract features from the corpus and the vector space model (VSM) to strengthen those features, reducing the vector dimension and highlighting the main features. We then propose an improved incremental clustering algorithm that incorporates Weibo features, along with a Weibo buzz calculation formula that describes the buzz of Weibo, so that hotspots can be reasonably discovered. The experimental results show that the proposed incremental clustering algorithm effectively improves clustering accuracy in different dimensions. Meanwhile, the buzz formula reasonably describes the evolution of Weibo buzz from a qualitative point of view, which helps discover hotspots effectively.
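As a rough illustration of the incremental clustering idea mentioned in the abstract, a generic single-pass scheme can be sketched in Python. This is not the authors' algorithm, which additionally relies on BTM features, VSM weighting, and Weibo-specific features; the function names, the plain bag-of-words vectors, the toy posts, and the 0.5 similarity threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def incremental_cluster(docs, threshold=0.5):
    """Generic single-pass clustering (hypothetical helper, not from the paper).

    Each tokenized document joins the most similar existing cluster when the
    similarity reaches `threshold`; otherwise it starts a new cluster.
    """
    centroids = []  # one Counter of summed term counts per cluster
    labels = []
    for tokens in docs:
        vec = Counter(tokens)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # open a new cluster
            labels.append(len(centroids) - 1)
    return labels


# Toy, made-up posts that arrive one at a time.
posts = [
    ["earthquake", "sichuan", "rescue"],
    ["earthquake", "rescue", "team"],
    ["concert", "tickets", "sale"],
    ["concert", "tickets", "soldout"],
]
labels = incremental_cluster(posts, threshold=0.5)
print(labels)
```

Because documents are processed in arrival order and clusters are only updated, never rebuilt, new posts can be absorbed without re-clustering the whole corpus, which is the property that makes an incremental scheme attractive for a stream of short texts.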
SONG Huilin (宋慧琳), PENG Diyun (彭迪云), HUANG Xin *(黄欣), FENG Jun (冯俊). Research on Weibo Hotspot Finding Based on Self-Adaptive Incremental Clustering[J]. Journal of Shanghai Jiao Tong University (Science), 2019, 24(3): 364-371.