Journal of Shanghai Jiaotong University (Science) ›› 2013, Vol. 18 ›› Issue (4): 418-424. doi: 10.1007/s12204-013-1416-z
YAN Ze-hua (闫泽华), LI Fang* (李 芳)
Online:
2013-08-28
Published:
2013-08-12
Contact:
LI Fang (李 芳)
E-mail: fli@sjtu.edu.cn
Abstract: Automatic thread labeling for news events helps readers understand the different aspects of a news event. In this paper, we present a method for labeling the threads of a news event. We use the latent Dirichlet allocation (LDA) topic model to extract news threads from a news corpus. Our method first selects a subset of thread words and then extracts phrases based on co-occurrence calculation. Each extracted phrase is then used as the label of a news thread. Experimental results show that about 60% of the generated labels convey meaningful aspects of a news event. These labels can help readers quickly grasp the many different aspects of a news event.
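The phrase-extraction step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the thread words have already been selected (for example, from the top words of an LDA topic), it treats "co-occurrence" as simple adjacency of two thread words in the same document, and the function name `label_thread` and the toy documents are hypothetical.

```python
from collections import Counter


def label_thread(docs, thread_words):
    """Return the most frequent adjacent pair of thread words as a label.

    Assumption: co-occurrence is approximated by adjacency within a
    tokenized document; the paper's actual calculation may differ.
    """
    thread_words = set(thread_words)
    pair_counts = Counter()
    for tokens in docs:
        # Count adjacent token pairs in which both tokens are thread words.
        for left, right in zip(tokens, tokens[1:]):
            if left in thread_words and right in thread_words:
                pair_counts[(left, right)] += 1
    if not pair_counts:
        return None  # no candidate phrase found
    (left, right), _ = pair_counts.most_common(1)[0]
    return f"{left} {right}"


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["rescue", "team", "arrives", "at", "earthquake", "site"],
    ["rescue", "team", "finds", "survivors"],
    ["earthquake", "site", "closed"],
    ["rescue", "team", "leaves"],
]
# Hypothetical thread-word subset, e.g. top words of one LDA topic.
thread_words = ["rescue", "team", "earthquake", "site", "survivors"]

label = label_thread(docs, thread_words)
print(label)
```

In this toy corpus the pair "rescue team" co-occurs most often, so it is chosen as the label. A fuller version would likely normalize the counts (for instance with pointwise mutual information) so that frequent but uninformative pairs are not favored.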
YAN Ze-hua (闫泽华), LI Fang* (李 芳). Thread Labeling for News Event[J]. Journal of Shanghai Jiaotong University (Science), 2013, 18(4): 418-424.