Journal of Shanghai Jiaotong University (Science) ›› 2013, Vol. 18 ›› Issue (4): 418-424. doi: 10.1007/s12204-013-1416-z
YAN Ze-hua (闫泽华), LI Fang* (李 芳)
Online: 2013-08-28
Published: 2013-08-12
Contact: LI Fang (李 芳)
E-mail: fli@sjtu.edu.cn
Abstract: Automatic thread labeling for news events can help people understand the different aspects of a news event. In this paper, we present a method to label the threads of a news event. We use the latent Dirichlet allocation (LDA) topic model to extract news threads from a news corpus. Our method first selects a subset of thread words and then extracts phrases based on co-occurrence calculation. The extracted phrase is then used as the label of a news thread. Experimental results show that about 60% of the generated labels reflect meaningful aspects of a news event. These labels can help people quickly grasp the many different aspects of a news event.
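The phrase-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual co-occurrence formula, so everything below is an assumption made for illustration: the function name `label_thread`, the use of adjacent word pairs as candidate phrases, and the Dice-style score are hypothetical choices, not the authors' method. The thread words are assumed to be the top-weighted words of one LDA topic, obtained beforehand.

```python
from collections import Counter


def label_thread(docs, thread_words, top_k=1):
    """Return up to top_k two-word labels for one news thread.

    docs: list of tokenized documents (lists of strings).
    thread_words: the selected word subset for the thread
        (assumed here to come from one LDA topic).
    """
    vocab = set(thread_words)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in docs:
        # Count occurrences of each thread word.
        word_counts.update(t for t in tokens if t in vocab)
        # Count adjacent pairs in which both words belong to the thread.
        for left, right in zip(tokens, tokens[1:]):
            if left in vocab and right in vocab and left != right:
                pair_counts[(left, right)] += 1
    # Dice-style co-occurrence score (an illustrative assumption).
    scores = {
        pair: 2 * n / (word_counts[pair[0]] + word_counts[pair[1]])
        for pair, n in pair_counts.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(pair) for pair in ranked[:top_k]]


# Toy data, invented for this example only.
docs = [
    ["rescue", "team", "arrived", "after", "the", "earthquake"],
    ["the", "rescue", "team", "searched", "the", "rubble"],
    ["earthquake", "relief", "and", "rescue", "team", "support"],
]
thread_words = ["rescue", "team", "earthquake", "relief"]
print(label_thread(docs, thread_words))
```

On this toy corpus the pair "rescue team" co-occurs in all three documents and is ranked first. A real pipeline would also need tokenization, stop-word handling, and a trained topic model, none of which are specified in the abstract.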
YAN Ze-hua (闫泽华), LI Fang* (李 芳). Thread Labeling for News Event[J]. Journal of Shanghai Jiaotong University (Science), 2013, 18(4): 418-424.