Journal of Shanghai Jiaotong University (Natural Science Edition), 2012, Vol. 46, Issue (11): 1753-1758.

• Automation Technology, Computer Technology

LDA Topic Evolution Analysis Based on Local and Global Modeling

章建,李芳   

  1. (Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200240)
  • Received: 2012-03-30 Online: 2012-11-30 Published: 2012-11-30
  • Funding:

    Supported by the National Natural Science Foundation of China (60873134)

LDA Topic Evolution Based on Global and Local Modeling

 ZHANG Jian, LI Fang

  1. (Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200240, China)
  • Received:2012-03-30 Online:2012-11-30 Published:2012-11-30

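Abstract: Topic evolution is formally described. Two modeling approaches, one based on global topic evolution and one based on local topic evolution, are explored and evaluated using topic similarity and perplexity. Case studies of a real-estate topic and an Olympic Games topic illustrate the strengths and weaknesses of each approach. Experiments on reports from the Two Sessions (the annual NPC and CPPCC meetings) show two things. Global topic evolution obtains better model parameters and is simple and reliable. Local topic evolution produces fine-grained topics and reflects both the emergence of new topics and the disappearance of old ones.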
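
The abstract names topic similarity and perplexity as evaluation metrics but does not give their formulas. The following is a minimal sketch built on two assumptions. First, similarity is measured as the cosine between two topic-word distributions. The paper may instead use another measure, such as KL or Jensen-Shannon divergence. Second, perplexity follows the standard LDA definition: exp(-Σ_d log p(w_d) / Σ_d N_d), where each token's probability is p(w | d) = Σ_k θ_dk φ_kw. The function names are illustrative. For held-out documents, θ is assumed to have already been inferred.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic-word distributions over the same vocabulary."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def perplexity(docs, theta, phi):
    """Perplexity of documents under a fitted LDA model (lower is better).

    docs:  list of documents, each a list of integer word ids
    theta: theta[d][k], topic proportions of document d (assumed already inferred)
    phi:   phi[k][w], probability of word w under topic k
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) = sum over topics k of theta[d][k] * phi[k][w]
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    # exp of the negative average per-token log-likelihood
    return math.exp(-log_likelihood / n_tokens)
```

As a sanity check, consider two topics that are both uniform over a four-word vocabulary. Every token then has probability 0.25, so the perplexity evaluates to 4, up to floating-point rounding.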

Keywords: text information processing, Dirichlet distribution, topic association and evolution

Abstract: Topic evolution refers to changes in the content and strength of a topic over time. This paper first defines topic evolution and then describes two methods of modeling it, one based on global documents and one based on local documents. Both methods are evaluated with two metrics: topic similarity and perplexity. The paper analyzes the evolution of two topics, real estate and the 2008 Olympic Games. Experiments on NPC & CPPCC news reports from the most recent five years show the following. Topic evolution based on global documents yields a good topic model with a simple evolution method. Topic evolution based on local documents produces fine-grained topics and reveals both the emergence of new topics and the disappearance of old ones.
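
The abstract contrasts global and local modeling without spelling out the mechanics. The sketch below shows one common way to realize each. The abstract does not confirm either approach, so treat both as assumptions.

- **Global modeling:** a single LDA model is trained on all documents. A topic's strength in a time slice is the average of its proportion θ over that slice's documents.
- **Local modeling:** a separate model is trained for each slice. Topics in adjacent slices are linked when their cosine similarity reaches a threshold. An unlinked current topic counts as new, and an unlinked previous topic counts as vanished.

The function names and the default threshold of 0.5 are hypothetical.

```python
import math
from collections import defaultdict


def topic_strength(theta, timestamps):
    """Global modeling (assumed): average proportion of each topic per time slice.

    theta:      theta[d][k] from one LDA model trained on the whole collection
    timestamps: timestamps[d], the time slice (e.g. year) of document d
    Returns {slice: [strength of topic 0, strength of topic 1, ...]}, slices sorted.
    """
    n_topics = len(theta[0])
    sums = defaultdict(lambda: [0.0] * n_topics)
    counts = defaultdict(int)
    for row, t in zip(theta, timestamps):
        for k, v in enumerate(row):
            sums[t][k] += v
        counts[t] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sorted(sums)}


def cosine(p, q):
    """Cosine similarity between two topic-word distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def link_topics(prev_topics, curr_topics, threshold=0.5):
    """Local modeling (assumed): link topics of two adjacent time slices.

    prev_topics, curr_topics: topic-word distributions from the models of
    slices t-1 and t over the same vocabulary; threshold is a hypothetical cutoff.
    Returns (links, new_topics, vanished_topics).
    """
    # Many-to-many links allow one topic to split into, or merge from, several.
    links = [(i, j)
             for i, p in enumerate(prev_topics)
             for j, q in enumerate(curr_topics)
             if cosine(p, q) >= threshold]
    linked_prev = {i for i, _ in links}
    linked_curr = {j for _, j in links}
    # Current topics with no predecessor are new; previous topics with no successor vanished.
    new_topics = [j for j in range(len(curr_topics)) if j not in linked_curr]
    vanished = [i for i in range(len(prev_topics)) if i not in linked_prev]
    return links, new_topics, vanished
```

Consider an example. The previous slice has one topic concentrated on words 0 and 1 and another on word 2. The current slice has one topic on words 0 and 1 and another on word 3. In this case, `link_topics` returns the single link (0, 0), reports current topic 1 as new, and reports previous topic 1 as vanished.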

Key words: text information processing, latent Dirichlet allocation, topic detection and evolution
