Journal of Shanghai Jiao Tong University (Science) ›› 2018, Vol. 23 ›› Issue (4): 584-.doi: 10.1007/s12204-018-1957-2


Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs

YE Feiyue (叶飞跃), XU Xinchen (徐欣辰)   

  1. (School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China)
  • Published: 2018-08-02
  • Corresponding author: XU Xinchen (徐欣辰) E-mail: xinchenxu8011802@gmail.com

Abstract: As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a two-layer sentence-word graph algorithm combined with keyword density to generate multi-document summaries, called Graph & Keywordρ. Traditional graph-based methods for multi-document summarization consider the influence of sentences and words only across the whole document collection, not within individual documents. We therefore construct a word graph for each document and extract the keywords of each document, using them to modify the sentence graph and to improve the significance and richness of the summary. In addition, because words differ in importance across documents, we use keyword density so that summaries provide rich content with a small number of words. Experimental results show that the Graph & Keywordρ method outperforms state-of-the-art systems when tested on the DUC2004 data set.

Key words: multi-document, graph algorithm, keyword density, Graph & Keywordρ, DUC2004

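The abstract does not define keyword density. As a rough illustration only, the Python sketch below assumes it is the fraction of a sentence's word tokens that belong to a keyword set, so that short sentences carrying many keywords score highest. The function names `keyword_density` and `rank_by_density`, the tokenization rule, and the definition itself are assumptions made for this example, not the authors' implementation, and the sentence-word graph stage is not modeled here.

```python
import re


def keyword_density(sentence, keywords):
    """Assumed definition: keyword tokens divided by all word tokens."""
    # Lowercase and keep alphanumeric runs as tokens (a simplifying assumption).
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in keywords)
    return hits / len(tokens)


def rank_by_density(sentences, keywords):
    """Return the sentences sorted from highest to lowest keyword density."""
    return sorted(
        sentences,
        key=lambda s: keyword_density(s, keywords),
        reverse=True,
    )
```

Under this assumed definition, a four-word sentence containing two keywords has density 0.5, while a longer sentence with the same two keywords scores lower, which matches the stated goal of rich content in few words.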
