Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs

  • (School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China)

Published online: 2018-08-02

Abstract

As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a sentence-word two-layer graph algorithm combined with keyword density to generate multi-document summaries, referred to as Graph & Keywordρ. Traditional graph-based methods for multi-document summarization consider the influence of sentences and words only across the whole document collection, not within individual documents. We therefore construct multiple word graphs and extract the appropriate keywords from each document to modify the sentence graph, improving the significance and richness of the summary. Moreover, because word importance differs across documents, we use keyword density so that summaries provide rich content with a small number of words. Experimental results show that the Graph & Keywordρ method outperforms state-of-the-art systems when tested on the DUC2004 data set.
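The abstract does not define keyword density, so the following is only a minimal sketch under a stated assumption: density is taken to be the fraction of a sentence's tokens that are keywords of its source document. The function names (`tokenize`, `top_keywords`, `keyword_density`, `rank_by_density`) are hypothetical, and simple per-document term frequency stands in for the paper's word-graph keyword extraction, which is not reproduced here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(document, k=3, stopwords=frozenset()):
    """Return the k most frequent non-stopword tokens of ONE document.

    Assumption: term frequency is a stand-in for graph-based keyword
    extraction; the paper's actual procedure is not given in the abstract.
    """
    counts = Counter(t for t in tokenize(document) if t not in stopwords)
    return {word for word, _ in counts.most_common(k)}


def keyword_density(sentence, keywords):
    """Assumed definition: keyword tokens divided by all tokens in the sentence."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in keywords)
    return hits / len(tokens)


def rank_by_density(sentences, keywords):
    """Sort sentences from highest to lowest assumed keyword density."""
    return sorted(sentences, key=lambda s: keyword_density(s, keywords), reverse=True)
```

Under this assumed definition, a short sentence made mostly of keywords scores higher than a long sentence mentioning the same keywords once, which matches the stated goal of rich content in few words.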

Cite this article

YE Feiyue (叶飞跃), XU Xinchen (徐欣辰). Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs [J]. Journal of Shanghai Jiaotong University (Science), 2018, 23(4): 584. DOI: 10.1007/s12204-018-1957-2
