Affiliation:
1. Computer Engineering Department, İstanbul Ticaret University, Küçükyalı, İstanbul, TURKEY
Abstract
An extractive multi-document summarizer must select the most informative units and prevent duplication among the extracted units. To achieve this goal, this work proposes a new technique called “comprising at least one Representative Term at the Highest Frequency” (RTHF). Units that include representative terms, but only at low frequencies, are not considered for extraction (selection of the most informative units). On the other hand, units that satisfy the RTHF feature precede other similar units in ranking (prevention of duplication). The heuristic behind RTHF is explained in terms of probability. RTHF was tested on a previously developed and evaluated paragraph-based extractive multi-document summarizer. The results show that it improves the original system by 0.8% to 3.2% (Average-F values of ROUGE metrics).
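The selection idea can be illustrated with a minimal sketch. It is only one possible reading of the abstract, not the authors' implementation: it assumes that a unit satisfies RTHF when, for at least one representative term, its count of that term equals the highest count found in any unit. The function names, the tokenizer, and the sample paragraphs are hypothetical, and the way representative terms are chosen is not covered here.

```python
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word tokens in one unit (e.g. a paragraph)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rthf_units(units, representative_terms):
    """Return indices of units that hold at least one representative term
    at that term's highest per-unit frequency (assumed reading of RTHF)."""
    counts = [term_counts(u) for u in units]
    # Highest frequency each representative term reaches in any single unit.
    peak = {t: max((c[t] for c in counts), default=0)
            for t in representative_terms}
    selected = []
    for i, c in enumerate(counts):
        # A term that never occurs (peak 0) cannot qualify a unit.
        if any(peak[t] > 0 and c[t] == peak[t] for t in representative_terms):
            selected.append(i)
    return selected


# Hypothetical sample units and representative terms.
units = [
    "Storm hits coast. Storm damage is severe.",
    "A storm was reported near the coast.",
    "Officials discuss the budget and the budget vote.",
]
print(rthf_units(units, ["storm", "budget"]))  # prints [0, 2]
```

In this sample the second paragraph mentions "storm" only once, below the peak of two, so it is passed over in favor of the first paragraph, which mirrors the abstract's point that similar units with lower term frequencies rank behind RTHF units.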
Publisher
World Scientific and Engineering Academy and Society (WSEAS)
Subject
Computer Science Applications, Control and Systems Engineering