Affiliation:
1. National University of Modern Languages, Islamabad, Pakistan
2. Department of Computer Science, Business Information Systems, NUI Galway, Ireland
3. Faculty of Basic and Applied Sciences, International Islamic University, Islamabad, Pakistan
Abstract
Because of the tremendous amount of data available today, extracting essential information from such a large volume is difficult. This is particularly true of text documents, which require a significant amount of the user's time to read and extract useful information from. The major problems are identifying the documents relevant to the user, extracting the most significant pieces of information, determining document relevance, excluding extraneous information, reducing detail, and generating a compact, consistent report. To address these issues, we propose a novel technique that extracts important information from large volumes of text and uses previously read documents to generate update summaries of new documents. Our technique focuses on extracting topics (also known as topic signatures) from the previously read documents and then selecting, for the update summary, the sentences most relevant to these topics. In addition, an overlapping value is used to identify meaningful words and word similarities. We also use the Dice coefficient, which measures the intersection of words between document sets, to eliminate redundancy. The generated summary consists of diverse, highly representative sentences of average length. Empirically, we observe that our proposed technique outperforms baseline competitors on the real-world TAC 2008 dataset.
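To make the redundancy step concrete, the sketch below shows the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), used as a filter during greedy sentence selection. The abstract does not say how topic signatures, the overlapping value, or the redundancy threshold are computed, so this is a minimal illustration under stated assumptions, not the authors' implementation: topic words are assumed to be given as a plain set, tokenization is a simple lowercase regex, relevance is a hypothetical count of topic words in a sentence, and the names `select_sentences`, `topic_overlap`, and the threshold of 0.5 are all illustrative choices.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case word set; a deliberately simple tokenizer (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); 1.0 means identical word sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def topic_overlap(words: set[str], topics: set[str]) -> int:
    """Hypothetical relevance score: number of topic words in the sentence."""
    return len(words & topics)


def select_sentences(
    sentences: list[str],
    topics: set[str],
    max_sentences: int = 3,
    redundancy_threshold: float = 0.5,  # illustrative value, not from the paper
) -> list[str]:
    """Greedy selection: rank by topic overlap, skip near-duplicates by Dice."""
    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(
        sentences, key=lambda s: topic_overlap(tokenize(s), topics), reverse=True
    )
    summary: list[str] = []
    for sentence in ranked:
        words = tokenize(sentence)
        if topic_overlap(words, topics) == 0:
            continue  # not related to any topic word
        if any(dice(words, tokenize(c)) > redundancy_threshold for c in summary):
            continue  # too similar to a sentence already chosen
        summary.append(sentence)
        if len(summary) == max_sentences:
            break
    return summary


if __name__ == "__main__":
    topics = {"flood", "river", "evacuation"}
    sentences = [
        "The river flood forced evacuation of the town.",
        "Flood waters from the river forced the evacuation of the town.",
        "Officials opened new shelters.",
        "River levels are expected to fall by Friday.",
    ]
    for s in select_sentences(sentences, topics):
        print(s)
    # The second sentence is dropped: its Dice score with the first is 14/16.
```

The second sentence is highly relevant but shares 7 of its 9 words with the first, so the Dice filter discards it; the more novel sentence about river levels is kept instead. Any real update-summarization system would replace the toy relevance score with properly weighted topic signatures.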
Subject
Computer Networks and Communications, Hardware and Architecture, Software