Affiliation:
1. Research Scholar, School of Computer Science and Engineering, VIT-AP University, Amaravathi, Andhra Pradesh 522237, India
2. Assistant Professor, School of Computer Science and Engineering, VIT-AP University, Amaravathi, Andhra Pradesh 522237, India
Abstract
The rapid growth of the internet and of computing devices has sharply increased the volume of available information. Automatic text summarization (ATS) is a principal technique for reducing this volume: it condenses text documents accurately and quickly according to user requirements. However, its effectiveness is limited by several issues, including subject identification, interpretation, summary generation and evaluation of the generated summary. To address these issues and obtain precise, concise summaries, this work proposes an efficient multi-document summarization approach comprising preprocessing, feature extraction, knowledge extraction and summarization. First, the documents are preprocessed by BERT tokenization, which yields preprocessed tokens from multiple documents. In the feature extraction process, the features used to train a modified CNN are extracted from these tokens: the proposed aspect term feature, the TF-IDF feature, the Word2Vec feature and the average word length feature. In parallel, the preprocessed tokens are passed to the knowledge extraction process, which extracts essential knowledge to improve the training of the modified CNN. The modified CNN is then trained with the obtained weights, the extracted feature set and the extracted knowledge to produce precise and concise summaries of multiple documents. The final summary is thus generated by the modified CNN with transfer learning (MCNN-TL). Finally, the proposed multi-document summarization approach is evaluated through a variety of analyses, which demonstrate its superiority over existing summarization methods.
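As an illustration of two of the features named in the abstract, the following is a minimal sketch of TF-IDF and average word length computed over tokenized documents. It is not the authors' implementation: the function names `tf_idf` and `average_word_length` are hypothetical, the unsmoothed weighting tf × log(N/df) is one common TF-IDF variant chosen here as an assumption, and whitespace splitting stands in for the BERT tokenization described in the abstract.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def average_word_length(tokens):
    """Mean number of characters per token (0.0 for an empty list)."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0


# Assumption: whitespace splitting replaces BERT tokenization for brevity.
documents = [
    "text summarization condenses documents",
    "text classification labels documents",
]
tokenized = [d.split() for d in documents]
tfidf = tf_idf(tokenized)
avg_len = [average_word_length(doc) for doc in tokenized]
```

Terms that occur in every document, such as "text" here, receive a weight of zero under this variant, while terms specific to one document are weighted higher. In a pipeline like the one described, such values would form part of the feature set alongside the aspect term and Word2Vec features.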
Publisher
World Scientific Pub Co Pte Ltd