Paragraph-level attention based deep model for chapter segmentation

Published: 2022-06-10
Volume: 8
Page: e1003
ISSN: 2376-5992
Journal: PeerJ Computer Science
Language: English

Author:

Virameteekul Paveen

Abstract

Books are usually divided into chapters and sections. Correctly and automatically recognizing chapter boundaries can serve as a proxy for the more general task of segmenting long texts. Humans can segment book chapters easily, but automatic segmentation is more challenging because the data are semi-structured. Because natural language is prone to ambiguity, it is essential to capture the relationships among the words within each paragraph and to classify consecutive paragraphs according to how they relate to one another. Although researchers have designed deep learning-based models for this problem, these approaches have not considered paragraph-level semantics across consecutive paragraphs. In this article, we propose a novel deep learning-based method for segmenting book chapters that uses paragraph-level semantics and an attention mechanism. We first used a pre-trained XLNet model connected to a convolutional neural network (CNN) to extract the semantic meaning of each paragraph. We then measured the semantic similarity between paragraphs and designed an attention mechanism that injects this similarity information to better predict chapter boundaries. Experimental results on public datasets showed that the proposed method outperformed other state-of-the-art (SOTA) chapter segmentation methods: it achieved an F1 score of 0.8856, compared with 0.6640 for the Bidirectional Encoder Representations from Transformers (BERT) model. The ablation study also showed that the paragraph-level attention mechanism produces a significant increase in performance.
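The abstract describes the pipeline only at a high level. As a rough, non-authoritative illustration of how paragraph-level similarity can be injected through attention, the sketch below makes several assumptions that do not come from the paper: each paragraph has already been encoded as a fixed-size vector (a stand-in for the XLNet + CNN encoder, which is not reproduced here); attention scores are cosine similarities to a window of preceding paragraphs; and a boundary is flagged when a paragraph is dissimilar from its attended context, instead of by a trained classifier. The function names `left_context`, `predict_boundaries`, and `boundary_f1`, as well as the window, temperature, and threshold values, are hypothetical. The F1 helper assumes exact-match scoring of boundary positions, which may differ from the paper's evaluation protocol.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def left_context(embeddings: Sequence[Vector], i: int, window: int = 2,
                 temperature: float = 0.1) -> List[float]:
    """Attention-weighted sum of up to `window` paragraphs before paragraph i.

    Hypothetical scheme: the attention score of each earlier paragraph is its
    cosine similarity to paragraph i divided by a temperature, so semantically
    similar neighbours receive more weight. Requires i >= 1.
    """
    neighbours = list(range(max(0, i - window), i))
    weights = softmax([cosine(embeddings[i], embeddings[j]) / temperature
                       for j in neighbours])
    dim = len(embeddings[i])
    context = [0.0] * dim
    for w, j in zip(weights, neighbours):
        for k in range(dim):
            context[k] += w * embeddings[j][k]
    return context


def predict_boundaries(embeddings: Sequence[Vector], threshold: float = 0.5,
                       window: int = 2) -> List[int]:
    """Return indices i where a chapter boundary is predicted just before paragraph i.

    The boundary score is 1 - cos(paragraph, attended left context); a high
    score means the paragraph departs from what came before.
    """
    boundaries = []
    for i in range(1, len(embeddings)):
        score = 1.0 - cosine(embeddings[i], left_context(embeddings, i, window))
        if score > threshold:
            boundaries.append(i)
    return boundaries


def boundary_f1(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Exact-match F1 over boundary positions (assumed metric definition)."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D "paragraph embeddings": three paragraphs on one topic,
    # followed by three on another, so the gold boundary is before index 3.
    paragraphs = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
                  [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
    predicted = predict_boundaries(paragraphs)
    print(predicted)
    print(boundary_f1(predicted, gold=[3]))
```

On this toy input the script prints `[3]` and `1.0`. A learned model would replace the fixed threshold with a classifier trained on the paragraph and context representations; the hand-set threshold here only keeps the sketch self-contained.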

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-1003.pdf

Cited by 1 article.

1. Multi-Wiki90k: Multilingual Benchmark Dataset for Paragraph Segmentation; Advances in Computational Collective Intelligence; 2022