
SECTOR: A Neural Model for Coherent Topic Segmentation and Classification

Published: 2019-11 Volume: 7 Page: 169-184
ISSN: 2307-387X
Container-title: Transactions of the Association for Computational Linguistics
Language: en

Author:

Arnold Sebastian¹, Schneider Rudolf¹, Cudré-Mauroux Philippe², Gers Felix A.¹, Löser Alexander¹

Affiliation:

1. Beuth University of Applied Sciences Berlin, Germany.

2. University of Fribourg, Fribourg, Switzerland.

Abstract

When searching for information, a human reader first glances over a document, spots relevant sections, and then focuses on a few sentences for resolving her intention. However, the high variance of document structure complicates the identification of the salient topic of a given section at a glance. To tackle this challenge, we present SECTOR, a model to support machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section. Our deep neural network architecture learns a latent topic embedding over the course of a document. This can be leveraged to classify local topics from plain text and segment a document at topic shifts. In addition, we contribute WikiSection, a publicly available data set with 242k labeled sections in English and German from two distinct domains: diseases and cities. From our extensive evaluation of 20 architectures, we report a highest score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain, scored by our SECTOR long short-term memory model with Bloom filter embeddings and bidirectional segmentation. This is a significant improvement of 29.5 points F1 over state-of-the-art CNN classifiers with baseline segmentation.
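The idea of segmenting a document at topic shifts and labeling each section can be illustrated with a minimal sketch. This is not the SECTOR architecture: it assumes that some model has already produced one topic embedding per sentence, and it swaps in a simple cosine-distance threshold for boundary detection and a nearest-topic-vector rule for labeling. The function names, the threshold value, the toy vectors, and the topic names "symptoms" and "treatment" are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (1.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def segment_at_topic_shifts(embeddings, threshold=0.5):
    """Return the sentence indices at which a new section starts.

    A boundary is placed wherever consecutive sentence embeddings differ
    by more than `threshold` (a hypothetical, untuned value).
    """
    boundaries = [0] if embeddings else []
    for i in range(1, len(embeddings)):
        if cosine_distance(embeddings[i - 1], embeddings[i]) > threshold:
            boundaries.append(i)
    return boundaries


def label_sections(embeddings, boundaries, topic_vectors):
    """Assign each section the topic whose vector is closest to the section mean."""
    sections = []
    ends = boundaries[1:] + [len(embeddings)]
    for start, end in zip(boundaries, ends):
        span = embeddings[start:end]
        dim = len(span[0])
        mean = [sum(vec[d] for vec in span) / len(span) for d in range(dim)]
        topic = min(topic_vectors, key=lambda name: cosine_distance(mean, topic_vectors[name]))
        sections.append((start, end, topic))
    return sections


# Toy input: four sentences with 2-dimensional topic embeddings (made-up values).
sentence_embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
topics = {"symptoms": [1.0, 0.0], "treatment": [0.0, 1.0]}

starts = segment_at_topic_shifts(sentence_embeddings)
labeled = label_sections(sentence_embeddings, starts, topics)
print(starts)   # section start indices
print(labeled)  # (start, end, topic) triples with end exclusive
```

On the toy input, the only large jump lies between the second and third sentence, so the sketch yields two sections. A learned model would replace both the hand-set threshold and the fixed topic vectors.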

Publisher

MIT Press - Journals

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00261

References: 63 articles.

1. Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion

2. Unit Segmentation of Argumentative Texts

3. Topic Detection and Tracking

Cited by 31 articles.

1. In-Page Navigation Aids for Screen-Reader Users with Automatic Topicalisation and Labelling;ACM Transactions on Accessible Computing;2024-06-30

2. Topic Segmentation of Educational Video Lectures Using Audio and Text;Communications in Computer and Information Science;2024

3. Global-SEG: Text Semantic Segmentation Based on Global Semantic Pair Relations;Lecture Notes in Computer Science;2024

4. Comparing neural sentence encoders for topic segmentation across domains: not your typical text similarity task;PeerJ Computer Science;2023-11-03

5. AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities;Proceedings of the 32nd ACM International Conference on Information and Knowledge Management;2023-10-21