BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

Published: 2022-06-28 Issue: 10 Volume: 36 Pages: 10767-10775
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Hong Teakgyu, Kim DongHyun, Ji Mingi, Hwang Wonseok, Nam Daehyun, Park Sungrae

Abstract

Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies try to solve the task by developing pre-trained language models that combine visual features from document images with texts and their layout. In contrast, this paper tackles the problem by going back to the basics: an effective combination of text and layout. Specifically, we propose a pre-trained language model, named BROS (BERT Relying On Spatiality), that encodes relative positions of texts in 2D space and learns from unlabeled documents with an area-masking strategy. With this optimized training scheme for understanding texts in 2D space, BROS shows comparable or better performance than previous methods on four KIE benchmarks (FUNSD, SROIE*, CORD, and SciTSR) without relying on visual features. This paper also reveals two real-world challenges in KIE tasks, (1) minimizing the error from incorrect text ordering and (2) efficient learning from fewer downstream examples, and demonstrates the superiority of BROS over previous methods.
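
The abstract names two ideas without giving details: relative 2D positions between texts, and area masking. The following is only an illustrative sketch of what those terms could mean, not the BROS implementation. It assumes each token comes with a pixel bounding box `(x_min, y_min, x_max, y_max)`. The function names `relative_positions` and `area_mask`, the use of top-left corners for offsets, and the box-center test for masking are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def relative_positions(boxes: List[Box]) -> List[List[Tuple[int, int]]]:
    """rel[i][j] is the (dx, dy) offset from box i's top-left corner to box j's.

    Illustrative only: the abstract says relative positions are encoded,
    but does not specify which reference points or encoding are used.
    """
    return [[(bj[0] - bi[0], bj[1] - bi[1]) for bj in boxes] for bi in boxes]


def area_mask(tokens: List[str], boxes: List[Box], area: Box,
              mask_token: str = "[MASK]") -> List[str]:
    """Replace every token whose box center lies inside `area` with mask_token.

    Illustrative only: masking by spatial region rather than by position
    in the token sequence is the general idea; the selection rule here
    (box center inside a given rectangle) is an assumption.
    """
    ax0, ay0, ax1, ay1 = area
    masked = []
    for token, (x0, y0, x1, y1) in zip(tokens, boxes):
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        inside = ax0 <= cx <= ax1 and ay0 <= cy <= ay1
        masked.append(mask_token if inside else token)
    return masked


# Toy document: two texts on one row, one text elsewhere on the page.
tokens = ["Total", "12.50", "Date"]
boxes = [(10, 80, 30, 90), (70, 80, 90, 90), (10, 10, 30, 20)]

rel = relative_positions(boxes)
masked = area_mask(tokens, boxes, area=(0, 70, 100, 100))
```

In this toy example, "Total" and "12.50" share a row, so their vertical offset is zero, and masking the rectangle that covers that row hides both tokens together regardless of where they appear in the serialized token order.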

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 56 articles.

1. SGFNet: A semantic graph-based multimodal network for financial invoice information extraction;Expert Systems with Applications;2024-12

2. Extractive text summarization on medical insights using fine-tuned transformers;International Journal of Computers and Applications;2024-09-13

3. Deep learning approaches for information extraction from visually rich documents: datasets, challenges and methods;International Journal on Document Analysis and Recognition (IJDAR);2024-07-29

4. A task‐centric knowledge graph construction method based on multi‐modal representation learning for industrial maintenance automation;Engineering Reports;2024-07-07

5. Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30