EmmDocClassifier: Efficient Multimodal Document Image Classifier for Scarce Data

Published:2022-01-29 Issue:3 Volume:12 Page:1457
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Kanchi Shrinidhi, Pagani Alain, Mokayed Hamam, Liwicki Marcus, Stricker Didier, Afzal Muhammad Zeshan

Abstract

Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches to document classification: image-based and multimodal. Image-based approaches rely solely on the inherent visual cues of the document images. In contrast, multimodal approaches co-learn visual and textual features and have proved to be more effective. However, these approaches require large amounts of training data. This paper presents a novel approach to document classification that works with a small amount of data and outperforms other approaches. The proposed approach incorporates a hierarchical attention network (HAN) for the textual stream and EfficientNet-B0 for the image stream. The HAN in the textual stream uses dynamic word embeddings from a fine-tuned BERT model and incorporates both word-level and sentence-level features. While earlier approaches rely on training on a large corpus (RVL-CDIP), we show that our approach works with a small amount of data (Tobacco-3482). To this end, we trained the neural network on Tobacco-3482 from scratch. As a result, we outperform the state of the art with an accuracy of 90.3%, which corresponds to a relative error reduction of 7.9%.
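The abstract describes the textual stream only at a high level. The Python sketch below illustrates the general idea of hierarchical attention pooling: the word vectors of each sentence are pooled into a sentence vector by attention, and the sentence vectors are then pooled into a document vector the same way. It is a simplified illustration under stated assumptions, not the authors' implementation: the recurrent encoders and learned projection layers of the original HAN formulation are omitted, the context vectors `word_context` and `sentence_context` (learned parameters in a real model) are supplied by hand, and plain Python lists stand in for the fine-tuned BERT embeddings. The helpers `relative_error_reduction` and `implied_baseline_error` (hypothetical names) show how a relative error reduction such as the reported 7.9% is conventionally computed from two accuracies.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: Sequence[Vector], context: Vector) -> Vector:
    """Pool several vectors into one using attention weights.

    Score of a vector h: dot(tanh(h), context).
    ASSUMPTION: the learned projection (W, b) of the original HAN is
    replaced by the identity, so only the context vector is "learned".
    """
    scores = [
        sum(math.tanh(x) * c for x, c in zip(v, context)) for v in vectors
    ]
    weights = softmax(scores)
    dim = len(vectors[0])
    # Attention-weighted sum, computed one dimension at a time.
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def hierarchical_encode(
    document: Sequence[Sequence[Vector]],
    word_context: Vector,
    sentence_context: Vector,
) -> Vector:
    """Word-level attention within each sentence, then sentence-level attention.

    `document` is a list of sentences, each a list of word vectors
    (hypothetical stand-ins for fine-tuned BERT embeddings).
    """
    sentence_vectors = [attention_pool(s, word_context) for s in document]
    return attention_pool(sentence_vectors, sentence_context)


def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """(baseline error - new error) / baseline error."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_error(new_err: float, reduction: float) -> float:
    """Invert the definition above: baseline error = new error / (1 - reduction)."""
    return new_err / (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical toy document: two sentences of 2-dimensional word vectors.
    doc = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
    print(hierarchical_encode(doc, [0.5, -0.5], [1.0, 0.0]))
    print(implied_baseline_error(0.097, 0.079))
```

With zero context vectors every attention score is 0, so all weights are equal and the pooling reduces to a plain average; learned context vectors are what let the model emphasise informative words and sentences. Under the conventional definition, the reported 90.3% accuracy (9.7% error) and 7.9% reduction together imply a previous best error of roughly 10.5%; because both reported figures are rounded, this is only approximate.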

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/12/3/1457/pdf

References: 57 articles.

Cited by 10 articles.

1. DocXclassifier: towards a robust and interpretable deep neural network for document image classification;International Journal on Document Analysis and Recognition (IJDAR);2024-06-25

2. Multimodal Information Extraction: A Systematic Review of Subtask, Modal Types and Applications Based on Deep Learning in Banking Sector;2024 5th International Conference for Emerging Technology (INCET);2024-05-24

3. VisFormers—Combining Vision and Transformers for Enhanced Complex Document Classification;Machine Learning and Knowledge Extraction;2024-02-16

4. Enhancing Document Image Retrieval in Education: Leveraging Ensemble-Based Document Image Retrieval Systems for Improved Precision;Applied Sciences;2024-01-16

5. Fine-tuned convolutional neural networks for feature extraction and classification of scanned document images using semi-automatic labelling approach;International Journal of Intelligent Engineering Informatics;2024