A visual transformer-based smart textual extraction method for financial invoices-Reference-Cited by-同舟云学术

A visual transformer-based smart textual extraction method for financial invoices

Published:2023 Issue:10 Volume:20 Page:18630-18649
ISSN:1551-0018
Container-title:Mathematical Biosciences and Engineering
language:
Short-container-title:MBE

Author:

Wang Tao¹,Qiu Min²

Affiliation:

1. School of Innovation and Entrepreneurship, Zhengzhou University of Science and Technology, Zhengzhou 450064, China

2. Institute of Business Administration, Zhengzhou University of Science and Technology, Zhengzhou 450064, China

Abstract

<abstract><p>In era of big data, the computer vision-assisted textual extraction techniques for financial invoices have been a major concern. Currently, such tasks are mainly implemented via traditional image processing techniques. However, they highly rely on manual feature extraction and are mainly developed for specific financial invoice scenes. The general applicability and robustness are the major challenges faced by them. As consequence, deep learning can adaptively learn feature representation for different scenes and be utilized to deal with the above issue. As a consequence, this work introduces a classic pre-training model named visual transformer to construct a lightweight recognition model for this purpose. First, we use image processing technology to preprocess the bill image. Then, we use a sequence transduction model to extract information. The sequence transduction model uses a visual transformer structure. In the stage target location, the horizontal-vertical projection method is used to segment the individual characters, and the template matching is used to normalize the characters. In the stage of feature extraction, the transformer structure is adopted to capture relationship among fine-grained features through multi-head attention mechanism. On this basis, a text classification procedure is designed to output detection results. Finally, experiments on a real-world dataset are carried out to evaluate performance of the proposal and the obtained results well show the superiority of it. Experimental results show that this method has high accuracy and robustness in extracting financial bill information.</p></abstract>

Publisher

American Institute of Mathematical Sciences (AIMS)

Subject

Applied Mathematics,Computational Mathematics,General Agricultural and Biological Sciences,Modeling and Simulation,General Medicine

Reference36 articles.

1. Y. Chen, C. Liu, W. Huang, S. Cheng, R. Arcucci, Z. Xiong, Generative text-guided 3d vision-language pretraining for unified medical image segmentation, preprint, arXiv: 2306.04811. https://doi.org/10.48550/arXiv.2306.04811

2. Z. Wan, C. Liu, M. Zhang, J. Fu, B. Wang, S. Cheng, et al., Med-unic: Unifying cross-lingual medical vision-language pre-training by diminishing bias, preprint, arXiv: 2305.19894. https://doi.org/10.48550/arXiv.2305.19894

3. C. Liu, S. Cheng, C. Chen, M. Qiao, W. Zhang, A. Shah, et al., M-FLAG: medical vision-language pre-training with frozen language models and latent space geometry optimization, preprint, arXiv: 2307.08347. https://doi.org/10.48550/arXiv.2307.08347

4. Z. Guo, K. Yu, N. Kumar, W. Wei, S. Mumtaz, M. Guizani, Deep distributed learning-based poi recommendation under mobile edge networks, IEEE Internet Things J., 10 (2023), 303–317. https://doi.org/10.1109/JIOT.2022.3202628

5. Y. Jin, L. Hou, Y. Chen, A time series transformer based method for the rotating machinery fault diagnosis, Neurocomputing, 494 (2022), 379–395. https://doi.org/10.1016/j.neucom.2022.04.111