Author:
Vineetha Borra, Harini D. N. D., Yelesvarupu Ravi
Abstract
With the widespread use of electronic devices to photograph and upload documents, the need to extract information from unstructured document images is growing rapidly. The major obstacle is that these images often contain information in tabular form, and extracting data from table images is challenging because tables vary widely in layout and encoding. The task involves accurately detecting the table in an image, then recognizing its internal structure, and finally extracting the information it contains. Although some progress has been made in table detection, obtaining the table contents remains a challenge because it requires more fine-grained recognition of table structure (rows and columns). Because there are millions of documents, the digitization of critical information has to be carried out automatically. Motivated by the fact that AI-based solutions are automating many processes, this work comprises three stages: first, table detection using the Faster R-CNN algorithm; second, recognition of the table's internal structure using morphological operations followed by a refinement operation; and last, table data extraction using a contour algorithm. The data used in this work were taken from the UNLV dataset.
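The structure-recognition stage can be illustrated with a small sketch. It is not the authors' implementation: it assumes an already binarized, ruled table, uses pure Python instead of an image library, and replaces contour tracing with a simpler search for the gaps between ruling lines. All function names (`keep_long_runs`, `cell_boxes`, `make_demo_image`) and the kernel length `k` are hypothetical choices made for illustration.

```python
# Illustrative sketch only: morphological opening with line-shaped
# structuring elements on a tiny binary grid (1 = ink, 0 = background).

def keep_long_runs(line, k):
    """Binary opening of a 1-D signal with a flat segment of length k:
    only runs of 1s that are at least k long survive."""
    out = [0] * len(line)
    start = None
    for i, v in enumerate(list(line) + [0]):  # sentinel closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start >= k:
                out[start:i] = [1] * (i - start)
            start = None
    return out

def open_horizontal(img, k):
    """Keep only horizontal strokes at least k pixels long."""
    return [keep_long_runs(row, k) for row in img]

def open_vertical(img, k):
    """Keep only vertical strokes at least k pixels long."""
    cols = [keep_long_runs(col, k) for col in zip(*img)]
    return [list(row) for row in zip(*cols)]

def group_consecutive(indices):
    """Merge adjacent indices so a thick line counts as one (start, end)."""
    groups = []
    for i in indices:
        if groups and i == groups[-1][1] + 1:
            groups[-1][1] = i
        else:
            groups.append([i, i])
    return [tuple(g) for g in groups]

def cell_boxes(img, k):
    """Return cell interiors as (top, left, bottom, right), inclusive.
    Assumption: every ruling line spans the whole table."""
    h_mask = open_horizontal(img, k)
    v_mask = open_vertical(img, k)
    rows = group_consecutive([r for r, row in enumerate(h_mask) if any(row)])
    cols = group_consecutive([c for c, col in enumerate(zip(*v_mask)) if any(col)])
    boxes = []
    for (_, r_end), (r_next, _) in zip(rows, rows[1:]):
        for (_, c_end), (c_next, _) in zip(cols, cols[1:]):
            boxes.append((r_end + 1, c_end + 1, r_next - 1, c_next - 1))
    return boxes

def make_demo_image():
    """7 x 9 grid: a 2 x 2 ruled table plus a few 'text' pixels."""
    img = [[0] * 9 for _ in range(7)]
    for r in (0, 3, 6):
        img[r] = [1] * 9
    for c in (0, 4, 8):
        for r in range(7):
            img[r][c] = 1
    for r, c in [(1, 2), (4, 6), (5, 6)]:  # short strokes standing in for text
        img[r][c] = 1
    return img

if __name__ == "__main__":
    for box in cell_boxes(make_demo_image(), k=5):
        print(box)
```

The opening step removes the short text strokes because they are shorter than `k`, leaving only the ruling lines from which the cell interiors are derived. On real scans, an image library's morphology and contour routines would normally replace these hand-written helpers.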
Publisher
Blue Eyes Intelligence Engineering and Sciences Publication - BEIESP
Subject
Electrical and Electronic Engineering, Mechanics of Materials, Civil and Structural Engineering, General Computer Science
Reference
13 articles.
1. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks," in Advances in Neural Information Processing Systems, pp. 91-99, v3, 2016.
2. P. Pyreddy and W. B. Croft, "Tintin: A System for Retrieval in Text Tables," in Proceedings of the Second ACM International Conference on Digital Libraries, pp. 193-200, 1997.
3. Miao Fan and Doo Soon Kim, "Detecting Table Region in PDF Documents Using Distant Supervision," corpus ID: 14348894, Version 6, 2015.
4. Florence Folake Babatunde, Bolanle Adefowoke Ojokoh, and Samuel Adebayo Oluwadare, "Automatic Table Recognition and Extraction from Heterogeneous Documents," Journal of Computer and Communications 03, pp. 100-110, 2015.
5. H. T. Ha, M. Medved, Z. Neverilova, and A. Horak, "Recognition of OCR Invoice Metadata Block Types," 21st International Conference, TSD 2018, Proceedings, pp. 304-312.
Cited by
1 article.
1. Detection and Recognition of Table structures from Unstructured Documents;2024 Conference on Information Communications Technology and Society (ICTAS);2024-03-07