On methods and tools of table detection, extraction and annotation in PDF documents

Published: 2014-10-03; Volume: 41; Issue: 1; Pages: 41-57
ISSN: 0165-5515
Journal: Journal of Information Science
Language: English

Authors:

Khusro, Shah¹; Latif, Asima¹; Ullah, Irfan¹

Affiliation:

1. Department of Computer Science, University of Peshawar, Pakistan

Abstract

Table detection, extraction and annotation have been important research problems for years, and different approaches have been designed to address them for different types of documents. PDF is a widely used format for preserving and presenting many types of documents. We investigate the state of the art in table detection, extraction and annotation in PDF documents. Because the structural anatomy of tables varies widely, table-related research has produced a number of approaches, which we investigate critically and analytically to identify their strengths and limitations and to make recommendations for further improvement. We also contribute an evaluation framework that compares information extraction tools that may be used for table detection, extraction and annotation. We found that very limited attention has been paid to these aspects in books, especially books in PDF format. There is no search solution that can find books containing tables that are semantically related to a table in a given book.

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551514551903

References: 113 articles

1. Yıldız B. Information extraction–utilizing table patterns. Master’s thesis, Vienna University of Technology, 2004.

2. Web Publishing with Acrobat/PDF

3. Towards a theory of tables

4. A model for detecting and merging vertically spanned table cells in plain text documents

Cited by: 42 articles

1. A task-centric knowledge graph construction method based on multi-modal representation learning for industrial maintenance automation. Engineering Reports, 2024-07-07.

2. Deep Learning for Table Detection and Structure Recognition: A Survey. ACM Computing Surveys, 2024-04-10.

3. Towards End-to-End Semi-supervised Table Detection with Semantic Aligned Matching Transformer. Lecture Notes in Computer Science, 2024.

4. UnSupDLA: Towards Unsupervised Document Layout Analysis. Lecture Notes in Computer Science, 2024.

5. Environmental, Social, and Governance Taxonomy Simplification: A Hybrid Text Mining Approach. Journal of Emerging Technologies in Accounting, 2023-05-01.