An automatic system for extracting figure-caption pair from medical documents: a six-fold approach

Published:2023-07-26 Volume:9 Page:e1452
ISSN:2376-5992
Container-title:PeerJ Computer Science
language:en

Author:

Chaki Jyotismita¹

Affiliation:

1. Department of Computational Intelligence, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India

Abstract

Background: Figures and captions in medical documentation contain important information. As a result, researchers are becoming more interested in obtaining published medical figures from medical papers and utilizing the captions as a knowledge source.

Methods: This work introduces a unique and successful six-fold methodology for extracting figure-caption pairs. The A-torus wavelet transform is used to retrieve the first edge from the scanned page. Then, using the maximally stable extremal regions connected component feature, text and graphical contents are isolated from the edge document, and a multi-layer perceptron is used to detect and retrieve figures and captions from medical records. The figure-caption pair is then extracted using the bounding box approach. The files that contain the figures and captions are saved separately and supplied to the end user as the output of any investigation. The proposed approach is evaluated using a self-created database based on pages collected from five open access books: “Brain and Human Body Modelling 2021” by Sergey Makarov, Gregory Noetscher and Aapo Nummenmaa, “Healthcare and Disease Burden in Africa” by Ilha Niohuru, “All-Optical Methods to Study Neuronal Function” by Eirini Papagiakoumou, “RNA, the Epicenter of Genetic Information” by John Mattick and Paulo Amaral, and “Illustrated Manual of Pediatric Dermatology” by Susan Bayliss Mallory, Alanna Bree and Peggy Chern.

Results: Experiments and findings comparing the new method to earlier systems reveal a significant increase in efficiency, demonstrating the robustness of the suggested technique.
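The abstract names a bounding box approach for forming figure-caption pairs but does not spell out the pairing rule. The following is a minimal sketch of one plausible rule, not the author's implementation: it assumes figure and caption regions have already been detected as axis-aligned boxes, and that a caption sits directly below its figure and overlaps it horizontally. The `Box` type and the function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page pixels (origin at the top-left)."""
    left: int
    top: int
    right: int
    bottom: int


def horizontal_overlap(a: Box, b: Box) -> int:
    """Width in pixels shared by the horizontal extents of two boxes (0 if none)."""
    return max(0, min(a.right, b.right) - max(a.left, b.left))


def pair_figures_with_captions(
    figures: List[Box], captions: List[Box]
) -> List[Tuple[Box, Optional[Box]]]:
    """Greedily pair each figure with the nearest unused caption below it.

    Assumption (not from the paper): a caption starts at or below the
    bottom edge of its figure and overlaps the figure horizontally.
    A figure with no such caption is paired with None.
    """
    pairs: List[Tuple[Box, Optional[Box]]] = []
    used = set()
    # Process figures in reading order: top to bottom, then left to right.
    for fig in sorted(figures, key=lambda b: (b.top, b.left)):
        best_index = None
        best_gap = None
        for i, cap in enumerate(captions):
            if i in used or cap.top < fig.bottom:
                continue  # caption already taken, or not below the figure
            if horizontal_overlap(fig, cap) == 0:
                continue  # caption belongs to a different column
            gap = cap.top - fig.bottom
            if best_gap is None or gap < best_gap:
                best_index, best_gap = i, gap
        if best_index is not None:
            used.add(best_index)
            pairs.append((fig, captions[best_index]))
        else:
            pairs.append((fig, None))
    return pairs
```

Under these assumptions, each pair could then be cropped from the page image and saved to separate files, as the abstract describes. Layouts where the caption appears above or beside the figure would need a different distance rule.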

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-1452.pdf

Reference: 35 articles.

1. Auto-CORPus: a natural language processing tool for standardising and reusing biomedical literature;Beck;bioRxiv,2021

2. Figure metadata extraction from digital documents;Choudhury,2013

3. Looking beyond text: extracting figures, tables and captions from computer science papers;Clark,2015

4. Using deep learning to segment breast and fibroglandular tissue in MRI volumes;Dalmış;Medical Physics,2017

5. Design and development of a multimodal biomedical information retrieval system;Demner-Fushman;Journal of Computing Science and Engineering,2012