Abstract
Chart data extraction is a crucial research field in recovering information from chart images. With the recent rise in image processing and computer vision algorithms, researchers presented various approaches to tackle this problem. Nevertheless, most of them use different datasets, often not publicly available to the research community. Therefore, the main focus of this research was to create a chart data extraction algorithm for circular-shaped and grid-like chart types, which will accelerate research in this field and allow uniform result comparison. A large-scale dataset is provided containing 120,000 chart images organized into 20 categories, with corresponding ground truth for each image. Through the undertaken extensive research and to the best of our knowledge, no other author reports the chart data extraction of the sunburst diagrams, heatmaps, and waffle charts. In this research, a new, fully automatic low-level algorithm is also presented that uses a raster image as input and generates an object-oriented structure of the chart of that image. The main novelty of the proposed approach is in chart processing on binary images instead of commonly used pixel counting techniques. The experiments were performed with a synthetic dataset and with real-world chart images. The obtained results demonstrate two things: First, a low-level bottom-up approach can be shared among different chart types. Second, the proposed algorithm achieves superior results on a synthetic dataset. The achieved average data extraction accuracy on the synthetic dataset can be considered state-of-the-art within multiple error rate groups.
Subject
Electrical and Electronic Engineering,Computer Graphics and Computer-Aided Design,Computer Vision and Pattern Recognition,Radiology, Nuclear Medicine and imaging
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献