Review of techniques and models used in optical chemical structure recognition in images and scanned documents

Published:2022-09-09 Issue:1 Volume:14
ISSN:1758-2946
Container-title:Journal of Cheminformatics
language:en
Short-container-title:J Cheminform

Author:

Musazade Fidan,Jamalova Narmin,Hasanov Jamaladdin

Abstract

Extraction of chemical formulas from images was not a top priority among Computer Vision tasks for a while. Complexity on both the input and prediction sides has made this task challenging for conventional Artificial Intelligence and Machine Learning approaches. A binary input image, which might seem trivial for convolutional analysis, was not easy to classify, since the provided sample was not representative of the given molecule: the same formula can be described by a variety of graphical representations that do not resemble each other. Considering the variety of molecules, the problem shifted from classification to formula generation, which makes Natural Language Processing (NLP) a good candidate for an effective solution. This paper describes the evolution of approaches from rule-based structure analyses to complex statistical models, and compares the efficiency of models and methodologies used in recent years. Although the latest achievements deliver ideal results on particular datasets, the authors mention possible problems for various scenarios and provide suggestions for further development.

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences,Computer Graphics and Computer-Aided Design,Physical and Theoretical Chemistry,Computer Science Applications

Link

https://link.springer.com/content/pdf/10.1186/s13321-022-00642-3.pdf

References: 58 articles.

1. Rozas R, Fernandez H (1990) Automatic processing of graphics for image databases in science. J Chem Inf Comput Sci 30:7–12

2. Contreras ML, Allendes C, Alvarez LT, Rozas R (1990) Computational perception and recognition of digitized molecular structures. J Chem Inf Comput Sci 30:302–307

3. McDaniel JR, Balmuth JR (1992) Kekule: OCR-optical chemical (structure) recognition. J Chem Inf Comput Sci 32:373–378

4. Casey R, Boyer S, Healey P, Miller A, Oudot B, Zilles K (1993) Optical recognition of chemical graphics. In: Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR ’93), pp. 627–631

5. Ibison P, Jacquot M, Kam F, Neville AG, Simpson RW, Tonnelier C, Venczel T, Johnson AP (1993) Chemical literature data extraction: the CLiDE Project. J Chem Inf Comput Sci 33:338–344

Cited by 12 articles.

1. A review of transformers in drug discovery and beyond;Journal of Pharmaceutical Analysis;2024-08

2. ChemReco: automated recognition of hand-drawn carbon–hydrogen–oxygen structures using deep learning;Scientific Reports;2024-07-25

3. Advancements in hand-drawn chemical structure recognition through an enhanced DECIMER architecture;Journal of Cheminformatics;2024-07-05

4. Artificial Intelligence Techniques and Pedigree Charts in Oncogenetics: Towards an Experimental Multioutput Software System for Digitization and Risk Prediction;Computation;2024-03-03

5. Automation and machine learning augmented by large language models in a catalysis study;Chemical Science;2024