Authors:
Ouyang Hengjie, Liu Wei, Tao Jiajun, Luo Yanghong, Zhang Wanjia, Zhou Jiayu, Geng Shuqi, Zhang Chengpeng
Abstract
Chemical molecular structures are a direct and convenient way to express chemical knowledge, and they play a vital role in academic communication. In chemistry, drawing structures by hand is a routine task for students and researchers. Converting hand-drawn chemical molecular structures into machine-readable formats, such as SMILES strings, would let computers process and analyze these structures efficiently, significantly improving the efficiency of chemical research. Furthermore, as educational technology advances, automated grading is gaining popularity: a machine that recognizes chemical molecular structures and assesses whether the drawings are correct would be of great convenience to teachers. We created ChemReco, a tool that identifies chemical molecular structures composed of three elements (C, H, and O), to support chemical researchers. Because few studies have addressed hand-drawn chemical molecular structures, the primary focus of this paper is dataset construction. We propose a synthetic image method that rapidly generates images resembling hand-drawn chemical molecular structures, making dataset acquisition more efficient. For the recognition model, we adopt an EfficientNet + Transformer encoder-decoder architecture, which outperforms the other encoder-decoder combinations compared and achieves a final recognition accuracy of 96.90%.
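An encoder-decoder recognizer of this kind emits a SMILES string token by token, so the target side needs a tokenizer and vocabulary, and the reported accuracy is computed over predicted strings. The sketch below is illustrative only and rests on assumptions not stated in the abstract: the regex tokenization, the special tokens `<pad>`, `<sos>`, `<eos>`, the helper names (`tokenize`, `build_vocab`, `encode`, `exact_match_accuracy`, all hypothetical), and the premise that "recognition accuracy" means whole-string exact match. It is not the authors' implementation.

```python
import re
from typing import Dict, Iterable, List, Sequence

# Hypothetical token pattern for C/H/O-only SMILES (an assumption, not from the paper):
# bracket atoms such as [OH-], two-digit ring closures (%10), organic-subset atoms
# C, O and their aromatic forms c, o, bond/branch symbols, and single ring-closure digits.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|%\d{2}|[CcOo]|[=#\-\(\)/\\]|\d")

# Hypothetical special tokens for a Transformer decoder.
SPECIALS = ["<pad>", "<sos>", "<eos>"]


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; reject characters outside the C/H/O subset."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def build_vocab(corpus: Iterable[str]) -> Dict[str, int]:
    """Map special tokens first, then every distinct corpus token (sorted), to integer ids."""
    tokens = sorted({tok for s in corpus for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(SPECIALS + tokens)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Wrap the token ids in <sos>/<eos> and right-pad with <pad> to max_len."""
    ids = [vocab["<sos>"]] + [vocab[t] for t in tokenize(smiles)] + [vocab["<eos>"]]
    if len(ids) > max_len:
        raise ValueError(f"{smiles!r} needs {len(ids)} positions, max_len is {max_len}")
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def exact_match_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of predictions identical to their target string (assumed metric)."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need equally sized, non-empty prediction and target lists")
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    print(tokenize("CC(=O)O"))  # acetic acid
    vocab = build_vocab(["CCO", "C=O"])
    print(encode("CCO", vocab, max_len=6))
    print(exact_match_accuracy(["CCO", "CO"], ["CCO", "C=O"]))
```

Plain string comparison is a simplification: the same molecule can be written as several valid SMILES strings (e.g. `OCC` and `CCO`), so an automated grader would more robustly compare canonical forms, for example after RDKit's `Chem.MolToSmiles`.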
Funders
Natural Science Funding Project of Hunan Province
Natural Science Funding of Changsha City
Publisher
Springer Science and Business Media LLC