Research on automatic recognition of hand-drawn chemical molecular structures based on deep learning


Ouyang Hengjie1, Liu Wei1, Tao Jiajun1, Luo Yanghong1, Zhang Wanjia1, Zhou Jiayu1, Geng Shuqi1, Zhang Chengpeng1


1. Hunan University of Traditional Chinese Medicine


Abstract

Chemical molecular structures are important in academic communication because they represent chemical knowledge more directly and conveniently. Drawing chemical molecular structures by hand is a common task for chemistry students and researchers. If hand-drawn structures could be converted into machine-readable forms such as SMILES strings, computers could process and analyze them, greatly increasing the efficiency of chemical research. Furthermore, as information technology spreads through education, automatic marking is becoming increasingly popular, and teachers would benefit greatly from a machine that recognizes a drawn molecular structure and then determines whether it is drawn correctly. In this study, we investigate chemical molecular structures containing only three elements: C, H, and O. Because there has been little research on hand-drawn chemical molecular structures, the first major task of this paper is to create a dataset. We propose a synthetic-image method that quickly generates images resembling hand-drawn chemical molecular structures, improving the efficiency of dataset acquisition. In model selection, the hand-drawn chemical structure recognition model designed in this paper, which uses the EfficientNet + Transformer encoder-decoder architecture, outperforms the other encoder-decoder combinations and reaches a final recognition accuracy of 96.90%.
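In an encoder-decoder recognizer of this kind, the decoder usually emits a SMILES string one token at a time, so the training targets must first be split into a fixed token vocabulary. The sketch below is a minimal, hypothetical tokenizer for the C, H, O subset studied here: hydrogens are implicit in SMILES, so only carbon and oxygen (aliphatic and aromatic) appear as atom tokens, alongside bond symbols, branches and ring-closure labels. The regular expression, the special tokens and the names `tokenize_smiles`, `is_supported`, `build_vocab` and `encode` are assumptions for illustration, not details taken from the paper.

```python
import re

# Hypothetical token pattern for C/H/O molecules (hydrogens are implicit):
# two-digit ring labels, atoms, explicit bonds and branches, one-digit ring labels.
TOKEN_RE = re.compile(r"%\d{2}|[CcOo]|[=#()\-]|\d")
SPECIALS = ["<pad>", "<sos>", "<eos>"]


def is_supported(smiles: str) -> bool:
    """True if every character of `smiles` is covered by TOKEN_RE."""
    return bool(smiles) and "".join(TOKEN_RE.findall(smiles)) == smiles


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into decoder tokens."""
    if not is_supported(smiles):
        raise ValueError(f"unsupported SMILES for the C/H/O subset: {smiles!r}")
    return TOKEN_RE.findall(smiles)


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign integer ids: special tokens first, then regular tokens in sorted order."""
    tokens = sorted({tok for s in corpus for tok in tokenize_smiles(s)})
    return {tok: idx for idx, tok in enumerate(SPECIALS + tokens)}


def encode(smiles: str, vocab: dict[str, int]) -> list[int]:
    """Wrap the token ids in start/end markers to form a decoder target sequence."""
    ids = [vocab[tok] for tok in tokenize_smiles(smiles)]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]


if __name__ == "__main__":
    print(tokenize_smiles("CC(=O)O"))  # acetic acid
    vocab = build_vocab(["CCO", "CC(=O)O"])
    print(encode("CCO", vocab))        # ethanol as a target id sequence
```

Running it prints `['C', 'C', '(', '=', 'O', ')', 'O']` and `[1, 6, 6, 7, 2]`. Rejecting any string whose characters are not fully covered by the pattern (for example `CCN`) keeps nitrogen-containing or otherwise out-of-scope molecules out of the training targets.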
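Both automatic marking and the reported recognition accuracy come down to comparing a predicted SMILES string with a reference answer. The abstract does not state how accuracy is computed, so the sketch below assumes the strictest common choice: sequence-level exact match per image. Because one molecule has many valid SMILES spellings (ethanol can be written `CCO` or `OCC`), a real grader should canonicalize both strings first, for example by round-tripping them through RDKit's `Chem.MolFromSmiles` and `Chem.MolToSmiles`. The `identity` function here is only a stand-in for such a canonicalizer, and all function names are hypothetical.

```python
from typing import Callable, Sequence

Canonicalizer = Callable[[str], str]


def identity(smiles: str) -> str:
    """Placeholder canonicalizer: returns the string unchanged."""
    return smiles


def mark_answer(predicted: str, reference: str,
                canonicalize: Canonicalizer = identity) -> bool:
    """Grade one drawing: correct if the recognized SMILES matches the answer key."""
    return canonicalize(predicted.strip()) == canonicalize(reference.strip())


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str],
                         canonicalize: Canonicalizer = identity) -> float:
    """Fraction of images whose predicted SMILES exactly matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(mark_answer(p, r, canonicalize)
               for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    preds = ["CCO", "CC(=O)O", "C=C"]
    refs = ["CCO", "CC(=O)O", "C#C"]
    print(exact_match_accuracy(preds, refs))  # 2 of 3 images correct
```

Passing the canonicalizer in as a parameter keeps the grading logic independent of the chemistry toolkit, so the same function serves both for model evaluation and for marking student drawings.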
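The abstract does not describe how the synthetic images are produced. One common way to make cleanly laid-out line drawings look hand-drawn is to perturb each atom position slightly and bend every bond into a wobbly polyline. The sketch below assumes a molecule is already available as 2-D atom coordinates plus bond index pairs; `jitter_bonds` and its parameters `max_shift`, `wobble` and `segments` are hypothetical names, and this is an illustrative technique rather than the paper's method.

```python
import math
import random
from typing import Optional

Point = tuple[float, float]


def jitter_bonds(coords: list[Point], bonds: list[tuple[int, int]],
                 max_shift: float = 0.05, wobble: float = 0.02,
                 segments: int = 4, seed: Optional[int] = None) -> list[list[Point]]:
    """Turn the straight bonds of a 2-D skeletal drawing into irregular strokes.

    Each atom is displaced once by up to `max_shift` in x and y, so bonds that
    share an atom still meet. Each bond then becomes a polyline of `segments`
    pieces whose interior points are pushed sideways by up to `wobble`.
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")
    rng = random.Random(seed)
    moved = [(x + rng.uniform(-max_shift, max_shift),
              y + rng.uniform(-max_shift, max_shift)) for x, y in coords]
    strokes: list[list[Point]] = []
    for i, j in bonds:
        (x0, y0), (x1, y1) = moved[i], moved[j]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy) or 1.0
        nx, ny = -dy / length, dx / length  # unit normal to the bond
        stroke = [moved[i]]
        for k in range(1, segments):
            t = k / segments
            offset = rng.uniform(-wobble, wobble)
            stroke.append((x0 + t * dx + offset * nx, y0 + t * dy + offset * ny))
        stroke.append(moved[j])
        strokes.append(stroke)
    return strokes


if __name__ == "__main__":
    # Heavy-atom skeleton of ethanol (C-C-O) in arbitrary drawing units.
    coords = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.87)]
    bonds = [(0, 1), (1, 2)]
    for stroke in jitter_bonds(coords, bonds, seed=0):
        print([(round(x, 3), round(y, 3)) for x, y in stroke])
```

Displacing atoms once, rather than each bond end separately, keeps the skeleton connected. Double and triple bonds would be drawn as two or three such strokes offset along the normal, and the resulting polylines could then be rasterized with any 2-D drawing library, for example Pillow's `ImageDraw.line`.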


Research Square Platform LLC
