Abstract
AbstractPublished reports of chemical compounds often contain multiple machine-readable descriptions which may supplement each other in order to yield coherent and complete chemical representations. This publication presents a method to cross-check such descriptions using a canonical representation and isomorphism of molecular graphs. If immediate agreement between compound descriptions is not found, the algorithm derives the minimal set of simplifications required for both descriptions to arrive to a matching form (if any). The proposed algorithm is used to cross-check chemical descriptions from the Crystallography Open Database to identify coherently described entries as well as those requiring further curation.
Funder
Research Council of Lithuania
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences,Computer Graphics and Computer-Aided Design,Physical and Theoretical Chemistry,Computer Science Applications
Reference44 articles.
1. Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR (2018) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47(D1):930–940. https://doi.org/10.1093/nar/gky1075
2. Gražulis S, Daškevič A, Merkys A, Chateigner D, Lutterotti L, Quirós M, Serebryanaya NR, Moeck P, Downs RT, Le Bail A (2012) Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Res 40(D1):420–427. https://doi.org/10.1093/nar/gkr900
3. Murray-Rust P, Rzepa H (2011) CML: evolution and design. J Cheminformatics 3:44. https://doi.org/10.1186/1758-2946-3-44
4. Anderson E, Veith GD, Weininger D (1987) SMILES: a line notation and computerized interpreter for chemical structures. Technical report, Environmental Research Laboratory-Duluth
5. Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D (2015) InChI, the IUPAC International Chemical Identifier. J Cheminformatics 7(1):23. https://doi.org/10.1186/s13321-015-0068-4
Cited by
42 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献