An algorithm to classify homologous series within compound datasets

Author:

Lai AdeleneORCID,Schaub JonasORCID,Steinbeck ChristophORCID,Schymanski Emma L.ORCID

Abstract

AbstractHomologous series are groups of related compounds that share the same core structure attached to a motif that repeats to different degrees. Compounds forming homologous series are of interest in multiple domains, including natural products, environmental chemistry, and drug design. However, many homologous compounds remain unannotated as such in compound datasets, which poses obstacles to understanding chemical diversity and their analytical identification via database matching. To overcome these challenges, an algorithm to detect homologous series within compound datasets was developed and implemented using the RDKit. The algorithm takes a list of molecules as SMILES strings and a monomer (i.e., repeating unit) encoded as SMARTS as its main inputs. In an iterative process, substructure matching of repeating units, molecule fragmentation, and core detection lead to homologous series classification through grouping of identical cores. Three open compound datasets from environmental chemistry (NORMAN Suspect List Exchange, NORMAN-SLE), exposomics (PubChemLite for Exposomics), and natural products (the COlleCtion of Open NatUral producTs, COCONUT) were subject to homologous series classification using the algorithm. Over 2000, 12,000, and 5000 series with CH2 repeating units were classified in the NORMAN-SLE, PubChemLite, and COCONUT respectively. Validation of classified series was performed using published homologous series and structure categories, including a comparison with a similar existing method for categorising PFAS compounds. The OngLai algorithm and its implementation for classifying homologues are openly available at: https://github.com/adelenelai/onglai-classify-homologues.

Funder

Fonds National de la Recherche Luxembourg

Carl-Zeiss-Stiftung

Friedrich-Schiller-Universität Jena

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences,Computer Graphics and Computer-Aided Design,Physical and Theoretical Chemistry,Computer Science Applications

Reference83 articles.

1. Markush EA (1924) Pyrazolone Dye and Process of Making the Same. USA101506316, August 26, 1924. https://pdfpiw.uspto.gov/.piw?PageNum=USA101506316&docid=01506316&IDKey=83E682D73B35&HomeUrl=http%3A%2F%2Fpatft.uspto.gov%2Fnetacgi%2Fnph-Parser%3FSect1%3DPTO1%2526Sect2%3DHITOFF%2526p%3D1%2526u%3D%2Fnetahtml%2FPTO%2Fsrchnum.html%2526r%3D1%2526f%3DG%2526l%3D50%2526d%3DPALL%2526s1%3D1506316.PN.%2526OS%3D%2526RS%3D . Accessed 25 Mar 2022

2. Lima LM, Alves MA, Amaral DN (2019) Homologation: a versatile molecular modification strategy to drug discovery. Curr Top Med Chem. 19:1734–1750. https://doi.org/10.2174/1568026619666190808145235

3. Niemczak M, Rzemieniecki T, Sobiech Ł, Skrzypczak G, Praczyk T, Pernak J (2019) Influence of the alkyl chain length on the physicochemical properties and biological activity in a homologous series of dichlorprop-based herbicidal ionic liquids. J Mol Liq 276:431–440. https://doi.org/10.1016/j.molliq.2018.12.013

4. Zhu J-P, Liang M-Y, Ma Y-R, White LV, Banwell MG, Teng Y, Lan P (2022) Enzymatic synthesis of an homologous series of long- and very long-chain sucrose esters and evaluation of their emulsifying and biological properties. Food Hydrocoll 124:107149. https://doi.org/10.1016/j.foodhyd.2021.107149

5. Wolf SE, Liu T, Govind S, Zhao H, Huang G, Zhang A, Wu Y, Chin J, Cheng K, Salami-Ranjbaran E, Gao F, Gao G, Jin Y, Pu Y, Toledo TG, Ablajan K, Walsh PJ, Fakhraai Z (2021) Design of a homologous series of molecular glassformers. J Chem Phys 155(22):224503. https://doi.org/10.1063/5.0066410

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3