An exploration of machine learning models for the determination of reaction coordinates associated with conformational transitions

Authors:

Naleem Nawavi (1), Abreu Charlles R. A. (2), Warmuz Krzysztof (3), Tong Muchen (4), Kirmizialtin Serdal (1, 4, 5), Tuckerman Mark E. (4, 6, 7, 8)

Affiliations:

1. Chemistry Program, Science Division, New York University, Abu Dhabi, UAE

2. Chemical Engineering Department, Escola de Química, Universidade Federal do Rio de Janeiro, 21941-909 Rio de Janeiro, RJ, Brazil

3. Computer Science Program, Science Division, New York University, Abu Dhabi, UAE

4. Department of Chemistry, New York University (NYU), New York, New York 10003, USA

5. Center for Smart Engineering Materials, New York University, Abu Dhabi, UAE

6. Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA

7. NYU-ECNU Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Rd. North, Shanghai 200062, China

8. Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, USA

Abstract

Determining collective variables (CVs) for conformational transitions is crucial to understanding their dynamics and targeting them in enhanced sampling simulations. Often, CVs are proposed based on intuition or prior knowledge of a system. However, the problem of systematically determining a proper reaction coordinate (RC) for a specific process in terms of a set of putative CVs can be addressed using committor analysis (CA). Identifying the essential degrees of freedom that govern such transitions using CA remains elusive because of the high dimensionality of the conformational space. Various schemes exist to leverage the power of machine learning (ML) to extract an RC from CA. Here, we extend these studies and compare the ability of 17 different ML schemes to identify accurate RCs associated with conformational transitions. We tested these methods on an alanine dipeptide in vacuum and on a sarcosine dipeptoid in an implicit solvent. Our comparison revealed that the light gradient boosting machine method outperforms the other methods. To extract key features from the models, we employed SHapley Additive exPlanations (SHAP) analysis and compared its interpretation with the “feature importance” approach. For the alanine dipeptide, our methodology identifies the ϕ and θ dihedrals as essential degrees of freedom in the C7ax to C7eq transition. For the sarcosine dipeptoid system, the dihedrals ψ and ω are the most important for the cis-αD to trans-αD transition. We further argue that analysis of the full dynamical pathway, and not just the endpoint states, is essential for identifying key degrees of freedom governing transitions.
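As a rough illustration of the committor that underlies CA, the sketch below estimates the committor p_B(x) by launching many short stochastic trajectories from a starting point and counting how often they reach state B before state A. Everything in it is an assumption made for illustration and is not taken from the paper: the 1D double-well potential, overdamped Langevin dynamics, the basin boundaries at x = -1 and x = +1, and the names `force`, `reaches_b_first`, and `committor`. The paper's systems (alanine dipeptide, sarcosine dipeptoid) and its ML models are not reproduced here.

```python
import math
import random


def force(x):
    # Hypothetical 1D double-well potential U(x) = (x^2 - 1)^2; force = -dU/dx.
    return -4.0 * x * (x * x - 1.0)


def reaches_b_first(x0, rng, dt=1e-3, kT=0.5, max_steps=200_000):
    """Run one overdamped Langevin trajectory from x0.

    Returns True if it reaches basin B (x >= 1) before basin A (x <= -1).
    """
    x = x0
    noise = math.sqrt(2.0 * kT * dt)  # Euler-Maruyama noise amplitude
    for _ in range(max_steps):
        if x <= -1.0:
            return False
        if x >= 1.0:
            return True
        x += force(x) * dt + noise * rng.gauss(0.0, 1.0)
    raise RuntimeError("trajectory did not commit to either basin")


def committor(x0, n_shots=200, seed=0):
    """Estimate p_B(x0) as the fraction of trajectories that reach B first."""
    rng = random.Random(seed)
    hits = sum(reaches_b_first(x0, rng) for _ in range(n_shots))
    return hits / n_shots


if __name__ == "__main__":
    for x0 in (-0.8, 0.0, 0.8):
        print(f"x0 = {x0:+.1f}  p_B ~ {committor(x0):.2f}")
```

In this toy model the barrier top at x = 0 should give a committor near 0.5 by symmetry, while points close to either basin should commit almost entirely to that basin. In a high-dimensional system, committor values estimated this way for many configurations would serve as the target that candidate CVs are asked to predict.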

Funder

National Science Foundation

New York University Abu Dhabi

U.S. Department of Energy

Publisher

AIP Publishing

Subject

Physical and Theoretical Chemistry, General Physics and Astronomy

Cited by 3 articles.

1. Molecular Simulation Meets Machine Learning;Journal of Chemical & Engineering Data;2023-12-19

2. Minimal Peptoid Dynamics Inform Self-Assembly Propensity;The Journal of Physical Chemistry B;2023-12-01

3. Toward a structural identification of metastable molecular conformations;The Journal of Chemical Physics;2023-09-15
