k-Means NANI: an improved clustering algorithm for Molecular Dynamics simulations

Published: 2024-03-08

Author:

Chen Lexin, Roe Daniel R., Kochert Matthew, Simmerling Carlos, Miranda-Quintana Ramón Alain

Abstract

One of the key challenges of k-means clustering is the seed selection, or initial centroid estimation, since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex datasets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner. Furthermore, stochastic elements in all flavors of k-means++ will lead to a lack of reproducibility. k-means N-Ary Natural Initiation (NANI) is presented as an alternative to tackle this challenge by using efficient n-ary comparisons to both identify high-density regions in the data and select a diverse set of initial conformations. Centroids generated from NANI are not only representative of the data and different from one another, helping k-means to partition the data accurately, but also deterministic, providing consistent cluster populations across replicates. From peptide and protein folding molecular simulations, NANI was able to create compact and well-separated clusters as well as accurately find the metastable states that agree with the literature. NANI can cluster diverse datasets and be used as a standalone tool or as part of our MDANCE clustering package.
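The reproducibility point in the abstract can be illustrated with a minimal sketch. It is not the NANI procedure, which relies on n-ary comparisons that are not described in this record. As a stand-in, the hypothetical helper `deterministic_seeds` picks the sample nearest the data mean and then adds the farthest remaining point one at a time (a max-min traversal), so no random draw is involved. The `kmeans` function is a plain Lloyd iteration, and the toy blobs are synthetic; all names here are assumptions made for illustration.

```python
import numpy as np

def deterministic_seeds(X, k):
    """Pick k initial centroids without any random draws.

    Hypothetical stand-in, NOT the NANI procedure: start from the point
    closest to the data mean, then repeatedly add the point farthest
    from all seeds chosen so far (max-min traversal).
    """
    X = np.asarray(X, dtype=float)
    # First seed: the sample nearest the global mean (a medoid-like choice).
    first = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    chosen = [first]
    # Distance from every point to its nearest chosen seed.
    min_dist = np.linalg.norm(X - X[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))  # ties resolve to the lowest index
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen]

def kmeans(X, centroids, n_iter=50):
    """Plain Lloyd iterations starting from the given centroids."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(centroids, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy data: three well-separated 2-D blobs (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ((0, 0), (5, 5), (0, 5))])

# Two independent runs start from identical seeds, so they agree exactly.
labels_a, _ = kmeans(X, deterministic_seeds(X, 3))
labels_b, _ = kmeans(X, deterministic_seeds(X, 3))
populations = np.bincount(labels_a, minlength=3)
```

Because the seeds depend only on the data, repeated runs give identical labels and cluster populations, which is the property the abstract attributes to deterministic initialization. A randomized scheme such as k-means++ would need a fixed random seed to give the same guarantee.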

Publisher

Cold Spring Harbor Laboratory


Cited by 1 article.

1. Research on the Innovation Path of Teaching Methods of Civics Classes in Colleges and Universities Based on K-means Cluster Analysis;Applied Mathematics and Nonlinear Sciences;2024-01-01