Affiliation:
1. Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
Abstract
Clustering has become an indispensable tool in the presence of increasingly large and complex datasets. Most clustering algorithms depend, either explicitly or implicitly, on the sampled density. However, estimated densities are fragile due to the curse of dimensionality and finite sampling effects, for instance, in molecular dynamics simulations. To avoid the dependence on estimated densities, an energy-based clustering (EBC) algorithm based on the Metropolis acceptance criterion is developed in this work. In the proposed formulation, EBC can be considered a generalization of spectral clustering in the limit of large temperatures. Taking the potential energy of a sample explicitly into account alleviates requirements regarding the distribution of the data. In addition, it permits the subsampling of densely sampled regions, which can result in significant speed-ups and sublinear scaling. The algorithm is validated on a range of test systems including molecular dynamics trajectories of alanine dipeptide and the Trp-cage miniprotein. Our results show that including information about the potential-energy surface can largely decouple clustering from the sampled density.
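For orientation, the Metropolis acceptance criterion named in the abstract accepts a move from a state with energy E_i to one with energy E_j with probability min(1, exp(-(E_j - E_i)/kT)). The following is a minimal sketch of that textbook criterion only, not of the EBC algorithm itself. The function name `metropolis_acceptance` and its parameters are hypothetical, and the energies and `kT` are assumed to share the same units.

```python
import math


def metropolis_acceptance(e_from, e_to, kT):
    """Return the Metropolis probability of accepting a move e_from -> e_to.

    Hypothetical helper for illustration; energies and kT must share units.
    """
    if kT <= 0:
        raise ValueError("kT must be positive")
    delta = e_to - e_from
    # Downhill (or flat) moves are always accepted.
    if delta <= 0:
        return 1.0
    # Uphill moves are accepted with a Boltzmann-weighted probability.
    return math.exp(-delta / kT)


if __name__ == "__main__":
    print(metropolis_acceptance(0.0, 1.0, kT=1.0))   # uphill move at moderate temperature
    print(metropolis_acceptance(0.0, 1.0, kT=1e6))   # uphill move at very high temperature
```

As `kT` grows, the acceptance probability of every move approaches 1, so the energies stop influencing the outcome. This is the large-temperature regime the abstract refers to.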
Funder
National Center of Competence in Research Materials’ Revolution: Computational Design and Discovery of Novel Materials
Subject
Physical and Theoretical Chemistry, General Physics and Astronomy
Cited by
4 articles.