Abstract
To explore the chemical space of all possible small molecules efficiently, a common approach is to reduce the dimensionality of the system to facilitate downstream machine learning tasks. To this end, we present a data-driven approach for clustering potential energy landscapes of molecular structures by applying recently developed network embedding techniques to obtain latent variables defined through the embedding function. To scale up the method, we also incorporate an entropy-sensitive adaptive scheme for hierarchical sampling of the energy landscape, based on metadynamics and transition path theory. By taking into account the kinetic information implied by the energy landscape of a system, we can interpret dynamical node-node relationships in reduced dimensions. We demonstrate the framework on Lennard-Jones clusters and a human DNA sequence.