Abstract
This study addresses the challenge of predicting the tensile stress of natural rubber, a crucial mechanical property of this material, from limited molecular dynamics (MD) simulation data. MD simulations are constrained by their scale and computational cost, which makes it difficult to obtain enough data to train machine learning algorithms. To overcome this limitation, we propose a machine learning framework with three stages: (1) using a Variational Autoencoder (VAE) to rapidly expand the diversity of the data; (2) employing Ordinary Kriging (OK) to label the VAE-generated virtual samples; and (3) training Gradient Boosting Regression (GBR) models on the tensile stress data for natural rubber. The results demonstrate that the generated samples are plausible and significantly improve the accuracy and reliability of various regression models. This approach provides an effective solution to the problem of data scarcity in MD simulations.
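To make stage (2) concrete, the following is a minimal NumPy sketch of how Ordinary Kriging could assign labels to unlabeled virtual samples. It is an illustration under stated assumptions, not the implementation used in this study: the two-dimensional features, the synthetic target, the exponential covariance model and its parameters, and the function names are all hypothetical. The virtual samples are produced here by simple Gaussian jitter, which is only a stand-in for a trained VAE.

```python
import numpy as np


def exp_cov(h, sill=1.0, length=1.0):
    """Exponential covariance model (an assumed choice for this sketch)."""
    return sill * np.exp(-h / length)


def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def ordinary_kriging(x_known, y_known, x_new, sill=1.0, length=1.0, nugget=1e-10):
    """Predict values at x_new from labeled points via ordinary kriging.

    Solves the augmented system [[K, 1], [1^T, 0]] [w; mu] = [k; 1],
    where the last row enforces that the weights sum to one.
    """
    n = x_known.shape[0]
    K = exp_cov(pairwise_dist(x_known, x_known), sill, length)
    K[np.diag_indices(n)] += nugget  # small jitter for numerical stability

    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = K
    A[:n, n] = 1.0
    A[n, :n] = 1.0

    k = exp_cov(pairwise_dist(x_known, x_new), sill, length)  # shape (n, m)
    rhs = np.vstack([k, np.ones((1, x_new.shape[0]))])
    sol = np.linalg.solve(A, rhs)

    weights = sol[:n]  # shape (n, m); the last row of sol is the Lagrange multiplier
    return weights.T @ y_known, weights


# Hypothetical labeled data standing in for MD simulation results.
rng = np.random.default_rng(0)
x_known = rng.uniform(0.0, 1.0, size=(20, 2))
y_known = np.sin(3.0 * x_known[:, 0]) + x_known[:, 1]

# Stand-in for VAE-generated virtual samples: jittered copies of known points.
idx = rng.integers(0, x_known.shape[0], size=50)
x_virtual = x_known[idx] + rng.normal(0.0, 0.05, size=(50, 2))

# Stage (2): label the virtual samples with ordinary kriging.
y_virtual, w = ordinary_kriging(x_known, y_known, x_virtual)
```

The labeled pairs `(x_virtual, y_virtual)` could then be combined with the original data and passed to a gradient boosting regressor for stage (3). In practice the covariance model and its parameters would be fitted to the data rather than fixed as they are here.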