Affiliation:
1. Key Laboratory of Geoscience Big Data and Deep Resource of Zhejiang Province School of Earth Sciences Zhejiang University Hangzhou China
2. School of Computing National University of Singapore Singapore Singapore
3. School of Earth Sciences China University of Geosciences Wuhan China
4. School of Earth Sciences Lanzhou University Lanzhou China
5. School of Information Engineering China University of Geosciences Beijing China
6. Department of Data Science Nissan Motor Corporation Yokohama Japan
Abstract
Although machine learning (ML) has brought new insights into geochemistry research, its implementation is laborious and time‐consuming. Here, we announce Geochemistry π, an open‐source automated ML Python framework. Geochemists only need to provide tabulated data and select the desired options to clean data and run ML algorithms. The process operates in a question‐and‐answer format, and thus does not require that users have coding experience. After either automatic or manual parameter tuning, the automated Python framework provides users with performance and prediction results for the trained ML model. Based on the scikit‐learn library, Geochemistry π has established a customized automated process for implementing classification, regression, dimensionality reduction, and clustering algorithms. The Python framework enables extensibility and portability by constructing a hierarchical pipeline architecture that separates data transmission from the algorithm application. The AutoML module is constructed using the Cost‐Frugal Optimization and Blended Search Strategy hyperparameter search methods from the A Fast and Lightweight AutoML Library, and the model parameter optimization process is accelerated by the Ray distributed computing framework. The MLflow library is integrated into ML lifecycle management, which allows users to compare multiple trained models at different scales and manage the data and diagrams generated. In addition, the front‐end and back‐end frameworks are separated to build the web portal, which demonstrates the ML model and data science workflow through a user‐friendly web interface. In summary, Geochemistry π provides a Python framework for users and developers to accelerate their data mining efficiency with both online and offline operation options.
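As an illustration of the kind of workflow the abstract describes (clean tabulated data, train a scikit‐learn model, tune its parameters, report performance), the following is a minimal sketch and not Geochemistry π's own code. The synthetic data, the step names, and the parameter grid are all assumptions made for this example, and a plain scikit‐learn grid search stands in for the Cost‐Frugal Optimization and Blended Search Strategy methods that the framework actually uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical tabulated data: 200 samples, 4 numeric features (synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Blank out about 5% of the entries to mimic gaps in a real data table.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Data cleaning and the algorithm are separate, named pipeline steps.
pipeline = Pipeline(
    [
        ("impute", SimpleImputer(strategy="mean")),  # fill missing values
        ("scale", StandardScaler()),                 # standardize features
        ("model", RandomForestRegressor(random_state=0)),
    ]
)

# Automatic tuning: exhaustive grid search with 3-fold cross-validation.
# This is a simple stand-in, not the search strategy named in the abstract.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 5],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

# Performance of the best model on held-out data.
test_r2 = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("held-out R^2:", round(test_r2, 3))
```

Keeping imputation and scaling inside the pipeline means they are refitted on each cross‐validation fold, so the held‐out score is not inflated by information leaking from the validation data.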
Publisher
American Geophysical Union (AGU)
Cited by
1 article.