Abstract
Statistical machine learning is an important tool for studying data. General Circulation Models (GCMs) are the most sophisticated models for predicting climate and weather. This study deployed a two-stage machine learning model as a statistical downscaling approach to predict daily rainfall in Bogor, Indonesia. We compared three GCM domains and two approaches to handling missing data. First, we built two datasets, one for each missing-data approach. Then, a Support Vector Classification (SVC) model was applied to classify rainy and non-rainy days. Finally, we modeled the rainy-day data with a Recurrent Neural Network (RNN) to estimate daily rainfall. The results show that handling missing values with random forest imputation increases the accuracy and lowers the root mean square error (RMSE) of the model. The best GCM domain is the one 5 km from the local climatological station. The SVC model with a radial basis function kernel is the best model for classifying rainy and non-rainy days, with an accuracy of 0.985 (98.5%), and the RNN model has an RMSE of 16.19. Accurately estimating the increase or decrease in extreme rainfall is crucial for providing effective recommendations in disaster mitigation efforts.
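The two-stage structure described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the function names and the `wet_threshold` parameter are hypothetical, and a ridge regression stands in for the RNN so that the example stays short and depends only on NumPy and scikit-learn. The missing-value imputation step is omitted.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_two_stage(X, rain_mm, wet_threshold=0.0):
    """Fit an occurrence classifier, then an amount model on rainy days only."""
    # Assumption: a day counts as rainy when rainfall exceeds wet_threshold.
    is_rainy = rain_mm > wet_threshold
    # Stage 1: an RBF-kernel SVC separates rainy from non-rainy days.
    classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    classifier.fit(X, is_rainy)
    # Stage 2: the amount model sees rainy days only.
    # Ridge is a lightweight stand-in for the RNN, not the study's model.
    regressor = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    regressor.fit(X[is_rainy], rain_mm[is_rainy])
    return classifier, regressor


def predict_two_stage(classifier, regressor, X):
    """Return 0 mm for days classified dry, the regressor's estimate otherwise."""
    rainy = classifier.predict(X).astype(bool)
    amounts = np.zeros(len(X))
    if rainy.any():
        # Clip at zero because rainfall amounts cannot be negative.
        amounts[rainy] = np.clip(regressor.predict(X[rainy]), 0.0, None)
    return amounts


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic predictors and rainfall, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
rain_mm = np.where(X[:, 0] > 0.0, 5.0 + 4.0 * X[:, 1] ** 2, 0.0)

classifier, regressor = fit_two_stage(X, rain_mm)
predicted = predict_two_stage(classifier, regressor, X)
print("in-sample RMSE:", rmse(rain_mm, predicted))
```

Separating occurrence from amount keeps the many zero-rainfall days out of the regression stage, so the amount model is fitted only on the days it is meant to describe.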