Abstract
Machine learning is a relatively recent tool in real estate. When the study area is large, the factors affecting price may vary with geographical region and socioeconomic conditions. A model that reflects these differences is therefore expected to predict prices more accurately than a single general model. Unsupervised learning methods can be used both to improve performance and to reveal how the factors affecting price vary between regions. To this end, this study builds a hybrid model that combines X-Means clustering with CART (Classification and Regression Trees) decision trees. The model successfully learned the geographical and physical variables that affect price. Its prediction performance was compared with that of the direct capitalization method, the gold standard in the domain. The hybrid model outperforms direct capitalization in terms of mean square error, root mean square error and adjusted R-squared, with scores of 72.86, 0.0057 and 0.978, respectively. The effect of clustering was also examined: clustering improved prediction performance by 36%.
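The cluster-then-regress idea can be sketched in Python with scikit-learn. This is a minimal sketch under several assumptions, not the study's implementation. X-Means is approximated by fitting `KMeans` for k = 1 to `k_max` and keeping the k with the best spherical-Gaussian BIC (true X-Means instead grows k by recursively splitting centroids). Clusters are formed from coordinates only, one `DecisionTreeRegressor` (scikit-learn's CART-style tree) is fitted per cluster, and the data are synthetic. The names `kmeans_bic`, `choose_clusters`, `ClusterCART` and `adjusted_r2` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def kmeans_bic(points, km):
    """Spherical-Gaussian BIC of a fitted KMeans model (higher is better)."""
    n, d = points.shape
    k = km.n_clusters
    sizes = np.bincount(km.labels_, minlength=k)
    sizes = sizes[sizes > 0]
    sigma2 = max(km.inertia_, 1e-12) / (d * (n - k))  # pooled variance estimate
    loglik = (np.sum(sizes * np.log(sizes)) - n * np.log(n)
              - n * d / 2 * np.log(2 * np.pi * sigma2) - d * (n - k) / 2)
    n_params = k * d + (k - 1) + 1  # centroids, mixing weights, shared variance
    return loglik - n_params / 2 * np.log(n)


def choose_clusters(coords, k_max=6, seed=0):
    """Approximate X-Means: keep the KMeans fit whose k has the best BIC."""
    best_score, best_km = None, None
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(coords)
        score = kmeans_bic(coords, km)
        if best_score is None or score > best_score:
            best_score, best_km = score, km
    return best_km


class ClusterCART:
    """Hypothetical hybrid: geographic clusters, one regression tree per cluster."""

    def __init__(self, k_max=6, max_depth=4, seed=0):
        self.k_max, self.max_depth, self.seed = k_max, max_depth, seed

    def fit(self, coords, X, y):
        self.km = choose_clusters(coords, self.k_max, self.seed)
        self.trees = {}
        for c in range(self.km.n_clusters):
            mask = self.km.labels_ == c
            tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=self.seed)
            self.trees[c] = tree.fit(X[mask], y[mask])
        return self

    def predict(self, coords, X):
        labels = self.km.predict(coords)  # route each property to its cluster
        out = np.empty(len(X))
        for c, tree in self.trees.items():
            mask = labels == c
            if mask.any():
                out[mask] = tree.predict(X[mask])
        return out


def adjusted_r2(y_true, y_pred, n_features):
    n = len(y_true)
    return 1 - (1 - r2_score(y_true, y_pred)) * (n - 1) / (n - n_features - 1)


# Synthetic data: three regions, each with its own price = base + slope * area.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
base, slope = np.array([100.0, 300.0, 500.0]), np.array([1.0, 2.0, 3.0])
region = rng.integers(0, 3, size=600)
coords = centers[region] + rng.normal(scale=1.0, size=(600, 2))
area = rng.uniform(50, 200, size=600)
price = base[region] + slope[region] * area + rng.normal(scale=5.0, size=600)
X = np.column_stack([coords, area])  # geographic and physical features

c_tr, c_te, X_tr, X_te, y_tr, y_te = train_test_split(
    coords, X, price, test_size=0.25, random_state=0)
model = ClusterCART().fit(c_tr, X_tr, y_tr)
pred = model.predict(c_te, X_te)
mse = mean_squared_error(y_te, pred)
rmse = np.sqrt(mse)
adj = adjusted_r2(y_te, pred, X.shape[1])
print(model.km.n_clusters, round(mse, 2), round(rmse, 2), round(adj, 3))
```

Fitting a separate tree per cluster lets each region learn its own price structure, which is the motivation given above for clustering. The printed figures come from synthetic data and say nothing about the study's reported scores.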