Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model

Authors:

Kigo Samuel Njoroge, Omondi Evans Otieno, Omolo Bernard Oguna

Abstract

This study compared multiple supervised machine learning models, both regressors and classifiers, for predicting diamond prices. Diamond pricing is a complex task because of the non-linear relationships between key features such as carat, cut, clarity, table, and depth. The analysis aimed to develop an accurate predictive model using both regression and classification approaches. Preprocessing addressed outliers, missing values, feature scaling, multicollinearity, and class imbalance. Outliers were handled using the interquartile range (IQR) method, missing values in numerical features were imputed with the median, and numerical predictors were standardized. Correlation-based feature selection eliminated highly correlated variables so that only relevant features entered the models, and equal-width binning was applied to the cut variable to handle class imbalance.

Among the models evaluated, the random forest (RF) regressor performed best. It achieved the lowest root mean squared error (RMSE), 523.50, and an R-squared ($R^2$) score of 0.985, meaning it explained about 98.5% of the variance in diamond prices. The area under the curve (AUC) for the RF classifier on the test set was 1.00 (100%), indicating perfect classification performance. The multilayer perceptron (MLP) regressor followed with an RMSE of 563.74 and an $R^2$ of 0.980, showing that it can capture the complex relationships in the data; although its errors were slightly higher than those of the RF regressor, further analysis is needed to determine its suitability and potential advantages. The XGBoost regressor achieved an RMSE of 612.88 and an $R^2$ of 0.972, and the boosted decision tree regressor an RMSE of 711.31 and an $R^2$ of 0.968; both captured much of the underlying pattern, but with higher errors than the RF regressor. The k-nearest neighbors (KNN) regressor performed markedly worse, with an RMSE of 1346.65 and an $R^2$ of 0.887, and the linear regression model was comparable, with an RMSE of 1395.41 and an $R^2$ of 0.876. The support vector regression (SVR) model had the highest RMSE, 3044.49, and the lowest $R^2$, 0.421, indicating limited ability to capture the complex relationships in the data.

Overall, the study demonstrates that RF outperforms the other models in accuracy and predictive power, as evidenced by its lowest RMSE, highest $R^2$, and perfect classification performance, which makes it well suited to predicting diamond prices. The study provides an effective tool for the diamond industry and emphasizes the value of considering both regression and classification approaches when developing predictive models. The findings offer insights for pricing strategies, market trends, and decision-making in the diamond industry and related fields.
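The preprocessing steps and evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The function names, the toy carat values, the conventional 1.5 × IQR fence multiplier, and the choice to clip outliers to the fences (rather than drop them) are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip values to the IQR fences (assumed policy; removal is another option)."""
    lo, hi = iqr_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical carat column with one missing value and one extreme value.
carat = [0.3, 0.5, None, 0.7, 1.0, 5.0]
carat_clean = standardize(clip_outliers(impute_median(carat)))

# Hypothetical prices and predictions, only to exercise the metrics.
price_true = [500.0, 1200.0, 2500.0, 4000.0]
price_pred = [550.0, 1100.0, 2600.0, 3900.0]
print(rmse(price_true, price_pred), r2_score(price_true, price_pred))
```

The order used here (impute, then treat outliers, then standardize) is one reasonable choice; in a real workflow the medians, fences, means, and standard deviations would be estimated on the training split only and then reused on the test split.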

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Cited by 13 articles.
