Predicting Win-Loss outcomes in MLB regular season games – A comparative study using data mining methods

Published: 2016-12-01 Issue: 2 Volume: 15 Page: 91-112
ISSN: 1684-4769
Container-title: International Journal of Computer Science in Sport
Language: en

Author:

Soto Valero C.¹

Affiliation:

1. Department of Computer Science, Universidad Central “Marta Abreu” de Las Villas, Cuba

Abstract

Baseball is a statistics-rich sport, and predicting the winner of a particular Major League Baseball (MLB) game is an interesting and challenging task. To date, there is no definitive formula for determining which factors lead a team to victory, but trends can emerge from the analysis of many years of historical records. Recent studies have concentrated on using and generating new statistics, called sabermetrics, to rank teams and players according to their perceived strengths, and then applying these rankings to forecast specific games. In this paper, we employ sabermetric statistics to assess the predictive capabilities of four data mining methods (classification and regression based) for predicting outcomes (win or loss) in MLB regular season games. Our modeling approach uses only past data when making a prediction, drawing on ten years of publicly available data. We create a dataset of cumulative sabermetric statistics for each MLB team during this period, constructed so that data contamination is not possible. The inherent difficulty of this specific sports prediction task is confirmed using two geometry- or topology-based measures of data complexity. Results reveal that the classification scheme forecasts game outcomes better than the regression scheme and that, of the four data mining methods used, SVMs produce the best predictive results, with a mean prediction accuracy of nearly 60% for each team. The model is evaluated using stratified 10-fold cross-validation.
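The two procedural ideas in the abstract, building each game's features only from earlier games and evaluating with stratified folds, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the feature names (`win_pct`, `run_diff_per_game`), the input tuple layout, and both function names are hypothetical, and the actual sabermetric statistics used are not listed in the abstract.

```python
from collections import defaultdict


def cumulative_pregame_features(games):
    """Build one feature row per game from that team's EARLIER games only.

    `games` is a chronologically ordered list of tuples
    (team, runs_scored, runs_allowed, won). This layout is an assumption
    made for illustration, not the paper's data format.
    """
    totals = defaultdict(lambda: {"games": 0, "wins": 0, "rs": 0, "ra": 0})
    rows = []
    for team, runs_scored, runs_allowed, won in games:
        t = totals[team]
        n = t["games"]
        # Features are computed BEFORE the current game is added to the
        # totals, so the outcome being predicted never leaks into its row.
        rows.append({
            "team": team,
            "games_played": n,
            "win_pct": t["wins"] / n if n else 0.0,
            "run_diff_per_game": (t["rs"] - t["ra"]) / n if n else 0.0,
            "label": int(won),
        })
        t["games"] += 1
        t["wins"] += int(won)
        t["rs"] += runs_scored
        t["ra"] += runs_allowed
    return rows


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds, keeping class ratios similar.

    Indices of each class are dealt round-robin across the folds. This
    sketch is deterministic and does not shuffle; a real evaluation would
    usually shuffle within each class first.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        for j, index in enumerate(by_class[label]):
            folds[(offset + j) % k].append(index)
        # Continue dealing where the previous class stopped, so that
        # fold sizes stay balanced across classes.
        offset += len(by_class[label])
    return folds


if __name__ == "__main__":
    demo_games = [
        ("A", 5, 3, True),
        ("A", 2, 4, False),
        ("B", 1, 0, True),
        ("A", 6, 1, True),
    ]
    for row in cumulative_pregame_features(demo_games):
        print(row)
    print(stratified_folds([1, 1, 0, 1, 0, 0], k=3))
```

Each fold from `stratified_folds` would serve once as the test set while the remaining folds train the classifier. Note that the paper's own safeguard against contamination is the past-only feature construction, which the first function mimics.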

Publisher

Walter de Gruyter GmbH

Subject

Biomedical Engineering, General Computer Science

References: 50 articles.

1. Ahmad, A., & Dey, L. (2005). A feature selection technique for classificatory analysis. Pattern Recognition Letters, 26(1), 43-56. doi: 10.1016/j.patrec.2004.08.015

2. Alcalá-Fdez, J., Sánchez, L., García, S., Jesus, M. J., Ventura, S., Garrell, J. M., . . . Herrera, F. (2008). KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Computing, 13(3), 307-318. doi: 10.1007/s00500-008-0323-y

3. Aslan, B. G., & Inceoglu, M. M. (2007). A comparative study on neural network based soccer result prediction. Paper presented at the Seventh International Conference on Intelligent Systems Design and Applications.

4. Baumer, B., & Zimbalist, A. (2014). Quantifying Market Inefficiencies in the Baseball Players’ Market. Eastern Economic Journal, 40(4), 488-498. doi: 10.1057/eej.2013.43

5. Burges, C. J. C. (1998). A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 121-167. doi: 10.1023/a:1009715923555

Cited by 41 articles.

1. Prediction of esports competition outcomes using EEG data from expert players;Computers in Human Behavior;2024-11

2. Sports betting on sports consumers’ behavioral intention;Managing Sport and Leisure;2024-07-30

3. Development of Sequential Winning Percentage Prediction Model for Badminton Competitions: Applying the Expert System Sequential Probability Ratio Test;2024-07-15

4. Machine learning-based optimization of contract renewal predictions in Korea Baseball organization;Heliyon;2023-12

5. Using Conformal Win Probability to Predict the Winners of the Canceled 2020 NCAA Basketball Tournaments;The American Statistician;2023-11-17