BACKGROUND
Individual factors related to the performance of age group triathletes competing over different race distances have been explored in the scientific literature. However, only a few studies have used machine learning (ML) predictive models to assess the importance of those individual factors.
OBJECTIVE
This study aimed to build and analyze ML regression models that predict the performance of IRONMAN® 70.3 age group triathletes, using sex, age, country of origin, and event location as predictive factors.
METHODS
A total of 823,464 finishers' records (625,398 men and 198,066 women) of IRONMAN® 70.3 age group triathletes from 240 different countries, participating in 197 different events in 183 different locations between 2004 and 2020, were analyzed. For each record, the triathlete's sex, age, and country of origin, the event location and year, and the race finish time were obtained. Four different ML regression models were built to predict the triathletes' race times from their age, sex, country of origin, and race location. The model with the best predictive performance was then selected and further analyzed using model-agnostic interpretability tools to understand which factors contributed most to its predictions.
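The selection step described above (fit several candidate regressors, score each on data held out from training, keep the best) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the field names (`sex`, `age`, `time`) are hypothetical, and the three toy predictors are simple stand-ins, not the four models used in the study.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_global_mean(rows):
    """Toy baseline: always predict the mean training finish time."""
    mu = mean(r["time"] for r in rows)
    return lambda row: mu


def fit_group_mean(rows, key):
    """Toy model: predict the mean training finish time of the row's group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r["time"])
    table = {k: mean(v) for k, v in groups.items()}
    fallback = mean(r["time"] for r in rows)  # used for unseen groups
    return lambda row: table.get(row[key], fallback)


# Synthetic records (assumption): finish time in minutes depends on sex and age.
rng = random.Random(0)
rows = []
for _ in range(2000):
    sex = rng.choice(["M", "F"])
    age = rng.randint(18, 70)
    time = 330 + (25 if sex == "F" else 0) + 0.04 * (age - 30) ** 2 + rng.gauss(0, 10)
    rows.append({"sex": sex, "age": age, "time": time})

train, test = rows[:1500], rows[1500:]

# Fit every candidate on the training split only.
candidates = {
    "global_mean": fit_global_mean(train),
    "mean_by_sex": fit_group_mean(train, "sex"),
    "mean_by_age": fit_group_mean(train, "age"),
}

# Score every candidate on the held-out split and keep the best one.
y_test = [r["time"] for r in test]
scores = {
    name: r2_score(y_test, [model(r) for r in test])
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
```

Scoring on records that were not used for fitting is the design choice that matters here: it keeps the comparison from rewarding a model that merely memorizes the training data.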
RESULTS
The Random Forest Regressor model achieved the best predictive score. Its partial dependence plots indicated that the best performance was associated with men under 30 years of age, from Switzerland or Denmark, competing in IRONMAN® 70.3 Austria/St. Polten, IRONMAN® 70.3 Switzerland, IRONMAN® 70.3 Sunshine Coast, and IRONMAN® 70.3 Busselton.
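A partial dependence curve of the kind referred to above is the model's average prediction over the dataset when one feature is forced to each value on a grid. The sketch below shows that computation in a model-agnostic way; `toy_predict` is a hypothetical stand-in for a fitted regressor, and the records and coefficients are invented for illustration, not taken from the study.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """For each grid value, overwrite `feature` in every row and average the predictions."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, mean(preds)))
    return curve


def toy_predict(row):
    """Hypothetical fitted model (assumption): predicted finish time in minutes."""
    base = 330.0
    sex_offset = 25.0 if row["sex"] == "F" else 0.0
    age_effect = 0.04 * (row["age"] - 30) ** 2
    return base + sex_offset + age_effect


# Invented records; only the feature names matter for the illustration.
rows = [
    {"sex": "M", "age": 25},
    {"sex": "F", "age": 35},
    {"sex": "M", "age": 45},
    {"sex": "F", "age": 55},
]

# Partial dependence of the predicted time on age, averaged over the other features.
age_curve = partial_dependence(toy_predict, rows, "age", [20, 30, 40, 50, 60])

# Grid value with the lowest (fastest) average predicted time.
fastest_age = min(age_curve, key=lambda point: point[1])[0]
```

Because the function only needs a `predict` callable, the same procedure applies to any regressor, which is what makes the tool model-agnostic.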
CONCLUSIONS
Our results indicate that ML models can be used to examine the complex, non-linear interactions among the factors that influence performance and to gain insights that can help IRONMAN® 70.3 age group triathletes better plan their races.