Interpretable Machine Learning for 100-Yard Freestyle Performance: SHAP-Driven Feature Selection of Lap-Level Stroke, Splits, and Breakout Metrics
Research Article
Open Access
CC BY

Ziqiu Wang 1*
1 Beijing 101 Middle School
*Corresponding author: wangziqiu99@outlook.com
Published on 19 November 2025
TNS Vol.151
ISSN (Print): 2753-8826
ISSN (Online): 2753-8818
ISBN (Print): 978-1-80590-559-2
ISBN (Online): 978-1-80590-560-8

Abstract

Sprint-freestyle performance prediction and interpretation require precise, actionable models for coaches and athletes. This study presents an interpretable machine learning approach applied to lap-by-lap metrics from A-final 100-yard freestyle swims (n = 67). We construct a 12-dimensional feature vector from three technical metrics (mean stroke rate, cycle count, and breakout distance) measured on each of the four laps, and define both a regression task (continuous race-time prediction) and a binary classification task (fast/slow, threshold at 41.4 s). Several algorithms were explored (Linear Regression, Random Forest, k-Nearest Neighbors (kNN), and Support Vector techniques) and evaluated over multiple train/test splits using R², MAPE, accuracy, and F1 score. Regression explained little variance: the best mean R² was ≈ −0.042 (Random Forest), meaning the model did slightly worse than always predicting the mean race time. MAPE was nonetheless small (≈0.011), because absolute errors were modest relative to finishing times of roughly 41 s. Classification fared better: kNN achieved the best mean accuracy (≈0.727) and F1 score (≈0.717). Most importantly, SHAP (SHapley Additive exPlanations) identified Lap2_Stroke_Rate and Lap4_Breakout_Dist as two of the most influential features. Feature-selection tests showed that models trained only on the top-ranked features achieve the same MAPE with substantially fewer inputs, pointing towards practical, interpretable, and data-efficient tools for performance monitoring and coaching decisions.
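The abstract compresses a full pipeline into one paragraph. The sketch below illustrates that pipeline under stated assumptions; it is not the author's code. It builds the 12 lap-level feature names, labels swims as fast when the time is below 41.4 s, and averages R², MAPE, accuracy, and F1 over repeated train/test splits for a Random Forest regressor and a kNN classifier, on all 12 features versus the two SHAP-highlighted features. Everything else is assumed: the data are SYNTHETIC random values (not the study's race measurements), and the 80/20 split, 10 repetitions, `n_estimators=100`, `k=5`, feature scaling, and the helper names `make_synthetic_data`, `evaluate`, and `shap_ranking` are hypothetical choices. `shap_ranking` shows one common way to rank features by mean |SHAP| value with the optional `shap` package's `TreeExplainer`; it is defined but not called in the usage example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (accuracy_score, f1_score,
                             mean_absolute_percentage_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

METRICS = ["Stroke_Rate", "Cycle_Count", "Breakout_Dist"]
# 3 metrics x 4 laps = 12 features, ordered lap by lap.
FEATURES = [f"Lap{lap}_{m}" for lap in range(1, 5) for m in METRICS]
THRESHOLD_S = 41.4  # fast/slow cut-off stated in the abstract


def make_synthetic_data(n=67, seed=0):
    """SYNTHETIC stand-in for the real swims; value ranges are hypothetical."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.normal(loc, scale, n)
        for _ in range(4)  # four laps
        for loc, scale in [(55.0, 3.0), (8.0, 1.0), (8.0, 1.5)]
    ])
    # Hypothetical weak effect: higher Lap2 stroke rate and longer Lap4
    # breakout give slightly faster times, plus noise.
    y = (41.4
         - 0.03 * (X[:, FEATURES.index("Lap2_Stroke_Rate")] - 55.0)
         - 0.05 * (X[:, FEATURES.index("Lap4_Breakout_Dist")] - 8.0)
         + rng.normal(0.0, 0.4, n))
    return X, y


def evaluate(X, y, cols, n_splits=10):
    """Mean test scores over repeated random 80/20 splits (assumed scheme)."""
    idx = [FEATURES.index(c) for c in cols]
    labels = (y < THRESHOLD_S).astype(int)  # 1 = fast (assumed: strictly below)
    scores = {"r2": [], "mape": [], "acc": [], "f1": []}
    for seed in range(n_splits):
        Xtr, Xte, ytr, yte, ltr, lte = train_test_split(
            X[:, idx], y, labels, test_size=0.2, random_state=seed)
        # Regression task: continuous race time.
        reg = RandomForestRegressor(n_estimators=100, random_state=seed)
        pred = reg.fit(Xtr, ytr).predict(Xte)
        scores["r2"].append(r2_score(yte, pred))
        scores["mape"].append(mean_absolute_percentage_error(yte, pred))
        # Classification task: fast/slow, kNN on standardized features.
        clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
        lab = clf.fit(Xtr, ltr).predict(Xte)
        scores["acc"].append(accuracy_score(lte, lab))
        scores["f1"].append(f1_score(lte, lab, zero_division=0))
    return {k: float(np.mean(v)) for k, v in scores.items()}


def shap_ranking(model, X, names):
    """Rank features by mean |SHAP| value for a fitted tree model."""
    import shap  # optional dependency, imported only when this is called
    values = np.asarray(shap.TreeExplainer(model).shap_values(X))
    importance = np.abs(values).mean(axis=0)
    return [names[i] for i in np.argsort(importance)[::-1]]


if __name__ == "__main__":
    X, y = make_synthetic_data()
    print("all 12 features:", evaluate(X, y, FEATURES))
    print("top-2 subset:   ",
          evaluate(X, y, ["Lap2_Stroke_Rate", "Lap4_Breakout_Dist"]))
```

Because the data are synthetic, the printed scores say nothing about real swimmers; the sketch only shows how the reported metrics and the all-features versus top-features comparison fit together. Note that MAPE divides by race times of about 41 s, so even a model with negative R² can show a MAPE near 0.01, which matches the pattern described in the abstract.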

Keywords:

machine learning, SHAP, sports analytics, random forest regression, feature selection

Wang, Z. (2025). Interpretable Machine Learning for 100-Yard Freestyle Performance: SHAP-Driven Feature Selection of Lap-Level Stroke, Splits, and Breakout Metrics. Theoretical and Natural Science, 151, 1–11.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-CIAP 2026 Symposium: Applied Mathematics and Statistics

ISBN: 978-1-80590-559-2 (Print) / 978-1-80590-560-8 (Online)
Editor: Marwan Omar
Conference date: 27 January 2026
Series: Theoretical and Natural Science
Volume number: Vol.151
ISSN: 2753-8818 (Print) / 2753-8826 (Online)