Predicting the number of oocytes retrieved from controlled ovarian hyperstimulation with machine learning

Author:

Ferrand Timothy (1), Boulant Justine (2), He Chloe (3, 4, 5), Chambost Jérôme (1), Jacques Céline (1), Pena Chris-Alexandre (1), Hickman Cristina (3, 6), Reignier Arnaud (2), Fréour Thomas (2, 7)

Affiliation:

1. AI Team, Apricity, Paris, France

2. Centre Hospitalier Universitaire de Nantes, Nantes, France

3. AI Team, Apricity, London, UK

4. Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK

5. Department of Computer Science, University College London, London, UK

6. Institute of Reproductive and Developmental Biology, Imperial College London, London, UK

7. Department of Reproductive Medicine, Dexeus University Hospital, Barcelona, Spain

Abstract

STUDY QUESTION: Can machine learning predict the number of oocytes retrieved from controlled ovarian hyperstimulation (COH)?

SUMMARY ANSWER: Three machine-learning models were successfully trained to predict the number of oocytes retrieved from COH.

WHAT IS KNOWN ALREADY: A number of previous studies have identified factors that influence the number of oocytes retrieved during COH and built predictive models on them. Many of these studies are, however, limited in that they consider only a small number of variables in isolation.

STUDY DESIGN, SIZE, DURATION: This study was a retrospective analysis of a dataset of 11,286 cycles performed at a single centre in France between 2009 and 2020, with the aim of building a predictive model for the number of oocytes retrieved from ovarian stimulation. The analysis was carried out by a data analysis team external to the centre using the Substra framework, which enabled the team to send computer code to run securely on the centre's on-premises server. In this way, a high level of data security was achieved: the data analysis team did not have direct access to the data, nor did the data leave the centre at any point during the study.

PARTICIPANTS/MATERIALS, SETTING, METHODS: The Light Gradient Boosting Machine algorithm was used to produce three predictive models: one that directly predicted the number of oocytes retrieved, and two that predicted which of a set of bins, each set provided by one of two clinicians, the number of oocytes retrieved fell into. The resulting models were evaluated on a held-out test set and compared to linear and logistic regression baselines. In addition, the models themselves were analysed to identify the parameters that had the biggest impact on their predictions.

MAIN RESULTS AND THE ROLE OF CHANCE: On average, the model that directly predicted the number of oocytes retrieved deviated from the ground truth by 4.21 oocytes. The model that predicted the first clinician's bins deviated by 0.73 bins, whereas the model for the second clinician deviated by 0.62 bins. For all models, performance was best within the first and third quartiles of the target variable, with the models underpredicting extreme values of the target variable (no oocytes and large numbers of oocytes retrieved). Nevertheless, the erroneous predictions made for these extreme cases were still within the vicinity of the true value. Overall, all three models agreed on the importance of each feature, which was estimated using Shapley Additive Explanation (SHAP) values. The feature with the highest mean absolute SHAP value (and thus the highest importance) was the antral follicle count, followed by basal AMH and FSH. Of the other hormonal features, basal TSH, LH, and testosterone levels were similarly important and baseline LH was the least important. The treatment characteristic with the highest SHAP value was the initial dose of gonadotropins.

LIMITATIONS, REASONS FOR CAUTION: The models produced in this study were trained on a cohort from a single centre. They should thus not be used in clinical practice until trained and evaluated on a larger cohort more representative of the general population.

WIDER IMPLICATIONS OF FINDINGS: These predictive models for the number of oocytes retrieved from COH may be useful in clinical practice, assisting clinicians in optimizing COH protocols for individual patients. Our work also demonstrates the promise of using the Substra framework to allow external researchers to provide clinically relevant insights on sensitive fertility data in a fully secure, trustworthy manner, and opens a number of exciting avenues for accelerating future research.

STUDY FUNDING/COMPETING INTEREST(S): This study was funded by the French Public Bank of Investment as part of the Healthchain Consortium. T.Fe., C.He., J.C., C.J., C.-A.P., and C.Hi. are employed by Apricity. C.Hi. has received consulting fees and honoraria from Vitrolife, Merck Serono, Ferring, Cooper Surgical, Dibimed, Apricity, and Fairtility and travel support from Fairtility and Vitrolife, participates on an advisory board for Merck Serono, was the founder and organizer of the AI Fertility conference, has stock in Aria Fertility, TMRW, Fairtility, Apricity, and IVF Professionals, and received free equipment from Planar in exchange for first user feedback. C.J. has received a grant from BPI. J.C. has also received a grant from BPI, is a member of the Merck AI advisory board, and is a board member of Labelia Labs. C.He has a contract for medical writing of this manuscript by CHU Nantes and has received travel support from Apricity. A.R. has received honoraria from Ferring and Organon. T.Fe. has received a grant from BPI.

TRIAL REGISTRATION NUMBER: N/A.
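The two kinds of figures reported under MAIN RESULTS (an average deviation in oocytes or in bins, and a feature ranking by mean absolute SHAP value) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the clinicians' bin boundaries, so `BIN_EDGES` is hypothetical, "deviated on average" is read as mean absolute error, and all numbers are invented rather than taken from the study.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical bin edges (not from the study): 0, 1-4, 5-9, 10-15, 16+ oocytes.
BIN_EDGES = [1, 5, 10, 16]


def to_bin(count, edges=BIN_EDGES):
    """Return the index of the bin that an oocyte count falls into."""
    return bisect_right(edges, count)


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between predictions and ground truth."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, most important first.

    shap_rows holds one row per cycle and one SHAP value per feature.
    """
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Invented example data, for illustration only.
true_counts = [0, 3, 8, 12, 25]
pred_counts = [2, 4, 7, 9, 18]

# Deviation measured in oocytes.
mae_oocytes = mean_absolute_error(true_counts, pred_counts)

# Deviation measured in bins: compare bin indices instead of raw counts.
mae_bins = mean_absolute_error(
    [to_bin(c) for c in true_counts],
    [to_bin(c) for c in pred_counts],
)

# Invented SHAP values for three hypothetical feature columns.
ranking = rank_by_mean_abs_shap(
    [[2.0, -1.0, 0.2], [-3.0, 0.5, -0.1]],
    ["afc", "amh", "fsh"],
)
```

Measuring error on bin indices rather than raw counts treats a prediction one bin away as a smaller miss than one several bins away, which is one plausible reading of the reported deviations of 0.73 and 0.62 bins.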

Funder

Healthchain Consortium

Publisher

Oxford University Press (OUP)

Subject

Obstetrics and Gynecology, Rehabilitation, Reproductive Medicine
