RoBERTa-Assisted Outcome Prediction in Ovarian Cancer Cytoreductive Surgery Using Operative Notes

Published: 2023-04
Volume: 30
ISSN: 1073-2748
Container-title: Cancer Control
Language: en
Short-container-title: Cancer Control

Author:

Laios Alexandros¹, Kalampokis Evangelos², Mamalis Marios Evangelos², Tarabanis Constantine³, Nugent David¹, Thangavelu Amudha¹, Theophilou Georgios¹, De Jong Diederick¹

Affiliation:

1. Department of Gynaecologic Oncology, ESGO Center of Excellence for Ovarian Cancer Surgery, St James’s University Hospital, Leeds, UK

2. Information Systems Lab, Department of Business Administration, University of Macedonia, Thessaloniki, Greece

3. Department of Internal Medicine, School of Medicine, New York University, New York, NY, USA

Abstract

Introduction Contemporary efforts to predict surgical outcomes focus on the associations between traditional discrete surgical risk factors. We aimed to determine whether natural language processing (NLP) of unstructured operative notes improves the prediction of residual disease in women with advanced epithelial ovarian cancer (EOC) following cytoreductive surgery.

Methods Electronic health records were queried to identify women with advanced EOC and to retrieve their operative notes. The Term Frequency – Inverse Document Frequency (TF-IDF) score was used to quantify how well sequences of words (n-grams) discriminate between the presence and absence of residual disease. We employed a state-of-the-art RoBERTa-based classifier to process the unstructured surgical notes. Discrimination was measured using standard performance metrics. An XGBoost model was then trained on the same dataset using both discrete and engineered clinical features along with the probabilities output by the RoBERTa classifier.

Results The cohort consisted of 555 cases of EOC cytoreduction performed by eight surgeons between January 2014 and December 2019. Discrete word clouds weighted by the difference in n-gram TF-IDF score between R0 and non-R0 resection were identified. The terms ‘adherent’ and ‘miliary disease’ best discriminated between the two groups. The RoBERTa model reached high evaluation metrics (AUROC .86; AUPRC .87; precision, recall, and F1 score of .77; accuracy .81). It also outperformed models that used discrete clinical and engineered features, as well as other state-of-the-art NLP tools. When the probabilities from the RoBERTa classifier were combined with commonly used predictors in the XGBoost model, a marginal improvement in the overall model’s performance was observed (AUROC and AUPRC of .91, with all other metrics the same).

Conclusion/Implications We applied a sui generis approach to extract information from the abundant textual surgical data and demonstrated how it can be effectively used for classification, outperforming models relying on conventional structured data. State-of-the-art NLP applications in biomedical texts can improve modern EOC care.
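
The n-gram TF-IDF score difference between R0 and non-R0 notes mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the tokenization, the IDF smoothing, or how scores were aggregated per group, so the whitespace tokenizer, the smoothed IDF, and the mean-per-class difference below are all assumptions, as are the function names. The four notes are invented toy text, not data from the study.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased note."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def tfidf(docs, n_max=2):
    """One TF-IDF dict per document: raw term count times a smoothed IDF (assumed form)."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + f)) + 1 for t, f in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]

def class_difference(docs, labels, n_max=2):
    """Mean TF-IDF in non-R0 notes (label 1) minus mean TF-IDF in R0 notes (label 0)."""
    totals = {0: Counter(), 1: Counter()}
    sizes = Counter(labels)
    for scores, y in zip(tfidf(docs, n_max), labels):
        for term, value in scores.items():
            totals[y][term] += value
    terms = set(totals[0]) | set(totals[1])
    return {t: totals[1][t] / sizes[1] - totals[0][t] / sizes[0] for t in terms}

# Hypothetical toy notes; 0 = R0 (no residual disease), 1 = non-R0.
notes = [
    "complete resection no visible disease",
    "omentum removed no visible disease",
    "miliary disease adherent bowel",
    "adherent tumour miliary disease on mesentery",
]
labels = [0, 0, 1, 1]

diff = class_difference(notes, labels)
# N-grams with the largest positive difference lean towards non-R0 notes.
for term in sorted(diff, key=diff.get, reverse=True)[:5]:
    print(f"{term}: {diff[term]:.2f}")
```

In this toy setup, n-grams that occur only in non-R0 notes get a positive difference, n-grams that occur only in R0 notes get a negative one, and a word shared equally by both groups scores zero. Differences of this kind could serve as the weights of a word cloud.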

Publisher

SAGE Publications

Subject

Oncology,Hematology,General Medicine

Link

http://journals.sagepub.com/doi/pdf/10.1177/10732748231209892

References: 43 articles.

1. Preoperative hypoalbuminemia is an independent predictor of poor perioperative outcomes in women undergoing open surgery for gynecologic malignancies

2. A preoperative personalized risk assessment calculator for elderly ovarian cancer patients undergoing primary cytoreductive surgery

3. The Value of Electronic Health Records Since the Health Information Technology for Economic and Clinical Health Act: Systematic Review

4. Clinical Text Data in Machine Learning: Systematic Review

Cited by 2 articles.

1. Leveraging Large Language Models in Gynecologic Oncology: A Systematic Review of Current Applications and Challenges;2024-08-09

2. A Large Language Model Agent Based Legal Assistant for Governance Applications;Lecture Notes in Computer Science;2024