Building the Model

Author:

Yang He S. [1], Rhoads Daniel D. [2, 3], Sepulveda Jorge [4], Zang Chengxi [5], Chadburn Amy [1], Wang Fei [5]

Affiliation:

1. From the Department of Pathology and Laboratory Medicine (Yang, Chadburn), Weill Cornell Medicine, New York, New York.

2. From the Department of Laboratory Medicine, Cleveland Clinic, Cleveland, Ohio (Rhoads).

3. From the Department of Pathology, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio (Rhoads).

4. From the Department of Pathology, School of Medicine and Health Sciences, George Washington University, Washington, District of Columbia (Sepulveda).

5. From the Department of Population Health Sciences (Zang, Wang), Weill Cornell Medicine, New York, New York.

Abstract

Context: Machine learning (ML) allows for the analysis of massive quantities of high-dimensional clinical laboratory data, thereby revealing complex patterns and trends. Thus, ML can potentially improve the efficiency of clinical data interpretation and the practice of laboratory medicine. However, the risks of generating biased or unrepresentative models, which can lead to misleading clinical conclusions or overestimation of the model performance, should be recognized.

Objectives: To discuss the major components for creating ML models, including data collection, data preprocessing, model development, and model evaluation. We also highlight many of the challenges and pitfalls in developing ML models, which could result in misleading clinical impressions or inaccurate model performance, and provide suggestions and guidance on how to circumvent these challenges.

Data Sources: The references for this review were identified through searches of the PubMed database, the US Food and Drug Administration white papers and guidelines, conference abstracts, and online preprints.

Conclusions: With the growing interest in developing and implementing ML models in clinical practice, laboratorians and clinicians need to be educated in order to collect sufficiently large and high-quality data, properly report the data set characteristics, and combine data from multiple institutions with proper normalization. They will also need to assess the reasons for missing values, determine the inclusion or exclusion of outliers, and evaluate the completeness of a data set. In addition, they require the necessary knowledge to select a suitable ML model for a specific clinical question and accurately evaluate the performance of the ML model, based on objective criteria. Domain-specific knowledge is critical in the entire workflow of developing ML models.
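One point raised in the conclusions, combining data from multiple institutions with proper normalization while keeping track of missing values, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the review: the function name `zscore_by_site`, the site labels, and the values are hypothetical, and per-site z-scoring is only one of several possible normalization choices.

```python
from statistics import mean, stdev

def zscore_by_site(records):
    """Standardize each result against the mean and SD of its own site.

    `records` is a list of (site, value) pairs; a value of None marks a
    missing result. Hypothetical helper, for illustration only.
    """
    # Collect the non-missing values reported by each site.
    by_site = {}
    for site, value in records:
        if value is not None:
            by_site.setdefault(site, []).append(value)

    # Per-site mean and sample SD (stdev needs at least two values per site).
    stats = {site: (mean(vals), stdev(vals)) for site, vals in by_site.items()}

    normalized = []
    for site, value in records:
        if value is None:
            # Keep missing results flagged rather than silently imputing them.
            normalized.append((site, None))
            continue
        m, sd = stats[site]
        # A site whose values are all identical has SD 0; map those to 0.0.
        normalized.append((site, 0.0 if sd == 0 else (value - m) / sd))
    return normalized

# Hypothetical data: two sites reporting the same analyte on different scales.
records = [("A", 1.0), ("A", 2.0), ("A", 3.0),
           ("B", 10.0), ("B", 20.0), ("B", 30.0), ("B", None)]
result = zscore_by_site(records)
```

After this step the two hypothetical sites are on a comparable scale, and the missing result from site B is still visible so that the reason for its absence can be assessed separately.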

Publisher

Archives of Pathology and Laboratory Medicine

Subject

Medical Laboratory Technology, General Medicine, Pathology and Forensic Medicine
