Federated Random Forests can improve local performance of predictive models for various healthcare applications

Author:

Hauschild Anne-Christin (1), Lemanczyk Marta (1), Matschinske Julian (2, 3), Frisch Tobias (4), Zolotareva Olga (2), Holzinger Andreas (5), Baumbach Jan (3), Heider Dominik (1)

Affiliation:

1. Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany

2. TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising-Weihenstephan, Germany

3. Computational Systems Biology, University of Hamburg, Hamburg, Germany

4. Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark

5. Institut für Medizinische Informatik, Statistik und Dokumentation, Medizinische Universität Graz, Graz, Austria

Abstract

Motivation

Limited data access, e.g. owing to privacy and data protection rules, has kept the field of precision medicine from exploring its full potential, particularly with regard to machine learning. Our study evaluates the efficacy of federated Random Forests (FRF) models, focusing particularly on the heterogeneity within and between datasets. We addressed three common challenges: (i) number of parties, (ii) sizes of datasets and (iii) imbalanced phenotypes, evaluated on five biomedical datasets.

Results

The FRF outperformed the average local models and performed comparably to the data-centralized models trained on the entire data. With an increasing number of models and decreasing dataset size, the performance of local models decreases drastically. The performance of the FRF, however, does not decrease significantly. When combining datasets of different sizes, the FRF vastly improve compared to the average local models. By analyzing different class imbalances, we demonstrate that the FRF remain more robust and outperform the local models. Our results support that FRF overcome boundaries of clinical research and enable collaborations across institutes without violating privacy or legal regulations. Clinicians benefit from a vast collection of unbiased data aggregated from different geographic locations, demographics and other varying factors. They can build more generalizable models to make better clinical decisions, which will be especially relevant for patients in rural areas and for rare or geographically uncommon diseases, enabling personalized treatment. In combination with secure multi-party computation, federated learning has the power to revolutionize clinical practice by increasing the accuracy and robustness of healthcare AI, thus paving the way for precision medicine.

Availability and implementation

The implementation of the federated random forests can be found at https://featurecloud.ai/.

Supplementary information

Supplementary data are available at Bioinformatics online.
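The abstract does not describe how the federated forest is assembled. The following is a minimal sketch, assuming one common scheme in which each party trains trees on its own data and shares only the fitted trees, which are then pooled into one ensemble that predicts by majority vote. The one-split "stumps", the toy data and all function names are illustrative assumptions, not the authors' FeatureCloud implementation.

```python
import random
from collections import Counter


def train_stump(rows, labels, rng):
    """Fit a one-split tree on a bootstrap sample (a stand-in for a full decision tree)."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of row indices
    feature = rng.randrange(len(rows[0]))        # random feature choice, as in random forests
    best = None
    for i in idx:
        threshold = rows[i][feature]
        left = [labels[j] for j in idx if rows[j][feature] <= threshold]
        right = [labels[j] for j in idx if rows[j][feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return (feature, threshold, left_label, right_label)


def predict_tree(tree, row):
    feature, threshold, left_label, right_label = tree
    return left_label if row[feature] <= threshold else right_label


def train_local_forest(rows, labels, n_trees, seed):
    """Train a small forest using only one party's data."""
    rng = random.Random(seed)
    return [train_stump(rows, labels, rng) for _ in range(n_trees)]


def federate(local_forests):
    """Pool the trees of all parties; only fitted trees are exchanged, never patient rows."""
    return [tree for forest in local_forests for tree in forest]


def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data for two parties of different sizes (features, labels).
site_a = ([[0.1, 1.0], [0.2, 0.9], [0.8, 0.2], [0.9, 0.1]], [0, 0, 1, 1])
site_b = ([[0.15, 0.8], [0.3, 0.7], [0.7, 0.3], [0.85, 0.2], [0.95, 0.1], [0.05, 0.95]],
          [0, 0, 1, 1, 1, 0])

local_forests = [
    train_local_forest(rows, labels, n_trees=5, seed=s)
    for s, (rows, labels) in enumerate([site_a, site_b])
]
global_forest = federate(local_forests)
print(predict_forest(global_forest, [0.9, 0.1]))
```

In this sketch every party contributes the same number of trees regardless of its dataset size. Weighting contributions by sample size is a possible variation; the abstract does not say which scheme the study used.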

Funder

European Union’s Horizon2020 research and innovation programme

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability


Cited by 36 articles.