Multilayer meta-matching: translating phenotypic prediction models from multiple datasets to small data

Published: 2023-12-07

Author:

Chen Pansheng, An Lijun, Wulan Naren, Zhang Chen, Zhang Shaoshi, Ooi Leon Qi Rong, Kong Ru, Chen Jianzhong, Wu Jianxiao, Chopra Sidhant, Bzdok Danilo, Eickhoff Simon B, Holmes Avram J, Yeo B.T. Thomas

Abstract

Resting-state functional connectivity (RSFC) is widely used to predict phenotypic traits in individuals. Large sample sizes can significantly improve prediction accuracies. However, for studies of certain clinical populations or focused neuroscience inquiries, small-scale datasets often remain a necessity. We have previously proposed a “meta-matching” approach to translate prediction models from large datasets to predict new phenotypes in small datasets. We demonstrated large improvement of meta-matching over classical kernel ridge regression (KRR) when translating models from a single source dataset (UK Biobank) to the Human Connectome Project Young Adults (HCP-YA) dataset. In the current study, we propose two meta-matching variants (“meta-matching with dataset stacking” and “multilayer meta-matching”) to translate models from multiple source datasets across disparate sample sizes to predict new phenotypes in small target datasets. We evaluate both approaches by translating models trained from five source datasets (with sample sizes ranging from 862 participants to 36,834 participants) to predict phenotypes in the HCP-YA and HCP-Aging datasets. We find that multilayer meta-matching modestly outperforms meta-matching with dataset stacking. Both meta-matching variants perform better than the original “meta-matching with stacking” approach trained only on the UK Biobank. All meta-matching variants outperform classical KRR and transfer learning by a large margin. In fact, KRR is better than classical transfer learning when fewer than 50 participants are available for finetuning, suggesting the difficulty of classical transfer learning in the very small sample regime. The multilayer meta-matching model is publicly available at GITHUB_LINK.
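
The general stacking idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes synthetic data, replaces both the source model and the KRR baseline with plain linear ridge regression, and uses hypothetical names (`ridge_fit`, `W_source`, `w_stack`). It only shows why reusing a model trained on a large source dataset can help when just K target participants are available.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression weights (no intercept, for brevity)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
n_features, n_source_phenos = 50, 10

# Synthetic "source" dataset: many participants, several phenotypes.
W_true = rng.normal(size=(n_features, n_source_phenos))
X_source = rng.normal(size=(2000, n_features))
Y_source = X_source @ W_true + rng.normal(size=(2000, n_source_phenos))

# Synthetic "target" dataset: a new phenotype that is correlated with
# the source phenotypes (an assumption of this toy example).
mix = rng.normal(size=n_source_phenos)
X_target = rng.normal(size=(300, n_features))
y_target = X_target @ W_true @ mix + rng.normal(size=300)

# Only K target participants are available for training.
k = 20
X_k, y_k = X_target[:k], y_target[:k]
X_test, y_test = X_target[k:], y_target[k:]

# Step 1: train a multi-output model on the large source dataset.
W_source = ridge_fit(X_source, Y_source)

# Step 2 (stacking): use the source model's predicted phenotypes as
# features, and fit a small second model on the K participants.
Z_k, Z_test = X_k @ W_source, X_test @ W_source
w_stack = ridge_fit(Z_k, y_k)
pred_stack = Z_test @ w_stack

# Baseline: fit directly on the K participants' raw features.
w_direct = ridge_fit(X_k, y_k)
pred_direct = X_test @ w_direct

r_stack = pearson_r(pred_stack, y_test)
r_direct = pearson_r(pred_direct, y_test)
print(f"stacking r = {r_stack:.2f}, direct r = {r_direct:.2f}")
```

In this toy setting the second-stage model has only 10 inputs to fit from 20 participants, whereas the direct baseline must fit 50 raw features from the same 20 participants. The gap it produces depends entirely on the assumed correlation between source and target phenotypes.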

Publisher

Cold Spring Harbor Laboratory
