Abstract
Machine learning (ML) has proven successful in biological data analysis. However, it may require massive training data. To allow broader use of ML across the full spectrum of biology and medicine, including sample-sparse domains, redirecting established models to specific tasks through add-on training on a moderate number of samples may be promising. Transfer learning (TL), a technique that migrates pre-trained models to new tasks, fits this requirement. Here, using TL, we retasked Enformer, a comprehensive model trained on massive data, tailoring it to breast cancer using breast-specific data. Its performance was validated through the statistical accuracy of its predictions, annotation of genetic variants, and mapping of variants associated with breast cancer. By allowing the flexibility of adding dedicated training data, our TL protocol unlocks future discovery within specific domains using moderate add-on samples, standing on the shoulders of giant models.
Publisher
Cold Spring Harbor Laboratory