Author:
Wang Shuhui, Allauzen Alexandre, Nghe Philippe, Opuu Vaitea
Abstract
Mechanistic models of genetic interactions are rarely feasible due to a lack of information and computational challenges. Alternatively, machine learning (ML) approaches may predict gene interactions if provided with enough data but they lack interpretability. Here, we propose an ML approach for interpretable genotype-fitness mapping, the Direct-Latent Interpretable Model (D-LIM). The neural network is built on a strong hypothesis: mutations in different genes cause independent effects in phenotypes, which then interact via non-linear relationships to determine fitness. D-LIM predicts genotype-fitness maps for combinations of mutations in multiple genes with state-of-the-art accuracy, showing the validity of the hypothesis in the case of a deep mutational scanning in a metabolic pathway. The hypothesis-driven structure of D-LIM offers interpretable features reminiscent of mechanistic models: the inference of phenotypes, fitness extrapolation outside of the data domain, and enhanced prediction in low-data regimes by the integration of prior knowledge.
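The structural hypothesis stated in the abstract (each gene's mutations map independently to a latent phenotype, and the phenotypes are then combined non-linearly into fitness) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the two-gene setup, the variant counts, the single scalar phenotype per gene, the one-hidden-layer tanh network, the random untrained weights, and every name below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2 genes with 5 and 7 possible variants (assumed values).
n_variants = [5, 7]
n_genes = len(n_variants)
hidden_units = 16  # assumed width of the non-linear combiner

# One scalar latent phenotype per variant of each gene.
# Each gene has its own table, so a mutation in one gene cannot
# change the phenotype assigned to another gene (the independence hypothesis).
phenotype_tables = [rng.normal(size=n) for n in n_variants]

# Small feed-forward network mapping the phenotype vector to fitness.
# Weights are random here; in practice they would be fitted to data.
W1 = rng.normal(size=(n_genes, hidden_units)) * 0.5
b1 = np.zeros(hidden_units)
W2 = rng.normal(size=(hidden_units, 1)) * 0.5
b2 = np.zeros(1)


def latent_phenotypes(genotypes):
    """Look up one phenotype per gene for each genotype (row of variant indices)."""
    genotypes = np.asarray(genotypes)
    return np.stack(
        [phenotype_tables[g][genotypes[:, g]] for g in range(n_genes)], axis=1
    )


def predict_fitness(genotypes):
    """Combine the per-gene phenotypes non-linearly into a scalar fitness."""
    phenotypes = latent_phenotypes(genotypes)
    hidden = np.tanh(phenotypes @ W1 + b1)
    return (hidden @ W2 + b2).ravel()


# Three example genotypes, each given as (variant of gene 0, variant of gene 1).
batch = [[0, 3], [4, 6], [0, 6]]
phenotypes = latent_phenotypes(batch)
fitness = predict_fitness(batch)
```

In this sketch the first and third genotypes share the same variant of gene 0, so they receive the same gene-0 phenotype whatever the variant of gene 1 is; any interaction between the genes arises only in the non-linear combination step.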
Publisher
Cold Spring Harbor Laboratory