Abstract
Many important dynamic biological processes, such as the early embryonic cell cycle, cardiac rhythms, and circadian rhythms, are dominated by periodic changes, also called oscillations. Scientists have long sought to understand the mechanisms that underlie and regulate this dynamic behavior, usually by means of classical model identification techniques. The recent rise of data-driven methods, often referred to as machine learning, has fundamentally changed model identification by allowing models to be inferred directly from data with almost no prior knowledge. One example is SINDy (Sparse Identification of Nonlinear Dynamics), a data-driven white-box approach that, despite its recent popularity, has mainly been applied to synthetic data and has yet to prove successful on data from real (biological) experiments. In this work, we explore the limitations of SINDy in the specific context of (biological) oscillatory systems. By applying SINDy directly to experimental data, we identify the main limiting factors: data availability and quality, the complexity of interactions, and the dimensionality (number of variables) of the system. We study these limiting factors using a set of commonly used, generic oscillator models of varying complexity and/or dimensionality. From this analysis, we formulate specific mitigation approaches and combine them into a step-by-step guide for model inference from real biological data, whose effectiveness we demonstrate using glycolytic oscillations in yeast as a test example.
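To make the core idea of SINDy concrete, the following is a minimal sketch in Python with NumPy, built on illustrative assumptions that are not taken from this work: a synthetic damped linear oscillator sampled from its exact solution, derivatives estimated by finite differences, a candidate library of polynomials up to degree two, and the sequentially thresholded least-squares (STLSQ) regressor from the original SINDy formulation. The helper `stlsq`, the threshold of 0.05, and the test system are hypothetical choices for illustration only; this is not the pipeline used in this study.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares (hypothetical helper).

    Fits dxdt ~ theta @ xi, then repeatedly zeroes coefficients whose
    magnitude is below `threshold` and refits the remaining terms.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dxdt[:, k], rcond=None)[0]
    return xi


# Assumed test system (not from this work): damped linear oscillator
#   dx/dt = -0.1 x + y,  dy/dt = -x - 0.1 y,
# sampled from its exact solution, so no ODE solver is needed.
t = np.linspace(0.0, 10.0, 2001)
x = np.exp(-0.1 * t) * np.cos(t)
y = -np.exp(-0.1 * t) * np.sin(t)
X = np.column_stack([x, y])

# The regression only sees the samples: derivatives come from finite differences.
dXdt = np.gradient(X, t, axis=0)

# Candidate library of polynomial terms up to degree 2.
names = ["1", "x", "y", "x^2", "x*y", "y^2"]
theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

# Sparse regression selects the few library terms that explain the derivatives.
xi = stlsq(theta, dXdt)
for k, var in enumerate(["x", "y"]):
    terms = [f"{xi[j, k]:+.3f}*{names[j]}" for j in range(len(names)) if xi[j, k] != 0.0]
    print(f"d{var}/dt =", " ".join(terms))
```

With clean, densely sampled synthetic data like this, the regression should recover the two active terms in each equation with coefficients close to the true values. Finite differencing amplifies measurement noise, so sparse or noisy experimental recordings degrade the derivative estimates, and with them the selected terms, which illustrates why data availability and quality matter for this kind of inference.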
Publisher: Cold Spring Harbor Laboratory