Abstract
Given an array of phenotypes (e.g., yield across strains and conditions), one can ask how best to choose subsets of conditions that are informative about the whole dataset, enabling efficient system identification and providing a basis in phenotype space. Here we introduce a mixed integer linear programming approach to choose explanatory and response variables for a phenotypic matrix. We applied the algorithm to a set of fitness measurements for 462 yeast strains under 38 carbon sources, and to the growth phenotypes of 65 marine bacteria on 11 media. The algorithm identifies environments that can be used as features to predict growth under other conditions, providing biologically interpretable metabolic axes for strain discrimination. Our approach could be used to reduce the number of experiments needed to identify a strain or to map its metabolic capabilities. The generality of the algorithm makes it applicable to subset selection problems in areas beyond biology.
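The abstract does not spell out the mixed integer formulation, so the following is only a hedged illustration of the selection problem it addresses, under one assumed objective. It picks k conditions (columns) as explanatory variables, treats every other condition as a response, and scores the choice by the total residual sum of squares of ordinary least-squares fits (with an intercept). As a plain substitute for the MILP, it searches all subsets exhaustively, which only works for small matrices. The names `best_subset`, `rss` and `solve`, the `RIDGE` term and the toy data are hypothetical and are not the authors' method.

```python
from itertools import combinations

# Assumption: a tiny ridge penalty keeps the normal equations solvable when
# the chosen conditions are collinear. Not part of the original method.
RIDGE = 1e-9


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def rss(features, y):
    """Residual sum of squares of a least-squares fit of y on the features plus an intercept."""
    X = [[1.0] + list(row) for row in features]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (RIDGE if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def best_subset(matrix, k):
    """Exhaustively choose k conditions that best predict all remaining conditions.

    matrix: one row per strain, one column per condition (hypothetical layout).
    Returns (total RSS over the response columns, tuple of chosen column indices).
    """
    n_cond = len(matrix[0])
    best = None
    for chosen in combinations(range(n_cond), k):
        feats = [[row[j] for j in chosen] for row in matrix]
        total = sum(rss(feats, [row[j] for row in matrix])
                    for j in range(n_cond) if j not in chosen)
        if best is None or total < best[0]:
            best = (total, chosen)
    return best


# Hypothetical toy matrix: 6 strains x 4 conditions, where condition 1 is an
# exact linear function of condition 0, so one of the two is redundant.
cond0 = [0.1, 0.4, 0.2, 0.9, 0.5, 0.7]
cond2 = [0.8, 0.1, 0.5, 0.3, 0.9, 0.2]
cond3 = [0.3, 0.7, 0.9, 0.1, 0.4, 0.6]
toy = [[a, 3 * a + 1, b, c] for a, b, c in zip(cond0, cond2, cond3)]
score, chosen = best_subset(toy, 3)
```

On the toy matrix, the winning subset keeps conditions 2 and 3 plus one member of the redundant pair, and its residual error is essentially zero. The number of candidate subsets grows combinatorially with the number of conditions (for example, choosing 10 of 38 carbon sources), so exhaustive enumeration quickly becomes impractical. That cost is one reason to pose the search as an optimization problem that a solver can handle.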
Publisher
Cold Spring Harbor Laboratory