Abstract
Measuring the fitnesses of genetic variants is a fundamental objective in evolutionary biology. A standard approach for measuring microbial fitnesses in bulk involves labeling a library of genetic variants with unique sequence barcodes, competing the labeled strains in batch culture, and using deep sequencing to track changes in the barcode abundances over time. However, idiosyncratic properties of barcodes (e.g., GC content) can induce non-uniform amplification or uneven sequencing coverage that cause some barcodes to be over- or under-represented in samples. This systematic bias can result in erroneous read count trajectories and misestimates of fitness. Here we develop a computational method for inferring the effects of processing bias by leveraging the structure of systematic deviations in the data. We illustrate this approach by applying it to fitness assay data collected for a large library of yeast variants, and show that this method estimates and corrects for bias more accurately than standard proxies, such as GC-based corrections. Our method mitigates bias and improves fitness estimates in high-throughput assays without introducing additional complexity to the experimental protocols, with potential value in a range of experimental evolution and mutation screening contexts.
Publisher
Cold Spring Harbor Laboratory