Abstract
Proteins have evolved over millions of years to mediate and carry out biological processes efficiently. Directed evolution approaches have been used to genetically engineer proteins with desirable functions such as catalysis, mineralization, and target-specific binding. Next-generation sequencing technology makes it possible to explore a massive combinatorial sequence space that is costly to sample experimentally through traditional approaches. Because the permutation space of protein sequences is virtually infinite and evolution dynamics are poorly understood, experimental verification has been limited. Recently, machine-learning approaches have been introduced to guide the evolution process, facilitating a deeper and denser search of the sequence space. Despite these developments, frequently used high-fidelity models depend on massive amounts of properly labeled, high-quality data, which have so far been largely lacking in the literature. Here, we provide a preliminary high-throughput peptide-selection protocol with functional scoring to enhance the quality of the data. Solid-binding dodecapeptides were selected against a molybdenum disulfide substrate, a two-dimensional, atomically thick semiconductor solid. The survival rate of the phage clones under successively more stringent washes quantifies the binding affinity of the peptides for the solid material. By avoiding amplification, the method suggested here rapidly generates a preliminary data pool of ∼2 million unique peptides, each 12 amino acids long. Our results demonstrate the importance of data cleaning and proper conditioning of massive datasets in guiding experiments iteratively. The extensive groundwork established here provides unique opportunities to further iterate on and modify the technique to suit a wide variety of needs and to generate various peptide and protein datasets.
Prospective statistical models developed on these datasets to efficiently explore the sequence-function space will guide the intelligent design of proteins and peptides through deep directed evolution. Future technological applications based on bio/nano soft interfaces between peptides and single-layer solids, such as biosensors, bioelectronics, and logic devices, are expected to benefit from the solid-binding peptide dataset alone. Furthermore, the protocols described herein will also benefit efforts in medical applications, such as vaccine development, that could significantly accelerate a global response to future pandemics.
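As an illustration of the functional scoring idea, the following is a minimal sketch of how survival across successively stringent washes might be turned into a per-peptide score. The abstract does not state the scoring formula, so everything here is an assumption: the function name `survival_scores`, the input layout (one list of sequenced peptides still detected after each wash, ordered from least to most stringent), and the score itself (the fraction of washes in which a peptide is observed). The cleaning step keeps only 12-residue sequences over the 20 standard amino acids, in line with the dodecapeptide library described above.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
PEPTIDE_LENGTH = 12  # dodecapeptides, as stated in the abstract


def is_valid_peptide(seq):
    """Hypothetical cleaning rule: exact length and standard residues only."""
    return len(seq) == PEPTIDE_LENGTH and set(seq) <= STANDARD_AA


def survival_scores(wash_reads):
    """Score each peptide by the fraction of washes in which it is detected.

    wash_reads: list of lists of peptide strings, one inner list per wash,
    ordered from least to most stringent (assumed layout, not the authors').
    Returns a dict mapping peptide -> score in (0, 1].
    """
    n_washes = len(wash_reads)
    if n_washes == 0:
        return {}
    washes_seen = Counter()
    for reads in wash_reads:
        # Count each peptide at most once per wash, after cleaning.
        for pep in {r for r in reads if is_valid_peptide(r)}:
            washes_seen[pep] += 1
    return {pep: k / n_washes for pep, k in washes_seen.items()}


if __name__ == "__main__":
    # Toy reads, invented purely for illustration.
    example = [
        ["ACDEFGHIKLMN", "ACDEFGHIKLMN", "WWWWWWWWWWWW", "SHORT"],
        ["ACDEFGHIKLMN"],
    ]
    for peptide, score in sorted(survival_scores(example).items()):
        print(peptide, score)
```

A presence-based fraction is used here only because it is simple and insensitive to read-depth differences between washes; a count-weighted or stringency-weighted score would be an equally plausible choice under different assumptions.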
Publisher
Cold Spring Harbor Laboratory