Author:
Galik Bence, Landry Jonathan J.M., Kirkpatrick Joanna M., Fritz Markus Hsi-Yang, Baying Bianka, Blake Jonathon, Haase Bettina, Collier Paul G., Hercog Rajna, Pavlinic Dinko, Stolt-Bergner Peggy, Besir Hüseyin, Remans Kim, Gyenesei Attila, Benes Vladimir
Abstract
Insect-derived cell lines from Spodoptera frugiperda (Sf21) and Trichoplusia ni (Tni) are the two most widely used cell lines for recombinant protein expression in combination with the Baculoviral Expression Vector System (BEVS). Genomic sequences and annotations are still incomplete for Sf21 and absent for Tni. In this study, we present an approach that combines different sequencing data types, including short reads, long synthetic reads and Oxford Nanopore reads, to build genomes. The Sf21 assembly contains 4,020 scaffolds totalling 463 Mb with an N50 of 364 Kb, and the Tni assembly contains 2,954 scaffolds totalling 332 Mb with an N50 of 326 Kb. Furthermore, we built a new gene prediction workflow that integrates transcriptome and proteome information using pre-existing tools. Using this approach, we predicted 21,506 Sf21 and 14,159 Tni genes. We also generated and integrated proteomic datasets to validate the predicted genes, identifying 5,577 and 4,919 proteins in the Sf21 and Tni cell lines, respectively. In principle, this integrative approach could be applied to any uncharacterized genome to produce valuable new resources. With this information available, Sf21 and Tni cells will become even better tools for protein expression and could be used in a wider range of applications, from promoter identification to genome engineering and editing.
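For readers unfamiliar with the N50 statistic quoted for both assemblies, the following is a minimal sketch of how it is commonly computed: scaffolds are sorted from longest to shortest, and the N50 is the length of the scaffold at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not the authors' code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bases)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2            # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This scaffold is the one that pushes the running sum
            # past half of the total assembly length.
            return length


# Hypothetical toy assembly, not data from the study.
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))  # prints 80
```

In the toy example the total is 300, so the running sum reaches the halfway mark of 150 at the second scaffold, giving an N50 of 80.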
Publisher
Cold Spring Harbor Laboratory