Abstract
Background
Many rice protein sequences differ greatly from the sequences of proteins with known structures, so homology modeling is not possible for many rice proteins. However, computationally intensive de novo techniques can produce protein models when a sequence cannot be mapped to a protein of known structure. The Nutritious Rice for the World project generated 10 billion models encompassing more than 60,000 small proteins and protein domains for the rice strains Oryza sativa and Oryza japonica.
Findings
Over a period of 1.5 years, the volunteers of World Community Grid, supported by IBM, generated 10 billion candidate structures, a task that would have taken a single CPU on the order of 10 millennia. For each protein sequence, the 5 top structures were chosen using a novel clustering methodology developed for analyzing large datasets. These are provided along with the entire set of 10 billion conformers.
Conclusions
We anticipate that the centroid models will be of use in visualizing and determining the role of rice proteins whose function is unknown. The entire set of conformers is unique both in its size and in that the conformers were derived from sequences that lack detectable homologs. Large sets of de novo conformers are rare, and we anticipate that this set will be useful for benchmarking and developing new protein structure prediction methodologies.
Publisher
Cold Spring Harbor Laboratory