1. B. Alberts et al., Molecular Biology of the Cell (Garland, New York, ed. 3, 1994); H. Lodish et al., Molecular Cell Biology (Scientific American Books, New York, ed. 3, 1995).
2. S. Fields and O. K. Song, Nature 340, 243 (1989).
3. J. M. Berger, S. J. Gamblin, S. C. Harrison, J. C. Wang, Nature 379, 225 (1996).
4. The Complete Genome Sequence of Escherichia coli K-12
5. The triplets of proteins are found with the aid of protein domain databases such as ProDom or Pfam (17). Here, a list of all ProDom domains in each of the 64,568 SWISS-PROT proteins was prepared, as well as a list of all proteins that contain each of the 53,597 ProDom domains. Then every protein in ProDom was considered for its ability to be the linking (or Rosetta Stone) member of a triplet. All pairs of domains that are both members of a given protein P were defined as being linked by P if we could find at least one protein with only one of the two domains. By this method, we found 14,899 links between the 7843 ProDom domains. Then, in a single genome (such as E. coli), we found all nonhomologous pairs of proteins containing linked domains. These pairs are linked by the Rosetta Stone proteins. For E. coli, this method finds 3531 protein pairs. An alternative method for discovering protein triplets uses amino acid sequence alignment to find two proteins that align to a Rosetta Stone protein such that the alignments do not overlap on the Rosetta Stone protein. For E. coli, this method finds 4487 protein pairs, 1209 of which were also found by the ProDom search method (even though different sequence databases were searched for each method). All predictions are available on the World Wide Web at www.doe-mbi.ucla.edu.
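The domain-based search in note 5 can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the dictionary layout, and the toy data are hypothetical. Two points are assumptions made for illustration: "a protein with only one of the two domains" is read as a protein that contains one domain of the pair but not the other, and two genome proteins are treated as homologous if they share any domain.

```python
from itertools import combinations


def find_linked_domains(protein_domains):
    """Map each linked domain pair to the proteins that link it.

    protein_domains: dict of protein name -> set of domain identifiers
    (a stand-in for the ProDom annotation of the sequence database).
    """
    # Invert the annotation: domain -> set of proteins containing it.
    domain_proteins = {}
    for protein, domains in protein_domains.items():
        for domain in domains:
            domain_proteins.setdefault(domain, set()).add(protein)

    links = {}
    for protein, domains in protein_domains.items():
        for a, b in combinations(sorted(domains), 2):
            # Assumed reading of the criterion: some protein carries
            # exactly one of the two domains (nonempty symmetric difference).
            if domain_proteins[a] ^ domain_proteins[b]:
                links.setdefault((a, b), set()).add(protein)
    return links


def find_protein_pairs(genome_proteins, links):
    """Return pairs of genome proteins whose domains are linked.

    genome_proteins: dict of protein name -> set of domain identifiers
    for a single genome.
    """
    pairs = set()
    for p, q in combinations(sorted(genome_proteins), 2):
        dp, dq = genome_proteins[p], genome_proteins[q]
        # Assumption: proteins sharing a domain count as homologous
        # and are skipped.
        if dp & dq:
            continue
        if any(tuple(sorted((a, b))) in links for a in dp for b in dq):
            pairs.add((p, q))
    return pairs


# Toy data (hypothetical): "Fused" plays the Rosetta Stone role.
database = {
    "Fused": {"D1", "D2"},
    "X": {"D1"},
    "Y": {"D2"},
    "Z": {"D3", "D4"},
}
genome = {"A": {"D1"}, "B": {"D2"}, "C": {"D3"}, "E": {"D1"}}

links = find_linked_domains(database)
pairs = find_protein_pairs(genome, links)
```

In this toy example, D1 and D2 are linked by "Fused" because "X" and "Y" each carry only one of the two domains. D3 and D4 are not linked because they only ever occur together. Under the sharing-a-domain assumption, proteins A and E are skipped as homologs.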