SynGenome: 100 billion base pairs of AI-generated genomic sequence

SynGenome is a first-of-its-kind database of synthetic DNA sequences generated by Evo, a genomic language model. Given a DNA sequence prompt, Evo generates a DNA sequence response that continues the genomic sequence; in effect, Evo enables "autocomplete" for the genome. In the genomes of prokaryotes and phages, genes with related functions frequently appear directly next to each other along the DNA sequence. As a result, prompting Evo with a sequence that encodes a function of interest steers the model toward generating functionally related genes. This enables function-guided generative design through prompt engineering of a genomic language model. SynGenome is organized according to the known functions, domains, and species of the prompt sequences. The corresponding response sequences are likely enriched for genes with related functions or domains, but they may contain many other interesting genes as well. The generated sequences in SynGenome may be very different from anything found in nature while still performing useful biological functions, opening up a new universe of biological discovery.
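
To make the organization described above concrete, here is a minimal Python sketch of how prompt/response pairs might be grouped by the annotation of the prompt. Everything in it is an assumption made for illustration: the `Entry` class, its field names, the `build_index` helper, and the short placeholder sequences are hypothetical and do not reflect SynGenome's actual schema, file format, or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One hypothetical prompt/response pair (field names are illustrative only)."""
    prompt_function: str  # annotated function of the prompt sequence
    prompt_domain: str    # annotated protein domain of the prompt sequence
    prompt_species: str   # species the prompt sequence was taken from
    prompt_seq: str       # DNA given to the model
    response_seq: str     # DNA the model generated as a continuation


def build_index(entries, key):
    """Group entries by one prompt annotation, e.g. 'prompt_function'."""
    index = defaultdict(list)
    for entry in entries:
        index[getattr(entry, key)].append(entry)
    return index


def full_sequence(entry):
    """The response continues the prompt, so the full locus is their concatenation."""
    return entry.prompt_seq + entry.response_seq


# Placeholder records; these sequences are made up and not taken from SynGenome.
entries = [
    Entry("toxin", "domain_A", "species_1", "ATGAAA", "GGCTAA"),
    Entry("toxin", "domain_B", "species_2", "ATGCCC", "TTTTGA"),
    Entry("transposase", "domain_C", "species_1", "ATGGGG", "CCCTAG"),
]

by_function = build_index(entries, "prompt_function")
by_species = build_index(entries, "prompt_species")
```

Looking up `by_function["toxin"]` returns the generated responses whose prompts were annotated with that function. These are the sequences one would expect to be enriched for functionally related genes, although that enrichment is a tendency, not a guarantee.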

Click here to access SynGenome