Evo: DNA Foundation Model

Evo is a biological foundation model capable of long-context modeling and design. Evo uses the StripedHyena architecture to enable modeling of sequences at a single-nucleotide, byte-level resolution with near-linear scaling of compute and memory relative to context length. Evo has 7 billion parameters and is trained on a prokaryotic whole-genome dataset containing ~300 billion nucleotides.
Preprint, February 2024

Evo is available on GitHub and through an integration with Hugging Face.

Our blog post describes Evo in more detail.

Tool Features

A foundation model for multiple modalities

Evo learns across DNA, RNA, and proteins. On zero-shot function prediction in prokaryotes, it is competitive with state-of-the-art protein language models, despite never being explicitly shown protein-coding regions.

Understanding at the whole genome level

Evo captures the fact that small mutations to genes can have large effects on whole-organism function, which we use to perform zero-shot gene essentiality prediction.
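As a rough illustration of zero-shot scoring, one common approach with any sequence likelihood model is to compare the log-likelihood of a mutated sequence against the original. The sketch below assumes that approach and is not necessarily the procedure used for Evo. The names `toy_log_likelihood`, `BASE_PROBS`, and `mutation_score` are hypothetical, and the toy model is a fixed per-base distribution standing in for a real model's likelihood.

```python
import math
from typing import Callable

# Hypothetical stand-in for a real model: fixed, independent per-base probabilities.
BASE_PROBS = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def toy_log_likelihood(seq: str) -> float:
    """Sum of per-base log-probabilities under the toy distribution."""
    return sum(math.log(BASE_PROBS[base]) for base in seq)


def mutation_score(
    log_likelihood: Callable[[str], float],
    wild_type: str,
    position: int,
    alt_base: str,
) -> float:
    """Log-likelihood of the mutant minus that of the wild type.

    A more negative score means the scoring model finds the mutant less plausible.
    """
    mutant = wild_type[:position] + alt_base + wild_type[position + 1:]
    return log_likelihood(mutant) - log_likelihood(wild_type)


# Score a single A -> C substitution at position 0.
score = mutation_score(toy_log_likelihood, "ATGCGT", 0, "C")
print(f"{score:.4f}")
```

In practice `toy_log_likelihood` would be replaced by a function that queries a trained model. The scoring function takes the likelihood as an argument so that the comparison logic stays independent of any particular model.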

Generation from molecular to genome scale

Evo can generate sequences at several scales: molecular complexes (Cas proteins bound to noncoding RNA), systems (mobile genetic elements), and coding-rich genome-length sequences.

Single-nucleotide resolution

Evo can model and design long sequences without losing single-nucleotide resolution, enabled by fundamental changes to the machine learning model architecture based on the latest advances in deep signal processing.
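To make single-nucleotide, byte-level resolution concrete, here is a minimal sketch of a tokenizer that maps each base to exactly one token, namely its ASCII byte value. This assumes a plain byte encoding; the tokenizer shipped with Evo may differ in details, and the function names `tokenize` and `detokenize` are hypothetical.

```python
def tokenize(seq: str) -> list[int]:
    """Map each nucleotide character to its ASCII byte value (one token per base)."""
    return list(seq.upper().encode("ascii"))


def detokenize(tokens: list[int]) -> str:
    """Invert tokenize: turn byte values back into a nucleotide string."""
    return bytes(tokens).decode("ascii")


seq = "ATGCGT"
tokens = tokenize(seq)
print(tokens)               # [65, 84, 71, 67, 71, 84]
print(len(tokens) == len(seq))  # True: one token per nucleotide
print(detokenize(tokens))   # ATGCGT
```

Because no bases are merged into multi-character tokens, a single-nucleotide change in the input always changes exactly one token. The cost is that sequence length in tokens equals sequence length in bases, which is why scaling with context length matters.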