Evo: DNA Foundation Model

Evo is a genomic foundation model capable of generalist prediction and design tasks across DNA, RNA, and proteins. It uses a frontier deep learning architecture to model biological sequences at single-nucleotide resolution, with compute and memory that scale near-linearly with context length. Evo has 7 billion parameters and was trained at a context length of 131 kilobases on over 300 billion nucleotides of diverse prokaryotic genomes.
Published November 2024, Science

Click here to access Evo on GitHub.

Click here to access the Evo integration with Hugging Face.

Click here to read our blog post to learn more about Evo.

Tool Features

A multimodal foundation model for biology

Evo learns across DNA, RNA, and proteins. On zero-shot function prediction in prokaryotes, its performance is competitive with state-of-the-art language models built for a single modality:

• DNA regulatory regions – Evo can predict the impact of single-nucleotide changes on gene expression.
• Non-coding RNA – Evo outperforms all other nucleotide language models at predicting the effects of mutations on bacterial fitness.
• Protein – Evo can predict the impact of mutations on functional activity without explicitly being shown protein-coding regions.
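
Zero-shot mutation-effect prediction with a sequence language model is commonly done by comparing the model's log-likelihood of the mutated sequence against that of the wild-type sequence. The sketch below illustrates that general idea only. It does not use Evo or its API: `ToyNucleotideLM` is a hypothetical stand-in (a small Markov model), and `zero_shot_score` and `mutate` are illustrative names that do not come from the Evo codebase. Whether Evo's published evaluations use exactly this scoring rule is an assumption here.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class ToyNucleotideLM:
    """Order-k Markov model over nucleotides.

    A toy stand-in for a genomic language model (NOT Evo): it only needs
    to expose a log-likelihood for a DNA string.
    """

    def __init__(self, k=3):
        self.k = k
        # counts[context][next_base] = number of times seen in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.k, len(seq)):
                self.counts[seq[i - self.k:i]][seq[i]] += 1
        return self

    def log_likelihood(self, seq):
        """Sum of log P(base | previous k bases), with add-one smoothing."""
        total = 0.0
        for i in range(self.k, len(seq)):
            ctx = self.counts.get(seq[i - self.k:i], {})
            numerator = ctx.get(seq[i], 0) + 1
            denominator = sum(ctx.values()) + len(ALPHABET)
            total += math.log(numerator / denominator)
        return total


def mutate(seq, pos, base):
    """Return seq with the nucleotide at 0-based index pos replaced by base."""
    return seq[:pos] + base + seq[pos + 1:]


def zero_shot_score(model, wild_type, pos, base):
    """Log-likelihood of the mutant minus log-likelihood of the wild type.

    A negative score means the model finds the mutant less plausible,
    which is read as a prediction that the mutation is more disruptive.
    """
    mutant = mutate(wild_type, pos, base)
    return model.log_likelihood(mutant) - model.log_likelihood(wild_type)


# Illustrative data: a highly regular "genome" so the effect is easy to see.
model = ToyNucleotideLM(k=3).fit(["ACGT" * 50])
wild_type = "ACGT" * 5

# Score every possible substitution at position 10 (a 'G' in the wild type).
for base in ALPHABET:
    print(base, round(zero_shot_score(model, wild_type, 10, base), 3))
```

No labeled fitness data is used at any point, which is what makes the prediction zero-shot. With a real model, the same difference of log-likelihoods would be computed from the model's per-nucleotide probabilities instead of the toy counts.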

Understanding at the whole genome level

Evo captures the fact that small mutations to genes can have large effects on whole-organism function. We use this to perform zero-shot gene essentiality prediction across a diverse range of bacteria and phage.

Generation from molecular to genome scale

Evo can generate sequences that include:

• Functional protein:RNA complexes, as shown with EvoCas9-1 and Evo-designed guide RNAs
• Functional molecular systems, as shown with the ISEvo1 (IS200-like) and ISEvo2 (IS605-like) mobile genetic elements
• Megabase-scale sequences with plausible, coding sequence-rich genomic architecture