Evo: Creating Generative AI for Genomes
DNA contains the blueprint for life itself, working through RNA and proteins to orchestrate biological function. Yet decoding the complex interactions between these foundational molecules, and generating new, functional sequences, has remained out of reach—especially at genomic scales.
Now, researchers at the Arc Institute, Stanford University, and University of California, Berkeley have led the development of Evo, the first biological foundation model trained on DNA at scale. By learning the information encoded within DNA using a frontier deep learning architecture, Evo is capable of both prediction and design not just at the level of DNA but also across RNA and proteins. Its interpretive and generative capabilities span biological scales, from nucleotides to the whole genome—bringing entire life forms into focus.
Created by the labs of Brian Hie, Arc Innovation Investigator and Stanford Assistant Professor of Chemical Engineering, and Patrick Hsu, Arc Core Investigator and UC Berkeley Assistant Professor of Bioengineering, Evo was first introduced in a preprint earlier this year and has now been published in Science. Since the preprint was posted, the researchers have used Evo to design a functional CRISPR system unknown in nature, showing how this deeper understanding of biological sequences can yield new molecular tools.
"Evo deciphers the patterns written into DNA over billions of years of evolution, breaking new ground in our ability to understand and engineer biology," said Hsu. "Just as generative AI has revolutionized how we work with text, audio, and video, these same creative capabilities can now be applied to life's fundamental codes."
"What makes Evo exciting is that it's a true foundation model for biology," added Hie. "Being both multimodal and multiscale, it gives us a unified approach for harnessing the immense complexity of living systems."
Designing new gene editing tools
CRISPR systems are molecular machines composed of both proteins and RNA that work together to edit DNA. Traditionally, developing new CRISPR tools has meant searching through nature's existing systems. Building upon the preprint, the Science paper explains how Evo opens a new frontier by designing these complex molecular machines from scratch—creating both the protein and RNA components simultaneously in a way that ensures they work together.
The team put this capability to the test by prompting Evo to generate entirely new CRISPR systems. After testing just eleven designs, they arrived at EvoCas9-1, a fully functional system. This success is particularly notable because EvoCas9-1 differs substantially from known systems: it shares only about 73% of its sequence with the commonly used CRISPR-Cas9 yet achieves comparable activity. The result suggests there may be many more effective biological systems that AI could help discover.
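For readers unfamiliar with how a figure like "about 73%" is obtained, the sketch below shows one common way to compute percent identity between two sequences that have already been aligned. It is an illustration only: the article does not say which alignment tool or identity convention the researchers used, the function name `percent_identity` is hypothetical, and the two short sequences are made-up toy strings, not real Cas9 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes the inputs are already aligned to equal length, with '-' marking gaps.
    Columns where both sequences have a gap are ignored; a residue paired with a
    gap counts as a mismatch. Other conventions (for example, dividing by the
    length of the shorter sequence) exist and give slightly different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column of two gaps carries no information
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


# Toy example with invented sequences (not real proteins).
toy_a = "MKRNYILGLD"
toy_b = "MKKNYI-GLE"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

In practice the alignment itself would come from a dedicated alignment tool, and the identity value depends on that tool's settings, so a number computed this way is best read as approximate.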
"Creating functional CRISPR systems requires intricate coordination between proteins and RNA," explained Hsu. "Evo's ability to design both components in concert, and have them work effectively, demonstrates a new level of sophistication in biological engineering tools."
The team then pushed Evo further, asking it to design mobile genetic elements—DNA sequences that can move within genomes. They chose to work with a particularly challenging family of these elements, called IS200/IS605 transposons, that operate through an intricate "peel-and-paste" mechanism to both cut out and insert DNA sequences. These systems require multiple components working in harmony: proteins that must pair up properly, specific DNA structures that fold into hairpin shapes, and in some cases, guidance from RNA molecules. Despite this complexity, Evo designed a new set of transposons in this family, distinct from those known in nature, that could successfully cut and paste DNA.
Deep collaboration drives discovery
Evo was developed by an interdisciplinary team of over twenty scientists across computational and biological disciplines. A core machine learning subteam, spearheaded by Stanford Bioengineering doctoral student Eric Nguyen and Stanford Computer Science doctoral student Michael Poli, focused on architecture development, model training, and scaling infrastructure.
The computational biology subteam, led by Arc Institute senior scientist Matthew Durrant from the Hsu lab, focused on curating massive datasets of biological sequences and rigorously evaluating the model on downstream tasks. The experimental biology subteam, led by Stanford Bioengineering doctoral students Brian Kang and David Li from the Hie lab and Arc Institute senior scientist Dhruva Katrekar from the Hsu lab, performed intensive biological experiments to validate the highly complex designs generated by Evo.
“In the training and testing of Evo, we wanted to demonstrate capabilities that would spark the imagination of a diverse audience of machine learning researchers, computational biologists, and experimentalists,” said Hsu. “We think this is a foundation model for biological research because it enables such a general set of tasks across the central dogma of molecular biology.”
Team members also considered the prudent use of Evo and other biology foundation models. Stanford Professor Tina Hernandez-Boussard, postdoctoral scholar Madalena Ng, and doctoral student Ashley Lewis conducted an ethics and safety investigation, addressing potential risks and outlining precautionary measures for the responsible development and deployment of this new technology.
What’s next
Evo can generate DNA sequences of over 1 million bases, larger than the genomes of many simple life forms. The team now looks to scale Evo to more complex organisms and apply it to larger scales of biological organization.
"What we've shown with this first Evo model is just the beginning," said Hsu. "Our next goal is to move beyond single cell life to understand the multicellular organisms that evolution has created over billions of years. Long-term, we’re working toward a new field of ‘genome design’ where we can create entire cellular pathways and potentially entire organisms."
"As we scale Evo to more complex datasets and broader scales, we're working to make that complexity programmable, allowing researchers to leverage these learned rules for biological design in a way that’s never been accessible before," added Hie.