Evo 2: One Year Later

Now published in the journal Nature, the DNA foundation model is showcasing the power of open and collaborative science

When we first released Evo 2 as a preprint in February 2025 (see here for a recap), it represented the largest fully open biological AI model to date. With its publication in Nature today, we'd like to highlight some updates to Evo 2 since the preprint and share details about how others are applying it to their work.

For those who are new to Evo 2, this frontier DNA language model was developed as a broad team effort. Arc and NVIDIA scientists convened collaborators across Stanford University, UCSF, and UC Berkeley as well as Goodfire and the University of Washington. Trained on genome sequences across all domains of life, from bacteria and phage genomes to plants and animals, including humans, the model is incredibly flexible. Not only can Evo 2 be used to make predictions, such as identifying disease-causing mutations, but it also features strong generative capabilities.

One of our notable successes has been our functional bacteriophages, which represent the first AI-designed and experimentally validated organisms. Designing custom phage holds the potential to treat antibiotic-resistant bacteria, which rapidly evolve resistance to traditional antibiotic treatments and are a growing problem. Using supervised fine-tuning on thousands of bacteriophage genomes similar to ΦX174, we demonstrated that Evo 2 outperformed the previous Evo 1 model, likely due to the improved StripedHyena 2 architecture. Sixteen of 285 tested designs successfully propagated and inhibited growth of the appropriate bacterial strains, with no impact on unrelated strains. This host tropism is a critical feature if we hope to use future designs for phage therapy. While this was a relatively small genome to design, with just 11 genes, it is an exciting proof of concept that paves the way for larger and more complex designs.

The preprint has been cited over 200 times, including in reviews and commentaries; researchers have benchmarked Evo 2 against existing models for specific tasks, developed new models to see if they outperform it, and explored Evo Designer. The model has been downloaded over 88,000 times on GitHub, with 380 forks, and on Hugging Face, Evo 2 7B and 40B have received over two and six million API requests, respectively, with over 100,000 downloads across models since release. The training dataset is also in high demand, with OpenGenome2 downloaded more than 48,000 times.

Evo 2's many strengths, including learned features like exon-intron boundaries, transcription factor binding sites, and protein structural elements, have led to a diverse range of applications of the model, some of which have revealed new ways to think about the model.

  1. Evo 2 is useful for genetic disease risk prediction in humans.

    Zhu et al. combined two areas of interest for Arc: AI and Alzheimer's disease (AD). The team applied Evo 2 to cohorts of AD patients and found that the model's scores for variants of the APOE locus reflect each variant's potential contribution to disease risk. This highlights how Evo 2 can synergize with traditional population genetics datasets (such as AD GWAS data, Human Pangenome assemblies, the deeply phenotyped ADNI cohort, and the population-scale UK Biobank) to connect variants to disease risk, enabling population geneticists to go from a comprehensive catalog of variants to inferring functional impact directly from DNA sequences.

  2. Evo 2 can help with the study of farm animal genetics.

    Jiang et al. tested Evo 2's performance across a range of domesticated animal species, finding it robustly categorized variants across species (AUROC 0.921) when blinded to the variant type. Even when variant types were matched before classification, the model was still highly effective (AUROC 0.844). Potential applications of Evo 2 in farm animal genetics include fine-mapping complex traits through transfer learning from databases of human functional variants, mutation load assessment for breeding decisions to reduce the risk of deleterious allele transmission, and integration with traditional quantitative genetics to estimate the value of complex interactions between variants rather than focusing on single variants.

  3. Evo 2 can be adapted for 3D genome analysis.

    While Evo 2 is a powerful model, it requires significant computational resources to run. Fang et al. made an impressive and creative adaptation of the model for Hi-C data, which they called Evo2HiC. Hi-C is an approach to study the 3D organization of genomes. This organization changes as cells differentiate or respond to stimuli with implications for critical functions such as DNA replication and gene expression. One issue with current approaches for predicting 3D genome organization is that these models are trained solely using sequences as inputs and need to be retrained when analyzing a new cell type. This makes it challenging for these models to characterize cell type-specific chromatin interaction patterns. One potential solution is to jointly train a model using genome sequences and Hi-C data. Evo2HiC addresses this issue while achieving superior performance over current models in predicting Hi-C contact matrices and state-of-the-art performance in predicting other epigenomic features. This study is also noteworthy because the authors were able to distill Evo 2 into a compact encoder, while guiding the distillation with Hi-C data to preserve genomic features critical for 3D genome analysis. This distillation also reduces the time and memory requirements for downstream applications by a factor of 500 compared to Evo 2.

  4. Evo 2 is capable of in-context learning like LLMs.

    In-context learning (ICL) is an AI model's capability to make predictions based purely on examples contained within the prompt. The ability of LLMs to do this was assumed to be related to properties of human language. Breslow et al. were curious whether ICL might similarly emerge in genome language models, and Evo 2 presented a unique test case given that its scale is comparable to many LLMs. Strikingly, Evo 2 is capable of ICL and even outperforms similarly scaled Qwen3 models. Notably, the StripedHyena 2 architecture used by Evo 2 is a hybrid of convolutional and attention layers rather than a pure Transformer, suggesting that with sufficient scale and structured data, ICL extends to biological contexts and is not unique to pure-attention models.
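The variant-effect applications above (items 1 and 2) rest on the same basic mechanism: compare the model's likelihood of the reference sequence against the sequence carrying the alternate allele, then check how well those scores separate benign from deleterious variants, for example via AUROC. Here is a minimal self-contained sketch of that idea, with a toy per-base model standing in for Evo 2; all names are illustrative, not the authors' actual code.

```python
import math

# Toy stand-in for a genomic language model: fixed per-base log-probabilities.
# A real model like Evo 2 conditions on long-range sequence context instead.
BASE_LOGP = {"A": math.log(0.3), "C": math.log(0.2),
             "G": math.log(0.2), "T": math.log(0.3)}

def log_likelihood(seq):
    """Proxy for the model's log-likelihood of a DNA sequence."""
    return sum(BASE_LOGP[base] for base in seq)

def variant_score(ref_seq, pos, alt_base):
    """Delta log-likelihood of the alternate vs. reference sequence.

    More negative scores mean the variant looks less 'natural' to the
    model, a common zero-shot proxy for deleteriousness.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return log_likelihood(alt_seq) - log_likelihood(ref_seq)

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive outranks a randomly chosen negative.
    (When ranking variants by delta log-likelihood, negate the scores
    so that more-deleterious variants rank higher.)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice the per-base table would be replaced by a forward pass of the language model over a long window centered on the variant; the delta-likelihood and AUROC logic stay the same.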

We have also continued to iterate on Evo 2 since its initial release, particularly benchmarking and validating the model's performance. One of our first priorities has been to experimentally validate the generative epigenomic designs. These designs were generated by combining Evo 2 with predictive models like Enformer and Borzoi, enabling controllable generation of chromatin accessibility patterns. In our initial proof of concept, we reported designed sequences whose chromatin accessibility patterns spell out "ARC" and "EVO2" in Morse code.
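This guidance scheme can be sketched, in its simplest form, as a best-of-N loop: sample candidate sequences from a generative model and keep whichever one a predictive model scores as closest to the target accessibility pattern. The function names and squared-error scoring below are illustrative assumptions, not the actual Evo 2 + Enformer/Borzoi pipeline.

```python
def guided_generation(generate, predict, target, n_candidates=100):
    """Best-of-N guided design: draw candidates from a generative model
    and keep the one whose predicted profile best matches the target.

    `generate` and `predict` are hypothetical stand-ins for a generative
    model (e.g. Evo 2) and a predictor (e.g. Enformer or Borzoi).
    """
    best_seq, best_error = None, float("inf")
    for _ in range(n_candidates):
        seq = generate()
        # Squared error between the predicted profile and the target.
        error = sum((p - t) ** 2 for p, t in zip(predict(seq), target))
        if error < best_error:
            best_seq, best_error = seq, error
    return best_seq
```

Real guided generation can also steer sampling token by token rather than filtering whole sequences after the fact; best-of-N is simply the easiest variant to reason about.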

Reassuringly, when the designs were synthesized by our collaborators at the University of Washington and inserted into mouse embryonic stem cells, the experimentally measured chromatin accessibility patterns showed remarkably high concordance with the predicted patterns (AUROCs of 0.92-0.95). In our Arc labs, we also designed elements with cell-type-specific accessibility patterns between two different human cell lines, with 4 of 24 designs producing more than a 2-fold difference in accessibility. This level of control would be important for designing cell-type-specific gene expression programs. Beyond proof of concept that we can achieve controllable design of mammalian chromatin architecture, this also sets the stage for using other application-specific models to guide Evo 2 and enable other types of biological design, so long as a capable predictive model exists.

This also adds to the growing number of validated Evo model designs. First, we had functional CRISPR systems and mobile genetic elements using Evo 1. Then functional anti-CRISPRs and toxin:antitoxin pairs were designed using semantic design with Evo 1.5. Given these successes, one of the next steps will be to deliver Evo-generated sequences into specific genomic loci using the ever-expanding genome engineering toolkit, such as our programmable bridge recombinases, which we demonstrated are capable of megabase-scale rearrangements in human cells. We are also excited about AI lab-in-the-loop experiments building on our recent MULTI-evolve paper, expanding beyond proteins to the scale of entire genomic regions or genomes.

Even as we push forward new discoveries, human safety and ethical use have been at the forefront of Arc's approach to this exciting new technology. Bacteriophages infect only bacteria, and we excluded eukaryotic viruses from Evo 2 training for safety reasons. We verified that these exclusions weaken language modeling and downstream prediction on human viruses, and our red-teaming evaluations show generated sequences are "effectively random" for pathogenic viral proteins, meaning that any resulting "designer phage" would be highly unlikely to cause damage to human cells. However, further work on model alignment will be needed for ongoing biosafety as future versions of biological foundation models become more capable, and Arc and our collaborators are proud to be at the forefront of those efforts.

As these models continue to develop, we expect stronger predictions of disease variants and improved capabilities for generative genomics. We want to be able to design entire collections of genes or proteins without telling the model the explicit rules for each design since the model should have already learned these rules. With new design and engineering capabilities combined with multi-omics tools, we are entering a phase where we can experimentally read biological states and use AI and machine learning to think about and write new biological designs in an iterative read-write-think loop.




Brixi, G., Durrant, M.G., Ku, J., Naghipourfar, M., Poli, M., Brockman, G., Chang, D., Fanton, A., Gonzalez, G.A., King, S.H., Li, D.B., Merchant, A.T., Nguyen, E., Ricci-Tam, C., Romero, D.W., Schmok, J.C., Sun, G., Taghibakhshi, A., Vorontsov, A., Yang, B., Deng, M., Gorton, L., Nguyen, N., Wang, N.K., Pearce, M.T., Simon, E., Adams, E., Amador, Z.J., Ashley, E.A., Baccus, S.A., Dai, H., Dillmann, S., Ermon, S., Guo, D., Herschl, M.H., Ilango, R., Janik, K., Lu, A.X., Mehta, R., Mofrad, M.R.K., Ng, M.Y., Pannu, J., Ré, C., St. John, J., Sullivan, J., Tey, J., Viggiano, B., Zhu, K., Zynda, G., Balsam, D., Collison, P., Costa, A.B., Hernandez-Boussard, T., Ho, E., Liu, M.-Y., McGrath, T., Powell, K., Pinglay, S., Burke, D.P., Goodarzi, H., Hsu, P.D., & Hie, B.L. (2026). Genome modeling and design across all domains of life with Evo 2. Nature. https://doi.org/10.1038/s41586-026-10176-5




Brian Hie (X: @BrianHie) is an Assistant Professor of Chemical Engineering at Stanford University, the Dieter Schwarz Foundation Stanford Data Science Faculty Fellow, and an Arc Institute Innovation Investigator in Residence.

Patrick Hsu (X: @pdhsu) is Co-Founder and a Core Investigator of Arc Institute and Assistant Professor of Bioengineering and Deb Faculty Fellow at the University of California, Berkeley.



More AIxBio from Arc Institute

Evo 2 is one piece of a broader effort at Arc to build the full stack of interconnected AI and biology. On the AI side, these resources range from the Evo series of models to learn the language of DNA, to virtual cell models that predict how cells respond to perturbations, to agentic mining of public data, to tools for training models and applying their predictions and designs.

Learn more about:

Arc's Virtual Cell Initiative — Our Institute-wide effort taking a full-stack approach to generate training data and build virtual cell models.

State — Arc's first virtual cell model, trained on large perturbational datasets to predict how genetic, chemical, and environmental changes shift gene expression across cell types.

Stack — A single-cell foundation model that uses in-context learning to predict cellular responses to perturbations never directly measured.

scBaseCount — AI agents that find, clean, and uniformly process single-cell data for model training, part of Arc's Virtual Cell Atlas.

The Virtual Cell Challenge — An annual competition launched in 2025 to evaluate and improve virtual cell models across the field.

MULTI-evolve — An AI-guided protein engineering framework from the Hsu and Konermann labs that accelerates iterative design-test-build cycles.

CodonFM — A family of open-source AI models developed with NVIDIA that reveal the grammar underlying codon choice.

ProPer-seq — A cost-efficient method for linking perturbations to transcriptional phenotypes.

Subscribe to the Arc newsletter to stay up to date on our AIxBio work.