Arc Research

Learning the Language of Codon Translation with CodonFM
NVIDIA blog
Goodarzi Lab

Here, we introduce the EnCodon model series within CodonFM, a family of large foundation models trained on more than 130 million coding sequences spanning over 22,000 species, designed to learn the contextual grammar of codon usage directly from sequence.

Integrated epigenetic and genetic programming of primary human T cells
Nature Biotechnology
Gilbert Lab

Here, we develop an all-RNA platform for efficient, durable and multiplexed epigenetic programming in primary human T cells, stably turning endogenous genes off or on using CRISPRoff and CRISPRon epigenetic editors. We achieve epigenetic programming of targeted genomic elements without the need for sustained expression of CRISPR systems.

Megabase-scale human genome rearrangement with programmable bridge recombinases
Science
Hsu Lab

Bridge recombinases are naturally occurring RNA-guided DNA recombinases that we previously demonstrated can programmably insert, excise, and invert DNA in vitro and in Escherichia coli. In this study, we report the discovery and engineering of the bridge recombinase ortholog ISCro4 for universal rearrangements of the human genome.

Generative design of novel bacteriophages with genome language models
bioRxiv
Hie Lab

Genome language models have emerged as a promising strategy for designing biological systems, but their ability to generate functional sequences at the scale of whole genomes has remained untested. Here, we report the first generative design of viable bacteriophage genomes.

ENPP1 inhibitor with ultralong drug-target residence time as an innate immune checkpoint blockade cancer therapy
Cell Reports Medicine
Li Lab

Existing ENPP1 inhibitors have been optimized for prolonged systemic residence time rather than effective target inhibition within tumors. Here, we report the characterization of STF-1623, a highly potent ENPP1 inhibitor with an exceptionally long tumor residence time despite rapid systemic clearance, enabled by its high binding affinity and slow dissociation rate.

Predicting cellular responses to perturbation across diverse contexts with STATE
bioRxiv
Computational

Here, we introduce STATE, a transformer model that predicts perturbation effects while accounting for cellular heterogeneity within and across experiments. STATE predicts perturbation effects across sets of cells and is trained using gene expression data from over 100 million perturbed cells.

Cysteine allostery and autoinhibition govern human STING oligomer functionality
Nature Chemical Biology
Li Lab

The STING innate immune pathway can exacerbate inflammatory diseases when aberrantly activated, emphasizing an unmet need for STING antagonists. However, it remains unclear which mechanistic step(s) are crucial for inhibition of downstream signaling. Here we report that C91 palmitoylation is not universally necessary for human STING signaling.

Genome modeling and design across all domains of life with Evo 2
bioRxiv
Hsu Lab, Hie Lab

We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide resolution.

Sequence modeling and design from molecular to genome scale with Evo
Science
Hie Lab, Hsu Lab

The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism's function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision.

PELI2 is a negative regulator of STING signaling that is dynamically repressed during viral infection
Molecular Cell
Li Lab

The innate immune cGAS-STING pathway is activated by cytosolic double-stranded DNA (dsDNA), a ubiquitous danger signal, to produce interferon. However, STING activation must be tightly controlled because aberrant interferon production leads to debilitating interferonopathies. Here, we discover PELI2 as a crucial negative regulator of STING.