Arc Research

Stack: In-context learning of single-cell biology
bioRxiv
Computational

Single-cell transcriptomics offers the promise of measuring the diversity of cellular phenotypes across species and diseases. Here, we present Stack, a foundation model trained on 149 million uniformly preprocessed human single cells that leverages tabular attention to generate representations for each cell informed by the cells in its context.

Semantic design of functional de novo genes from a genomic language model
Nature
Hie Lab

Generative genomic models can design increasingly complex biological systems. However, controlling these models to generate novel sequences with desired functions remains challenging. Here, we show that Evo can leverage genomic context to perform function-guided design that accesses novel regions of sequence space.

Site-specific DNA insertion into the human genome with engineered recombinases
Nature Biotechnology
Hsu Lab

Large serine recombinases can mediate direct, site-specific genomic integration of multi-kilobase DNA sequences without a pre-installed landing pad, albeit with low insertion rates and high off-target activity. Here we present an engineering roadmap for jointly optimizing their DNA recombination efficiency and specificity.

Genome-scale CRISPR screens identify PTGES3 as a direct modulator of androgen receptor function in advanced prostate cancer
Nature Genetics
Gilbert Lab

The androgen receptor (AR) is a critical driver of prostate cancer. Here, to study regulators of AR protein levels and oncogenic activity, we developed a live-cell quantitative endogenous AR fluorescent reporter. Leveraging this AR reporter, we performed genome-scale CRISPRi screens to systematically identify genes that modulate AR protein levels.

scBaseCount: an AI agent-curated, uniformly processed, and autonomously updated single cell data repository
bioRxiv
Computational

Single-cell RNA sequencing has transformed cell biology by enabling precise transcriptomic measurements of individual cells. Here, we introduce scBaseCount, a single-cell RNA sequencing database that leverages an AI agent to automate discovery and metadata extraction, and standardize data processing.

Learning the Language of Codon Translation with CodonFM
NVIDIA blog
Goodarzi Lab

Here, we introduce the EnCodon model series within CodonFM, a family of large foundation models trained on more than 130 million coding sequences spanning over 22,000 species, designed to learn the contextual grammar of codon usage directly from sequence.

Integrated epigenetic and genetic programming of primary human T cells
Nature Biotechnology
Gilbert Lab

Here, we develop an all-RNA platform for efficient, durable and multiplexed epigenetic programming in primary human T cells, stably turning endogenous genes off or on using CRISPRoff and CRISPRon epigenetic editors. We achieve epigenetic programming of targeted genomic elements without the need for sustained expression of CRISPR systems.

Megabase-scale human genome rearrangement with programmable bridge recombinases
Science
Hsu Lab

Bridge recombinases are naturally occurring RNA-guided DNA recombinases that we previously demonstrated can programmably insert, excise, and invert DNA in vitro and in Escherichia coli. In this study, we report the discovery and engineering of the bridge recombinase ortholog ISCro4 for universal rearrangements of the human genome.

Efficient generation of epitope-targeted de novo antibodies with Germinal
bioRxiv
Hie Lab

Obtaining novel antibodies against specific protein targets is an important yet experimentally laborious process. Here, we introduce Germinal, a broadly enabling generative framework that designs antibodies against specific epitopes with nanomolar binding affinities while requiring only low-throughput experimental testing.

Generative design of novel bacteriophages with genome language models
bioRxiv
Hie Lab

Genome language models have emerged as a promising strategy for designing biological systems, but their ability to generate functional sequences at the scale of whole genomes has remained untested. Here, we report the first generative design of viable bacteriophage genomes.