Arc Research

Rewriting endogenous human transcripts with dual CRISPR-guided 3′ trans-splicing
Cell Systems
Hsu Lab, Konermann Lab, Publication

Here, we report the development of RNA-guided trans-splicing with Cas editor (RESPLICE). RESPLICE uses two orthogonal RNA-targeting CRISPR effectors to co-localize a trans-splicing pre-mRNA and to inhibit the cis-splicing reaction, respectively.

Systematic annotation of orphan RNAs reveals blood-accessible molecular barcodes of cancer identity and cancer-emergent oncogenic drivers
Cell Reports Medicine
Goodarzi Lab, Publication

From extrachromosomal DNA to neo-peptides, reprogramming of cancer genomes leads to the emergence of cancer state-specific molecules. Here, we systematically identify and characterize a large repertoire of orphan non-coding RNAs (oncRNAs), a class of cancer-emergent small RNAs, across 32 tumor types.

cyto: ultra high-throughput processing of 10x-flex single cell sequencing
bioRxiv
preprint, Computational

Single-cell genomics is rapidly scaling toward billion-cell atlases, but computational analysis has become a critical bottleneck. Here we present cyto, an ultra high-throughput processor for 10x Genomics Flex single-cell sequencing optimized for production-scale analysis.

Stack: In-context learning of single-cell biology
bioRxiv
preprint, Computational

Single-cell transcriptomics offers the promise of measuring the diversity of cellular phenotypes across species and diseases. Here, we present Stack, a foundation model trained on 149 million uniformly preprocessed human single cells that leverages tabular attention to generate representations for each cell informed by the cells in its context.

Semantic design of functional de novo genes from a genomic language model
Nature
Hie Lab, Publication

Generative genomic models can design increasingly complex biological systems. However, controlling these models to generate novel sequences with desired functions remains challenging. Here, we show that Evo can leverage genomic context to perform function-guided design that accesses novel regions of sequence space.

Site-specific DNA insertion into the human genome with engineered recombinases
Nature Biotechnology
Hsu Lab, Publication

Large serine recombinases can mediate direct, site-specific genomic integration of multi-kilobase DNA sequences without a pre-installed landing pad, albeit with low insertion rates and high off-target activity. Here we present an engineering roadmap for jointly optimizing their DNA recombination efficiency and specificity.

Genome-scale CRISPR screens identify PTGES3 as a direct modulator of androgen receptor function in advanced prostate cancer
Nature Genetics
Gilbert Lab, Publication

The androgen receptor is a critical driver of prostate cancer. Here, to study regulators of AR protein levels and oncogenic activity, we developed a live-cell quantitative endogenous AR fluorescent reporter. Leveraging this AR reporter, we performed genome-scale CRISPRi screens to systematically identify genes that modulate AR protein levels.

scBaseCount: an AI agent-curated, uniformly processed, and autonomously updated single cell data repository
bioRxiv
preprint, Computational

Single-cell RNA sequencing has transformed cell biology by enabling precise transcriptomic measurements of individual cells. Here, we introduce scBaseCount, a single-cell RNA sequencing database that leverages an AI agent to automate discovery and metadata extraction, and standardize data processing.

Learning the Language of Codon Translation with CodonFM
NVIDIA blog
Goodarzi Lab, preprint

Here, we introduce the EnCodon model series within CodonFM, a family of large foundation models trained on more than 130 million coding sequences spanning over 22,000 species, designed to learn the contextual grammar of codon usage directly from sequence.

Integrated epigenetic and genetic programming of primary human T cells
Nature Biotechnology
Gilbert Lab, Publication

Here, we develop an all-RNA platform for efficient, durable and multiplexed epigenetic programming in primary human T cells, stably turning endogenous genes off or on using CRISPRoff and CRISPRon epigenetic editors. We achieve epigenetic programming of targeted genomic elements without the need for sustained expression of CRISPR systems.