Borrowing from bacterial viruses: New tools for large-scale genome insertions

Electron micrograph of multiple bacteriophages attached to a bacterial cell wall.
Image: Graham Beards

Today, Arc Institute scientists report in Nature Biotechnology the discovery of new serine recombinases, a family of genome editing tools, for precisely inserting large DNA payloads into the human genome.

Genetic diseases can result from small “typos” or entire missing “paragraphs” in our 3 billion-letter genome. Editing the genome—either by precisely correcting the mutation or by supplying a replacement version of the missing or damaged gene—could rescue cellular function and alleviate disease.

The original version of CRISPR, known as CRISPR-Cas9, uses a kind of molecular scissors to cut the human genome at a desired location. This causes a double-stranded DNA break, which prompts the cell to stitch its DNA back together. If you supply an additional, corrective DNA template along with a CRISPR cut, the cell will sometimes patch up the DNA break by inserting the template. However, this gene editing approach is unreliable, often resulting in random insertions or deletions of several nucleotide letters at the cut site. Newer variations on CRISPR, such as base editing and prime editing, are more precise but are limited to making only small changes.

“We wanted a simple tool for inserting large DNA sequences into a human genome that wouldn't require double-stranded DNA breaks or rely on cellular DNA repair machinery,” says Arc Institute and UC Berkeley Bioengineering investigator Patrick Hsu. “We reasoned that nature may have already evolved solutions to these challenges through bacteriophages and other mobile genetic elements, perhaps the oldest systems for genetic diversification.”

Hunting for new genome editing systems in bacterial genomes

Working with collaborators from Stanford University (Ami Bhatt, Lacramioara Bintu, and Michael Bassik), Hsu’s team turned to microbial evolutionary biology to study large serine recombinases (LSRs). In the natural world, bacterial viruses known as bacteriophages carry LSR enzymes that integrate large stretches of bacteriophage DNA into new host bacterial genomes, thereby facilitating viral spread. “They had naturally evolved to tackle the exact problem we were hoping to solve,” says co-first author and Arc senior scientist Matthew Durrant. These LSR enzymes integrate DNA into a bacterial genome by matching pre-existing DNA sequence tags, known as attachment sites, on both the payload and the target genomic site. Through some DNA acrobatics, LSRs integrate DNA sequences without ever creating a double-stranded DNA break in the genome. “This avoids the imprecise smashing of the genetic keyboard at the cut site during DNA repair,” explains co-first author Josh Tycko of Stanford University. Because LSRs can insert many thousands of nucleotides, enough to supply an entire human gene or even a couple of genes, “we saw the potential of LSR tools to deliver a functional version of a mutated gene into the human genome to restore cellular function,” says Hsu.

Only a handful of LSRs have previously been characterized and developed into research tools, and they have such low efficiency that their utility is greatly limited. According to Tycko, “we imagined that if we could discover new LSRs at a large scale, we would find some that already have robust specificity and efficiency for human genome targeting, right out of the box.” Bacteria and bacteriophages have waged genetic warfare against each other for billions of years, and that long evolutionary battle has made them a treasure trove of enzymes that detect, modify, cut, and combine DNA and RNA, both offensively and defensively. The team devised a clever computational approach to search bacterial and bacteriophage genome sequences for hints of new LSR enzymes and to map their attachment sites, gathering the critical pieces of information necessary to coax the LSR to insert DNA into a desired site for genome editing. Overall, their large-scale discovery effort expanded the diversity of the LSR enzyme family by over 100-fold, resulting in a huge library of enzyme variants that offer a wide variety of options to edit different locations in the human genome.

Building the serine recombinase platform

But it wasn’t simple: many of the recombinases they discovered integrate payloads into a short, specific attachment site sequence that is not found in the human genome. This turned out to be an asset because it introduced editing flexibility. The researchers first installed the preferred attachment sequence as a so-called “landing pad” at the desired genomic location, then delivered the LSR to drag and drop their DNA payload into that site. With this approach, the team demonstrated dramatic improvement over previously developed recombinases, achieving the desired DNA insertion in as many as 7 out of every 10 cells.

Beyond therapeutic applications that seek to put the same DNA payload in as many cells as possible, researchers often want to study many genetic mutations of the same gene in parallel to understand their impact on the function of the gene. To do this accurately, DNA sequences need to be inserted into a pool of cells such that each cell receives one unique piece of DNA in the same place. This has been surprisingly challenging. Scientists currently use laborious virus-based workflows that make insertions randomly in different areas of the genome, making it hard to compare apples to apples. The team devised a way to turn this weeks-long saga into a single-day experiment by using LSRs to integrate a library of thousands of DNA sequences into a landing pad with high efficiency. “LSRs have the potential to overcome the great technical challenges involved in existing workflows, significantly accelerating functional genomics research,” explains co-first author and UC Berkeley graduate student Alison Fanton.

In addition to the landing pad approach, Hsu’s team found other ways to leverage the natural variety of recombinases, identifying LSRs that prefer to insert DNA into attachment site sequences that are already found in one or more locations across the human genome. The researchers showed that they could computationally predict and experimentally verify the genomic insertion sites for the first time. There’s no need to pre-install a landing pad for these LSRs, but it turns out that they often insert into multiple places in the genome. The best LSRs they identified in this category were approximately 10 times more efficient than the previous standard recombinase in the field that could target the human genome. Fanton is optimistic about the research applications of LSRs: “We found different LSRs that range from having virtually zero natural target sites to thousands of target sites in the human genome, providing a flexible suite of tools that can install a gene into a pre-installed landing pad but also integrate DNA payloads across numerous sites at once.”
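
To give a rough sense of what predicting insertion sites from an attachment sequence can involve, here is a toy Python sketch. It is not the team's method, which this article does not describe in detail. It simply scans a DNA string on both strands for stretches that match a given attachment site with at most a few mismatched letters. The function names, the 8-letter example site, and the mismatch threshold are all hypothetical choices made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(genome: str, att_site: str, max_mismatches: int = 1):
    """Scan both strands of `genome` for near-matches to `att_site`.

    Returns a list of (start_index, strand, mismatches) tuples.
    Illustrative assumption: a site is a "candidate" if it differs from
    the attachment sequence by at most `max_mismatches` letters.
    """
    genome = genome.upper()
    att_site = att_site.upper()
    k = len(att_site)
    queries = {"+": att_site, "-": reverse_complement(att_site)}
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        for strand, query in queries.items():
            mismatches = hamming(window, query)
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


if __name__ == "__main__":
    # Made-up sequences: one exact site, one on the opposite strand,
    # and one with a single mismatched letter.
    toy_genome = "CCCCGATTACAGCCCCCTGTAATCCCCCGATTTCAGCCCC"
    for hit in find_candidate_sites(toy_genome, "GATTACAG", max_mismatches=1):
        print(hit)
```

Real attachment sites are longer than this example, and a real prediction would need to account for which positions in the site the enzyme actually cares about, so exact letter counting is only a starting point.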

Promising outlook for genome editing therapeutics

The Arc team is now developing human genome-targeting LSRs that are highly specific to one site in the human genome and could be used to deliver therapeutic genetic payloads directly into a patient’s cells. Future work in developing and optimizing these LSR tools is aimed at creating a DNA integration machine that, paired with any DNA payload, can catalyze efficient integration into a single unique site within the human genome, making this “a much more scalable and ‘one size fits all’ approach for gene therapy,” Hsu says. Supplying an intact, functional version of a missing or mutated gene could work for many patients, regardless of whether each patient has a different mutation in the original sequence, in contrast to other gene editing approaches that need to be tailored to correct each specific patient mutation.

The potential applications go on: since each LSR prefers its specific attachment site sequence, multiple LSRs can be used simultaneously or sequentially in the same cells to integrate multiple cargoes into locations specified by each LSR’s unique “postal code,” in a process known as multiplexing. And LSRs have practical benefits: they are, on average, less than half the size of most CRISPR genome editing enzymes, making them much easier to deliver into cells and overcoming a major barrier to clinical utility. Tycko explains, “A single, compact LSR protein can integrate a large DNA sequence into the human genome, without any requirement for guide RNAs or cellular co-factors that may be variable across cell types,” illustrating the power and versatility of these enzymes as genome engineering tools. With this work, the genome editing toolbox continues to grow, bringing us closer to realizing the full potential of genome modification for human health.

In addition to co-first authors Durrant, Fanton, and Tycko and principal investigators Hsu, Bhatt, Bintu, and Bassik, the other authors of the paper are Michaela Hinks, Julia Schaepe, and Peter Du of Stanford University, Sita Chandrasekaran and Nicholas Perry of the Arc Institute and University of California, Berkeley, and Peter Lotfy of the Salk Institute. This research was supported by funding from the Rose Hill Innovators Program at UC Berkeley, the Stanford Maternal and Child Health Research Institute, the National Science Foundation, the National Institutes of Health, Stanford ChEM-H, the Emerson Foundation, the Sloan Foundation, the Rainwater Foundation, the Curci Foundation, and Arc Institute.