Paul Datlinger, Arc’s Associate Director of Genome Engineering, on inventing the technologies needed to train a virtual cell
One of Arc’s Institute Initiatives is to build a highly predictive single-cell foundation model that can aid drug discovery efforts or assist biologists in their research. Released in June 2025, our first virtual model of a human cell, called STATE, can predict how RNA expression will shift given a starting transcriptome and a perturbation. But building such a model requires huge amounts of experimental data, and much of that work falls to Paul Datlinger’s team, which is collecting single-cell perturbation data on hundreds of millions of cells.
Datlinger (X: PaulDatlinger), Arc’s Associate Director of Genome Engineering, is ideally suited to the task ahead. In 2017, he invented CROP-seq, a method that combines pooled CRISPR screens and single-cell RNA sequencing to understand how a cell’s gene expression profile shifts after perturbations. In 2021, Datlinger also invented “microfluidic droplet overloading,” a technology that enabled researchers to run far more cells per single-cell RNA sequencing experiment. He later spent time at Illumina and helped build Xaira Therapeutics before moving to Arc in 2024.
At each career step, Datlinger has worked backwards: he identifies biological questions that can't be answered with existing tools, then invents entirely new technologies to fill those gaps. "This was the idea behind CROP-seq," he says. "We had a dream data type in mind, then figured out how to get there."
For Datlinger, what made CROP-seq particularly rewarding was seeing the creative ways other scientists applied the method to their own research, and watching the technology unlock experiments that were previously impossible. Building tools that reveal new dimensions of biology is the thread connecting his entire professional arc.
As a PhD student at CeMM, the Research Center for Molecular Medicine of the Austrian Academy of Sciences, Datlinger was trying to answer a basic question about human cells: How are cells epigenetically regulated?
At the time, the methods available to figure this out were quite tedious. Researchers typically knocked out epigenetic modifiers one by one, cultured separate batches of cells, and then performed bulk RNA sequencing to see if gene expression patterns changed. To be rigorous, each modifier needed multiple knockout experiments, often with independent cell lines to verify results. This made the screens slow, labor-intensive, and costly—and because these measurements averaged thousands of cells together, subtle but important effects in specific cell populations were easily missed. As the cell culture dishes kept piling up, Datlinger thought there must be a more elegant way.
So Datlinger began developing CROP-seq, a technology that enables scientists to study gene function at the level of single cells. Instead of knocking out one gene at a time and measuring averaged responses across large populations, researchers could now use CRISPR to perturb hundreds or thousands of genes in parallel within a single pooled population of cells, with each cell typically carrying one perturbation. In each cell, Datlinger and colleagues could read out precisely which gene had been disrupted, along with that cell’s full RNA expression profile.
Later, during his postdoc, Datlinger pushed single-cell RNA sequencing even further. Existing single-cell methods could profile only a limited number of cells per experiment, too few for large-scale screens, so Datlinger developed scifi-RNA-seq, which combines multiplexed barcoding with droplet overloading to let researchers analyze millions of individual cells per experiment.
Around 2021, it was becoming increasingly clear that the foundational single-cell methods, such as the ones that Datlinger had been building, could be used to collect large datasets for training virtual cell models. For the first time, researchers could perform genome-wide perturbation screens, knocking out different genes across a population of cells and measuring the effects in individual cells.
Moving to the US to pursue virtual cell modeling, Datlinger first joined the genomics company Illumina, where he led a team and immersed himself in an AI frontier lab environment. Several months later, his group spun out into a new company called Xaira Therapeutics, where his team contributed functional genomics datasets crucial for training AI models to identify drug targets. The experience gave Datlinger deep insight into the scale and complexity of building a true virtual cell, and into the leaps in single-cell profiling it would require.
In October 2024, Datlinger joined Arc full-time. His group is collecting the requisite data for the Institute’s Virtual Cell Initiative and also inventing new multi-omics technologies to train future models. For example, his team is developing technologies to measure proteins and RNA simultaneously in single cells at massive scales.
In the future, Datlinger would also like to profile cells continuously, so that the same cell could be studied before, during, and after a perturbation without being destroyed. Such measurements would be ideal for a “lab-in-the-loop” approach, in which computational predictions are tested in real-time experiments and the results are used to fine-tune virtual cell models. One area where his team plans to apply this approach is cancer immunotherapy, for example to enhance the performance of CAR T therapies.
“I think biological AI will be integrated into every stage of research and drug development. You use it to understand your data set, to find genes that cause diseases and to suggest the next experiment you should do,” he says.
Virtual cell modeling is still in its early stages, but Datlinger believes it’s probably the most important thing he could be working on. “My intuition is that biology is structured and systematic enough that it should eventually work,” he says.
“There’s a fundamental disconnect between the biology we can read and that we can perturb,” he explains. “Most cell states are actually completely inaccessible for functional genomics screens. You don’t have the cell number, can’t culture them in their natural state, or you might not be able to engineer the cells that you’re interested in.”
But you can almost always obtain detailed molecular profiles, such as RNA-seq and ATAC-seq data. Ideally, Datlinger says, virtual cell models trained on cells we can screen should be able to transfer what they’ve learned to other contexts, enabling virtual experiments in cells we’ll never be able to test directly. Still, computational predictions will not replace laboratory experiments entirely. Biology will always rely on experiments and real-world validation, the most extreme case being clinical trials, which would be hard to replace with AI predictions. Instead, virtual cells are being designed to guide scientists, helping them test ideas and refine their thinking more quickly.
“You can’t replace experiments completely, even though that would be the dream,” Datlinger says. “But a virtual cell could help you perform drug screens, toxicity studies or design cell therapies directly on the computer and find the right patient population for the most promising clinical trial.”
