Meet Arc Innovation Investigator in Residence, Brian Hie, who is building deep learning models to design biology across scales

Brian Hie, Arc Institute Innovation Investigator in Residence

As a pioneer at the interface of machine learning and biology, Brian Hie (X: @brianhie) is known especially for developing deep learning models that can interpret the language of entire genomes—not just individual genes or proteins.

Last year, Hie's group, in collaboration with many others, introduced Evo, a deep learning model trained on prokaryotic genome sequences. Despite being trained on a single type of data, Evo demonstrated generalization across vastly different tasks, from predicting the effects of genetic mutations to designing synthetic CRISPR systems. More recently, Evo 2, an expansion of the original model trained on genomes from across the entire tree of life, opened the door to applications like minimal genome design, chromatin accessibility engineering, and beyond.

At Arc Institute, where he is an Innovation Investigator in Residence, and Stanford, where he is an Assistant Professor of Chemical Engineering and a Dieter Schwarz Foundation Stanford Data Science Faculty Fellow, Hie's lab aims to understand—and eventually design—more complex biological systems, from single molecules up to entire organisms. Below, Hie discusses his career path from graduate school to projects like Evo.

***

When did you first become interested in doing research at the intersection of machine learning and biology?

When I started graduate school, I wasn't fully committed to biology at all. My background was computational, and I've always enjoyed tackling difficult technical problems—especially the quantitative ones. But what really gives me intrinsic motivation is applying computation to problems that feel practical to me.

When I first joined MIT as a student, I was deciding whether to focus on biology or natural language processing. At that point, biology seemed cool—but so did robotics and language. All of these seemed like exciting applications of computer science. Biology, though, felt particularly appealing because it was such a different domain; it seemed constantly intellectually challenging.

At that time, though, many biologists were skeptical of computational approaches, or at least skeptical of anything beyond simple linear models.

You were still in graduate school when AlphaFold came out in 2020. What was that moment like for you?

I found AlphaFold extremely inspirational for several reasons. At that time, many people in academia—including me—did some soul-searching. We wondered if there was something unique about the DeepMind environment: maybe their team structure, the fact that it was industry-based, or just the enormous compute resources they had. That definitely crossed my mind a lot. Back then, I was just a grad student without much power or resources—it's not like I could spin up a team of professional machine learning engineers or access a huge computing cluster to run my own experiments. So I quickly realized it probably wasn't the right move for me to directly compete in protein structure prediction.

Instead, I decided to focus on a different kind of project. The core idea was to draw inspiration from natural language processing and apply it to viral evolution. Specifically, I thought if you trained a language model on viral sequences from nature, it might learn evolutionary patterns and constraints better than existing models trained on site-independent multiple sequence alignments.

We applied this to the specific problem of viral escape prediction. For a virus to escape immunity, it needs to achieve two things at once: it has to maintain fitness—it still needs to function properly—but it also has to change enough from the original strain to evade detection by the immune system. The key insight from this work was that certain properties of language models might capture both constraints simultaneously.
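Conceptually, an escape score of this kind combines two quantities read off a single language model: "grammaticality" (the model's probability of the mutant residue, a proxy for fitness) and "semantic change" (the embedding distance between mutant and wild type, a proxy for immune novelty). Here is a minimal, illustrative sketch of a rank-sum acquisition over those two signals; `toy_lm_probs` and `toy_embed` are deterministic stand-ins for a trained model, not the published one:

```python
import zlib

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def _seeded(seq):
    # Deterministic per-sequence RNG so the toy stubs are reproducible.
    return np.random.default_rng(zlib.crc32(seq.encode()))

def toy_lm_probs(seq):
    # Stand-in for a trained language model: per-position probability
    # distributions over amino acids (a real model conditions on context).
    logits = _seeded(seq).normal(size=(len(seq), len(AMINO_ACIDS)))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def toy_embed(seq):
    # Stand-in for the model's hidden-state embedding of a sequence.
    return _seeded(seq).normal(size=16)

def rank(x):
    # Rank of each element (0 = smallest).
    order = np.argsort(x)
    r = np.empty(len(x), dtype=int)
    r[order] = np.arange(len(x))
    return r

def escape_candidates(wt):
    probs, z_wt = toy_lm_probs(wt), toy_embed(wt)
    names, gram, sem = [], [], []
    for pos, wt_aa in enumerate(wt):
        for aa in AMINO_ACIDS:
            if aa == wt_aa:
                continue
            mut = wt[:pos] + aa + wt[pos + 1:]
            names.append(f"{wt_aa}{pos + 1}{aa}")
            gram.append(probs[pos, AMINO_ACIDS.index(aa)])   # fitness proxy
            sem.append(np.abs(toy_embed(mut) - z_wt).sum())  # novelty proxy
    # Rank-sum acquisition: favor mutants that score high on BOTH axes.
    score = rank(np.array(gram)) + rank(np.array(sem))
    return [names[i] for i in np.argsort(-score)]
```

With a real protein language model plugged in for the two stubs, the top-ranked single mutants would be those predicted to both retain fitness and shift enough to evade immunity.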

Importantly, though, this wasn't a structural or molecular-level project. It was purely a sequence-level approach trained on protein sequences from viruses. In fact, most of my work—even now—is primarily at the sequence level rather than at the atomic or structural level.

After your PhD, you moved to Meta AI for a while to work on the ESM Metagenomic Atlas. What was that like?

I had done a lot of computational biology work in graduate school, but I wanted to get more serious about hands-on biology experiments. I ended up going to Stanford, working with Peter Kim in a role somewhere between a traditional postdoc and an independent fellowship. But basically, I was retraining as a protein biochemist. I was in the lab doing experiments and very much integrated into Peter's group. In my mind, that felt like my biology postdoc.

At the same time, I also wanted to become more serious about machine learning. So, while doing that experimental work, I was also a visiting researcher at Meta AI on the ESM team, which trained large protein language models. That's where I really got a chance to see engineering applied to machine learning at a huge scale, and where I saw firsthand the value of models trained with very simple, self-supervised objectives, but on massive datasets. To me, this second experience also felt like a sort of "postdoc," but focused specifically on machine learning.

Why did you decide to set up your lab at Arc Institute and Stanford?

There are really three things you need in order to do top-tier work in machine learning and biology: compute, access to an experimental laboratory, and excellent people. Arc is unique because it's primarily a biological research institute, but it has also built out compute resources. The students at Stanford also provide really strong, unique computational and experimental capabilities. This combination makes tackling design problems more feasible and exciting.

How did the Evo project get started?

When I was thinking about potential projects for my own lab, one idea came from a simple thought: evolution is a unifying theory of biology, so why stop machine learning models at individual proteins or single genes?

The idea was that if you trained a model purely on raw genomic sequence—just DNA—and took the scaling hypothesis seriously, you might get a model that understands proteins essentially for free while also capturing higher-level, organism-wide biology.

Also, I wanted the model to be as general as possible: both predictive and capable of designing new types of biology. One of the first evaluations we ran on Evo 1 was variant effect prediction—basically, how well does the model predict the effects of mutations on protein fitness?

But because we also wanted to use the model for design, we built Evo on an autoregressive architecture instead of a masked language model, since that makes it better suited for generative tasks—even though it comes at the cost of some other abilities, like learning structural information. We know masked language models are typically better at picking up long-range contacts in proteins, whereas autoregressive models are better suited—or at least easier to work with—for design.
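To make the autoregressive scoring idea concrete: a variant's effect can be scored as the change in the model's log-likelihood between the mutated sequence and the wild type. Below is a minimal sketch using a tiny bigram model as an illustrative stand-in for a genomic language model (the real Evo is, of course, a large neural network over nucleotides, not a bigram model):

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"

class BigramLM:
    """Toy autoregressive 'genomic language model': p(x_t | x_{t-1})
    estimated from training sequences with add-one smoothing."""

    def __init__(self, seqs):
        self.counts = defaultdict(Counter)
        for s in seqs:
            for prev, cur in zip("^" + s, s):  # '^' marks sequence start
                self.counts[prev][cur] += 1

    def log_likelihood(self, seq):
        # Sum of log p(x_t | x_{t-1}) over the sequence.
        ll = 0.0
        for prev, cur in zip("^" + seq, seq):
            c = self.counts[prev]
            ll += math.log((c[cur] + 1) / (sum(c.values()) + len(ALPHABET)))
        return ll

def variant_score(model, wt, pos, alt):
    """Delta log-likelihood of a single-nucleotide variant vs. wild type.
    More negative means the model finds the mutation more disruptive."""
    mut = wt[:pos] + alt + wt[pos + 1:]
    return model.log_likelihood(mut) - model.log_likelihood(wt)

# Fit the toy model on a repetitive sequence, then score variants of it.
model = BigramLM(["ATGATGATGATG"])
wt = "ATGATG"
```

A mutation that breaks the learned pattern, e.g. `variant_score(model, wt, 1, "C")`, comes out negative, while re-inserting the wild-type base scores exactly zero; the same delta-log-likelihood recipe is how autoregressive sequence models are commonly used for variant effect prediction.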

What's next for your group?

There are probably two main directions for Evo. One is just making the model better at representation learning and prediction—probably by exploring bidirectional architectures, for example. The other direction is toward design tasks beyond the molecular level. For instance, we've recently made progress on designing and controlling epigenomic states.

For bigger design tasks, though, Evo alone probably isn't sufficient. We'll need some creative new ideas—things like using machine learning for circuit design, minimal pathway engineering, or even multicellularity. How do you control cell communication or cell contact using machine learning? Or how can you design circuits involved in morphological patterning, controlling where certain biological events happen in an organism?

These are challenging but exciting questions. Some of them are more achievable right now than others. But I definitely think there's no reason these models need to stop at single molecules—so much interesting biology happens at the tissue or organismal level, and that's a big motivation behind our future modeling work.

We're not just going to train DNA language models for the next decade. The real question now is: what's the next model we need to build to achieve these more ambitious design goals?

How are you planning to improve the model going forward? Is it about algorithmic changes, or just feeding it more data?

Now that we have clear benchmarks, we know better what people really care about. For example, human variant-effect prediction matters a lot.

Right now, Evo's pretraining data intentionally downweights human sequences and overrepresents organisms like plants or insects. If we changed the data composition to sample more heavily from human evolution, we'd probably get quick improvements on human-focused tasks. The model is already state-of-the-art on many non-human benchmarks, so rebalancing our data toward human sequences would likely boost performance specifically in variant prediction for humans.

Another thing is architectural tweaks: models with bidirectional context—like masked language models—tend to be better at variant prediction. So we could optimize Evo by making small algorithmic adjustments like that. Those kinds of targeted improvements are the most immediate steps, and they'll probably happen under the Evo umbrella.

But again, Evo by itself isn't going to solve all of biology. A helpful way to think about where the lab is headed next is to ask ourselves: What can't we do with Evo—even if it became ten times better?

With a 10x improvement, maybe we could tackle minimal genome design in simpler contexts, like prokaryotes, or specific operons and pathways. But if you want to design highly synthetic circuits or biological logic that's completely novel, Evo would likely just be one piece of the puzzle—not the whole solution.

Another example is natural language interaction. People increasingly care about having models they can talk to directly in plain English, especially now with chatbots being so common. Evo can't do that right now; it would require a fundamentally different kind of model. But that's exactly the kind of limitation we should be thinking about as we develop the next generation of tools.

Still, I believe these models will eventually do more than many people currently expect. For example, if we manage to make real progress on designing minimal organisms, that would serve as a powerful demonstration. Ultimately, biologists want to see real-world results. Experimental validation makes it clearer what these models can really achieve.