BioReason-Pro: Teaching AI to "Think" Like a Biologist About Proteins

The breakthrough in vision-language models came from teaching AI to understand images and reason about them. Roboticists applied this principle to machines, giving them sensors to perceive the world while using language to help them think through actions. Inspired by both approaches, we combined ESM3 (a foundation model that understands proteins at the molecular level) with Qwen3 (a language model capable of reasoning). The result is a system that can emulate how a biologist thinks while processing biological context at scales humans can't match.

We first demonstrated this with BioReason in 2025, which had an LLM use Evo 2 representations of DNA variants to predict how genetic changes lead to disease. Today, we introduce BioReason-Pro, the first multimodal reasoning model for protein function prediction. Unlike DNA, proteins have an incredible diversity of possible structures, domains, and interactions with other proteins. Predicting what a protein does requires integrating multiple types of evidence and biological knowledge. BioReason-Pro does all of this while showing its work like a peer.

Try it for yourself at https://www.bioreason.net/. You can paste in a protein sequence and BioReason-Pro will generate a full reasoning trace, functional summary, and GO term annotations, explaining not just what the protein does, but also how it reached that conclusion.

Why is this sequence a kinase?

Over 250 million protein sequences can be found in the UniProt database, yet experimental functional annotations exist for fewer than 0.1% of them. AI models capable of annotating proteins exist, but even when they are accurate, they don't explain how they arrived at their answers. If a model simply tells you a protein is a kinase, it is hard to know whether to trust that claim.

We built two systems to change this. First, we introduce GO-GPT, which predicts Gene Ontology terms with state-of-the-art accuracy, surpassing the top publicly available models in the CAFA5 (Critical Assessment of Functional Annotation) competition. Then we developed BioReason-Pro, which integrates protein embeddings from the ESM3 foundation model to generate reasoning traces explaining its predictions.
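As a rough sketch of how a protein foundation model can feed a language model, per-residue embeddings can be projected into the LLM's embedding space and prepended to the text prompt as "soft" protein tokens. The dimensions, the random projection, and the soft-token scheme below are illustrative assumptions, not BioReason-Pro's published architecture:

```python
import numpy as np

# Hypothetical dimensions for the protein encoder and the LLM hidden size.
ESM_DIM, LLM_DIM = 1536, 4096

rng = np.random.default_rng(0)
# A learned linear projection; random weights stand in for trained ones here.
W_proj = rng.normal(0.0, 0.02, size=(ESM_DIM, LLM_DIM))

def build_multimodal_prefix(esm_embeddings: np.ndarray,
                            text_token_embeddings: np.ndarray) -> np.ndarray:
    """Project per-residue protein embeddings into the LLM's embedding space
    and prepend them to the text tokens as soft protein tokens."""
    protein_tokens = esm_embeddings @ W_proj            # (L_protein, LLM_DIM)
    return np.concatenate([protein_tokens, text_token_embeddings], axis=0)

# Toy example: a 120-residue protein followed by a 10-token text prompt.
esm_emb = rng.normal(size=(120, ESM_DIM))
txt_emb = rng.normal(size=(10, LLM_DIM))
seq = build_multimodal_prefix(esm_emb, txt_emb)
print(seq.shape)  # (130, 4096)
```

The key design point this sketch illustrates is that the language model never sees raw amino acids; it consumes a sequence of continuous vectors that already encode protein structure and evolution.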

For example, instead of providing an output like "kinase" in response to a particular sequence, BioReason-Pro would walk through the evidence. "The protein contains a Pkinase domain at residues 50-300 that catalyzes phosphorylation. The SH2 domain suggests signal transduction. Given what is known about the organism and the broader biological context, it likely functions in receptor tyrosine kinase signaling at the plasma membrane."

You can then evaluate that logic, interrogate it meaningfully, and decide if it's worth experimental follow-up.

Building an AI protein expert

To train BioReason-Pro, we needed examples showing how a biologist would think step-by-step from protein features to function. The problem was that these types of analyses haven't been captured in a standardized, machine-readable format.

Our solution was to use GPT-5 as a synthetic expert. For 130,000 proteins, we gave it everything a biologist would look at: which domains are present (like "Pkinase" or "SH2"), which proteins it interacts with, what organism it comes from, and any existing GO term annotations. We then asked it to "Reason through this protein's function the way an expert would."
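A minimal sketch of what assembling that context might look like in practice. The field names and prompt layout here are illustrative assumptions, not the published pipeline:

```python
def build_reasoning_prompt(protein: dict) -> str:
    """Assemble a structured context block for the synthetic expert.
    Field names are hypothetical; only the final instruction is quoted
    from the announcement."""
    lines = [
        f"Organism: {protein['organism']}",
        f"Domains: {', '.join(protein['domains'])}",
        f"Interaction partners: {', '.join(protein['interactions'])}",
        f"Known GO terms: {', '.join(protein['go_terms']) or 'none'}",
        "",
        "Reason through this protein's function the way an expert would.",
    ]
    return "\n".join(lines)

# Toy entry, loosely modeled on the kinase example in the text.
example = {
    "organism": "Homo sapiens",
    "domains": ["Pkinase", "SH2"],
    "interactions": ["GRB2", "EGFR"],
    "go_terms": ["GO:0004672"],
}
print(build_reasoning_prompt(example))
```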

The results surprised us. GPT-5 has never seen a protein sequence directly, but it's absorbed millions of scientific papers. It knows that kinase domains catalyze phosphorylation, that SH2 domains mediate signaling interactions, and how these pieces typically fit together. Given the right context, it could construct biologically coherent reasoning chains, connecting domains to molecular functions, functions to processes, and processes to mechanisms.

We trained BioReason-Pro on these examples, teaching it to produce similar step-by-step reasoning. The resulting model is BioReason-Pro SFT. We then optimized it further using reinforcement learning, grading it on whether its GO term predictions matched the real biology and adjusting accordingly, producing BioReason-Pro RL. In combination, the supervised fine-tuning taught the model how to reason clearly like a biologist, and reinforcement learning sharpened that reasoning to be more accurate and concise.
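A common way to grade a set of GO term predictions against curated annotations is set-overlap F1. The function below is an illustrative stand-in for such a reward signal; the actual reward used to train BioReason-Pro RL is not specified here:

```python
def go_reward(predicted: set, reference: set) -> float:
    """Toy RL reward: F1 overlap between predicted and curated GO terms.
    Illustrative only; not the published BioReason-Pro reward function."""
    if not predicted or not reference:
        return 0.0
    tp = len(predicted & reference)       # terms the model got right
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)

pred = {"GO:0004672", "GO:0005524", "GO:0016310"}
ref  = {"GO:0004672", "GO:0016310", "GO:0007169"}
print(round(go_reward(pred, ref), 3))  # 0.667
```

An F1-style reward penalizes both hallucinated terms (low precision) and missed terms (low recall), which matches the paper's observation that RL pushed the model toward answers that are simultaneously more accurate and more concise.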

One unexpected finding from reinforcement learning was that the model's reasoning got shorter and more accurate at the same time. Compared to the SFT model, the RL version seems to be reasoning more efficiently rather than simply saying less. The multimodal approach was also efficient. Using ESM3 embeddings meant the language model could build on extensive pretraining across millions of protein sequences rather than learning protein biology from scratch.

BioReason-Pro at Work

A common starting point for researchers trying to determine the function of newly identified protein sequences is to use an alignment tool like BLAST to find similar sequences and domain annotations from those sequences. BioReason-Pro goes far beyond this by identifying functional modules at the domain level and using language understanding to synthesize evidence across sequences, pathways, and scientific literature.

The model excels with proteins containing identifiable domains like kinases, receptors, or DNA-binding motifs, but it struggles with irregular proteins lacking clear domain structure. When there's nothing to reference, performance drops; it remains better than BLAST, but this is an area for improvement. We think this reflects training data bias: roughly 95% of proteins have regular architectures, so the model hasn't practiced reasoning about extreme novelty.

Our initial assessments of BioReason-Pro's performance relied on an LLM to judge the predictions. While these results were promising, it wasn't clear if real molecular biologists would agree. Reassuringly, when we asked 27 colleagues to compare BioReason-Pro's predictions against curated UniProt database entries, they preferred BioReason-Pro 79% of the time. Arc Core Investigator Christoph Thaiss and his lab put the model through independent evaluation and came away with similar conclusions when providing protein sequences longer than 150 amino acids. However, the model provided mixed results with sequences of roughly 100 amino acids and performed poorly with shorter peptides. Given that less than 0.5% of the training data was peptides of 50 amino acids or shorter, we knew this would be a challenge.

For a deeper look at the reasoning traces, the preprint includes two examples that highlight BioReason-Pro's capabilities in more detail. In the first, the model generates de novo predictions that we were able to validate against published experimental structures. We were interested to see that the model's internal attention localized to the exact contact residues resolved in that structure. It wasn't just getting the right answer; it was looking at the right evidence.

In a second example, a protein contains a catalytic domain that would normally indicate enzymatic activity. But three residues in its active site have mutated to form a binding surface for another protein. BioReason-Pro correctly identified this as a structural scaffold rather than an enzyme. When we looked at what the model was attending to when making that prediction, its attention concentrated on exactly those three repurposed residues.

A likely use case for a model like BioReason-Pro would be to gain insight into the function of AI-designed proteins. For example, work from Brian Hie's lab used an Evo model to generate functional anti-CRISPR proteins. Two of these lacked any sequence or structural similarity to known proteins. Even there, the model generated biologically coherent hypotheses worth investigating. Collectively, these examples showcase the potential utility of BioReason-Pro for helping researchers understand the function of both natural and artificial proteins.

We are also releasing BioReason-Pro's predictions for over 240,000 proteins, including every protein in the Human Protein Atlas. Even for well-characterized proteins, the model often surfaces functional connections or mechanistic details not obvious from existing annotations. We hope this becomes a useful resource for the community.

And while BioReason-Pro was built to understand protein function, there's an equally exciting direction ahead: can a model that reasons about existing proteins become a tool for designing new ones with specific function? That's what we want to explore next.

In the conversation below, the researchers behind BioReason and BioReason-Pro discuss how these models work and what sets them apart from conventional prediction machines.

###

Fallahpour, A., Seyed-Ahmadi, A., Idehpour, P., Ibrahim, O., Gupta, P., Naimer, J., Zhu, K., Shah, A., Ma, S., Adduri, A., Güloglu, T., Liu, N., Cui, H., Jain, A., de Castro, M., Fallahpour, A., Cembellin-Prieto, A., Stiles, J. S., Nemčko, F., Nevue, A. A., Moon, H. C., Sosnick, L., Markham, O., Duan, H., Lee, M. Y. Y., Salvador, A. F. M., Maddison, C. J., Thaiss, C. A., Ricci-Tam, C., Plosky, B. S., Burke, D. P., Hsu, P. D., Goodarzi, H., & Wang, B. (2026). BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning. BioRxiv. https://doi.org/10.64898/2026.03.19.712954




Adibvafa Fallahpour (X: @adibvafa) is an AI researcher at Arc Institute and Vector Institute studying Computer Science and Neuroscience at the University of Toronto.

Arman Seyed-Ahmadi (X: @arman1sa) is an AI Scientist at UHN.

Parsa Idehpour (X: @Radii2323) is an AI researcher at Arc Institute studying Machine Learning at the University of Pennsylvania.

Bo Wang (X: @BoWang87) is SVP and Head of Biomedical AI at Xaira Therapeutics, Professor at the University of Toronto, and CIFAR AI Chair at the Vector Institute.

Hani Goodarzi (X: @genophoria) is an Arc Institute Core Investigator and an Associate Professor of Biophysics & Biochemistry at UCSF.