Guide Labs unveils a new generation of interpretable large language models

Guide Labs introduces an interpretable large language model designed to improve transparency, explainability, and trust in AI systems for enterprise and research use.

Feb 24, 2026 - 17:39
Image Credits: Guide Labs

One of the hardest parts of working with deep learning systems is figuring out why they behave the way they do. Whether it’s xAI repeatedly running into trouble while fine-tuning Grok’s political edge, ChatGPT drawing criticism for sycophantic behaviour, or everyday hallucinations that appear in many models, it remains difficult to “look inside” a neural network with billions of parameters and clearly explain what caused a specific output.

Guide Labs, a San Francisco startup led by CEO Julius Adebayo and chief science officer Aya Abdelsalam Ismail, says it has a solution. On Monday, the company open-sourced an 8-billion-parameter large language model called Steerling-8B, trained using a new architecture designed for interpretability. The key promise: every token generated by the model can be traced back to its origins in the model’s training data.

In practice, that traceability could mean identifying the exact reference materials behind the facts the model states, or delving much deeper into how it forms ideas about complex concepts such as humour, identity, or gender.

“If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I’ve encoded, and then you have to be able to turn that on, turn them off reliably,” Adebayo said. “You can do it with current models, but it’s very fragile … It’s sort of one of the holy grail questions.”

Adebayo began this line of research while working on his PhD at MIT. He co-authored a widely cited 2018 paper showing that existing methods for understanding deep learning systems were not reliable. That work eventually fed into a different approach for building LLMs. Rather than trying to interpret a black-box model after it has been trained, developers design interpretability into the architecture from the beginning.

Guide Labs’ method introduces a concept layer within the model. This layer “buckets” information into traceable categories, allowing specific outputs to be linked back to organised, labelled sources. The tradeoff is that the approach requires more up-front data annotation. But the company says it can reduce the burden by using other AI systems to assist with labelling, enabling it to scale up training. Steerling-8B is the company’s largest proof-of-concept so far.
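
Guide Labs has not published Steerling-8B’s internals in this article, so the following is only a minimal, hypothetical sketch of what a “concept layer” in the spirit described above could look like. It uses PyTorch, and every class name, concept label, and dimension is invented for illustration; none of it is the company’s actual design.

```python
import torch
import torch.nn as nn

class ConceptLayer(nn.Module):
    """Toy concept bottleneck: route hidden states through a small set of
    named, human-labelled concepts so outputs can be attributed to them."""

    def __init__(self, hidden_dim: int, concept_names: list[str]):
        super().__init__()
        self.concept_names = concept_names
        # One learned direction per concept: hidden state -> concept score.
        self.to_concepts = nn.Linear(hidden_dim, len(concept_names))
        # Re-embed the bucketed concept scores into the residual stream.
        self.from_concepts = nn.Linear(len(concept_names), hidden_dim)

    def forward(self, hidden: torch.Tensor):
        scores = self.to_concepts(hidden)          # (batch, seq, n_concepts)
        out = self.from_concepts(scores.sigmoid())
        return out, scores                         # scores double as an audit trail

# Toy usage: ask which labelled 'bucket' fired hardest for the last token.
layer = ConceptLayer(hidden_dim=64, concept_names=["finance", "humour", "gender"])
hidden = torch.randn(1, 8, 64)                     # (batch, seq, hidden_dim)
_, scores = layer(hidden)
print("strongest concept:", layer.concept_names[scores[0, -1].argmax().item()])
```

Because every token’s computation is forced through named concept scores, those scores serve as the traceability record, and labelling each concept is exactly the extra annotation burden the company describes.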

“The kind of interpretability people do is … neuroscience on a model, and we flip that,” Adebayo said. “What we do is actually engineer the model from the ground up so that you don’t need to do neuroscience.”

A natural concern with any more controlled, structured architecture is that it might dampen the emergent behaviour that makes LLMs valuable, especially their ability to generalise to new situations or reason about concepts that were not explicitly taught during training. Adebayo argues that this kind of generalisation still appears in Steerling-8B. His team tracks what it calls “discovered concepts,” ideas the model appears to generate on its own, such as quantum computing.

Adebayo believes interpretability will become essential across the industry. For consumer-facing LLMs, he says this approach could help model builders block or limit the use of copyrighted materials and provide stronger, more reliable control over sensitive outputs involving topics like violence or drug abuse. He also points to regulated environments — such as finance — where a model assessing loan applicants should consider legitimate factors, such as financial records, while excluding protected attributes, such as race. In these settings, controllable and transparent systems will matter more.
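
The article does not spell out the mechanism, but in a bottleneck design like the hypothetical sketch above, excluding a protected attribute could in principle reduce to masking its concept score at inference time. Continuing the same invented example:

```python
import torch

# Hypothetical continuation of the ConceptLayer sketch above: suppress
# named concepts (e.g. protected attributes) by zeroing their scores
# before they are re-embedded, so they cannot influence the output.
def forward_with_ablation(layer, hidden, blocked):
    scores = layer.to_concepts(hidden).sigmoid()
    mask = torch.tensor([0.0 if name in blocked else 1.0
                         for name in layer.concept_names])
    return layer.from_concepts(scores * mask)

# e.g. a lending model required to ignore protected attributes:
# output = forward_with_ablation(layer, hidden, blocked={"gender"})
```

Adebayo’s point in the quote above is that this only works if the architecture guarantees the concept is not also encoded elsewhere; in a post-hoc setting there may be a vast number of redundant encodings to find and disable.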

The company also sees interpretability as a growing need in scientific applications. Deep learning has produced breakthroughs in areas like protein folding, but researchers often want more clarity on why a model generated a particular prediction or concluded that a certain combination is promising. Guide Labs says it has also developed technology aimed at this scientific interpretability problem.

“This model demonstrates that training interpretable models is no longer a sort of science; it’s now an engineering problem,” Adebayo said. “We figured out the science, and we can scale them, and there is no reason why this kind of model wouldn’t match the performance of the frontier-level models,” even though those frontier systems typically have far more parameters.

Guide Labs claims that Steerling-8B can achieve 90% of the capability of existing models with less training data, attributing this to the model’s architecture. The company’s next step is to build a larger version and begin offering API access along with more agent-focused capabilities for users.

Guide Labs emerged from Y Combinator and raised a $9 million seed round led by Initialized Capital in November 2024. Adebayo says the bigger mission is to make interpretability a standard feature of advanced AI systems rather than a specialised research effort.

“The way we’re currently training models is super primitive, and so democratising inherent interpretability is actually going to be a long-term good thing for our role within the human race,” Adebayo said. “As we’re going after these models that are going to be super intelligent, you don’t want something to be making decisions on your behalf that’s sort of mysterious to you.”

Shivangi Yadav reports on startups, technology policy, and other significant technology-focused developments in India for TechAmerica.Ai. She previously worked as a research intern at ORF.