Field note

Why AI Agents Need Ontology

Give the model a map of the work before you ask it to run the work.

Agents need a working model of the business before they can act inside it.

Most AI agent failures begin one layer below the tool call.

The agent sees a CRM record, a support thread, a contract clause, a Slack update, and a few tool descriptions. Then it has to infer the business behind those fragments. Sometimes it gets lucky. In production, luck starts looking expensive.

A real business is full of overloaded words. "Account" can mean the billing entity, the strategic relationship, the workspace, the parent company, or the team currently yelling in the support queue. "Risk" can mean churn risk, security risk, implementation risk, legal risk, or a founder's private sense that this deal is going sideways.

When those meanings live in people's heads, docs, and one-off prompts, agents guess. The failure mode is rarely dramatic. The agent drafts the wrong email, moves the wrong deal stage, files the ticket under the wrong category, or asks a human for help because it cannot tell what kind of situation it is in.

That is the whole argument, compressed: action depends on meaning. An agent with fifty tools and no model of the business has capability without comprehension.

Tool access gives the model reach. Ontology gives the model a world.

The interesting question is not whether your agent has a world model. It has one the moment it acts: some picture of the business is driving every draft, every tool call, every escalation. Early theoretical work suggests this is unavoidable: a recent DeepMind paper argues that general agents necessarily contain world models, though the formal debate is still young. The operational point does not need to wait for that debate. What you control is whether the model your agent acts on is implicit guesswork, reconstructed from scraps on every task, or an explicit artifact you can inspect, version, and govern.

Where ontology sits in the agent stack

Anthropic's agent guidance draws a useful line: workflows follow predefined code paths, while agents dynamically direct their own processes and tool use. That extra autonomy changes the architecture problem. Once the model chooses actions, the system has to tell it what the business objects mean and what counts as a valid move.

Layer	What it gives the agent	What remains unresolved
Prompt	Task framing, tone, broad instructions, and temporary context.	A prompt cannot govern the company's definition of Account, Renewal Risk, or Approval Required across tools.
RAG	Relevant documents, historical examples, and retrieved notes.	Retrieved text still has to be classified into business objects, states, and decisions.
Tool schema	Callable actions with parameters and descriptions.	The schema rarely explains when an action is allowed, which evidence is enough, or who must review it.
Ontology	Domain objects, relationships, states, evidence rules, and action boundaries.	The agent can connect context to an operating model instead of reconstructing the business each turn.

Prompting, retrieval, and tool definitions all matter. They solve different parts of the problem. Ontology answers the question sitting underneath them: what kind of thing is this, how does it relate to the rest of the business, and what should change now that we know it?

A good agent should spend its reasoning budget on the new situation in front of it. The stable grammar of the business belongs in the system.

The production symptoms are concrete

A claim like this has to predict failures or it is just category language. If agents really need ontology, removing it should break things in recognizable ways. It does. Here is what the breakage looks like in production.

Symptom	What the agent is missing	What ontology adds
The agent uses the wrong object.	"Account" means different things in sales, billing, product, and support.	Canonical business objects and relationships between them.
The agent calls the right tool at the wrong time.	The tool exists, but the action boundary is undefined.	State-specific action rules and approval requirements.
Memory becomes a pile of snippets.	The agent retrieves prior text without knowing what changed.	Typed memory attached to Accounts, Commitments, Risks, Issues, and Decisions.
Human review is noisy.	Every uncertain action escalates because the system cannot locate the judgment point.	Review gates tied to risk, authority, evidence, and reversibility.
Provenance is vague.	The system says it used context, but cannot show which evidence changed which conclusion.	A trace from source evidence to object state to action.

The research record agrees. When Berkeley researchers built the first comprehensive taxonomy of multi-agent failures, analyzing five frameworks across more than 150 tasks, they catalogued fourteen distinct failure modes. The first category was not model weakness. It was specification and system design: failures born before the first token, in everything the system never told the agent about the world it was operating in.

These are boring failures, which is exactly why they matter. They show up in the places where companies actually want agents: customer escalation, renewal support, implementation handoff, RFI response, account research, contract review, and internal operations. And they compound into the statistic everyone in this market should sit with: Gartner predicts over 40 percent of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. Most of those projects will have had tools. Fewer will have had a world.

A worked example: customer escalation

Take a support agent assigned to customer escalations. The model can summarize a thread and draft a response. That is the easy part. The hard part is knowing what the thread means to the business.

1. Classify the situation
The thread is classified as an Implementation Blocker, Product Issue, Contract Risk, Relationship Escalation, or some combination of those types.

2. Attach it to business objects
The issue connects to an Account, Contract, Stakeholder, open Commitment, affected Product Surface, and current Implementation Phase.

3. Update state with evidence
The missed handoff, contract clause, customer quote, and prior ticket become evidence that changes the state of the Commitment or Risk.

4. Select the allowed action
The agent can draft a response, open a work item, route to an owner, or request review depending on the state and authority rules.

5. Show the trace
A human can see the source evidence, the object state that changed, and the reason the agent chose that action.

The ontology behind that workflow might be small: Account, Contract, Stakeholder, Commitment, Issue, Risk, Evidence, Action, Owner, Review Gate. Small is the point. The agent now has a world with nouns, state, and consequences.
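
Steps 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed design: the `Risk` and `Evidence` classes, the state names, the rule that two pieces of evidence elevate a risk, the action table, and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str   # where the evidence came from (made-up identifiers below)
    summary: str


@dataclass
class Risk:
    account: str
    state: str = "Open"
    evidence: list[Evidence] = field(default_factory=list)


# Hypothetical action boundaries: which action each Risk state permits.
ALLOWED_ACTION = {
    "Open": "draft_response",
    "Elevated": "route_to_owner",
}


def add_evidence(risk: Risk, item: Evidence) -> dict:
    """Attach evidence, update the Risk state, and return a trace record."""
    before = risk.state
    risk.evidence.append(item)
    # Assumed rule for illustration: two or more pieces of evidence elevate the risk.
    if len(risk.evidence) >= 2:
        risk.state = "Elevated"
    return {
        "evidence": item.source,
        "state_before": before,
        "state_after": risk.state,
        "action": ALLOWED_ACTION[risk.state],
    }


risk = Risk(account="Acme")
first = add_evidence(risk, Evidence("ticket-1", "Handoff date missed"))
second = add_evidence(risk, Evidence("contract-4.2", "Clause sets a delivery commitment"))
print(second)
```

The trace record is the part worth noticing: each state change carries the evidence that caused it and the action it unlocked, so a reviewer does not have to reconstruct the reasoning from prose.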

The relationship map behind one escalation

The useful structure is the path from evidence to state to action.

Objects: Account, Contract, Commitment, Issue, Evidence, Risk, Action, Review Gate

Account → has → Contract
Contract → defines → Commitment
Issue → produces → Evidence
Evidence → changes → Risk
Risk → governs → Action
Action → may require → Review Gate

Knowledge graphs for AI agents become useful here because the relationships carry operational meaning. Account has Contract. Contract defines Commitment. Issue produces Evidence. Evidence changes Risk. Risk governs Action. Action may require Review Gate. The graph becomes a decision substrate the agent can traverse.
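
A small sketch of what "traverse" could mean in practice, assuming the relationships listed in this section are stored as plain triples. The `path_from` helper is hypothetical, and it relies on each object having at most one outgoing relationship, which holds for this map but would not for a real graph.

```python
# Each triple is (subject, relationship, object), copied from this section.
TRIPLES = [
    ("Account", "has", "Contract"),
    ("Contract", "defines", "Commitment"),
    ("Issue", "produces", "Evidence"),
    ("Evidence", "changes", "Risk"),
    ("Risk", "governs", "Action"),
    ("Action", "may require", "Review Gate"),
]


def path_from(start: str) -> list[str]:
    """Follow outgoing relationships from `start` until none remain."""
    # Assumes one outgoing edge per subject, which is true of TRIPLES above.
    edges = {subject: (relation, obj) for subject, relation, obj in TRIPLES}
    steps = []
    node = start
    seen = {start}
    while node in edges:
        relation, target = edges[node]
        steps.append(f"{node} {relation} {target}")
        if target in seen:  # guard against cycles
            break
        seen.add(target)
        node = target
    return steps


for step in path_from("Issue"):
    print(step)
```

Starting from an Issue, the walk passes through Evidence, Risk, and Action before reaching Review Gate, which is the evidence-to-state-to-action path the section describes.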

The smallest useful ontology

Start smaller than a company-wide ontology project. A giant diagram usually becomes too large to verify and too abstract to operate. A minimum viable ontology starts with one workflow where the agent already has a job to do.

Core objects: the five to ten nouns experts already use when discussing the workflow.
Relationships: how those objects depend on, contain, block, create, or govern each other.
States: the meaningful statuses an object can occupy, such as Proposed, Blocked, At Risk, Approved, or Superseded.
Evidence: the source material that can change an object's state.
Actions: what the agent can do around each object.
Review gates: when a human must approve, reject, or supply judgment.
Provenance: the path from source evidence to object state to agent action.

That is enough structure to change the system. Memory can update objects instead of accumulating notes. Tools can bind to states instead of floating as generic capabilities. Review can move to the moments where judgment matters. Provenance can explain the path from evidence to action.
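
To make "tools can bind to states" concrete, here is one way the checklist could be written down as data. It is a sketch under assumptions: the `ObjectType` class, the `is_allowed` helper, and the particular pairing of actions with states are hypothetical. The state names come from the list above, and the action names mirror the escalation example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectType:
    name: str
    states: tuple[str, ...]
    # Maps each state to the actions the agent may take in that state.
    actions_by_state: dict[str, tuple[str, ...]]


# Hypothetical definition of one object type from a single workflow.
COMMITMENT = ObjectType(
    name="Commitment",
    states=("Proposed", "Blocked", "At Risk", "Approved", "Superseded"),
    actions_by_state={
        "Proposed": ("draft_response",),
        "Blocked": ("open_work_item", "route_to_owner"),
        "At Risk": ("route_to_owner", "request_review"),
    },
)


def is_allowed(obj_type: ObjectType, state: str, action: str) -> bool:
    """Return True only if the action is bound to the given state."""
    if state not in obj_type.states:
        raise ValueError(f"{state!r} is not a state of {obj_type.name}")
    return action in obj_type.actions_by_state.get(state, ())


print(is_allowed(COMMITMENT, "Blocked", "open_work_item"))   # True
print(is_allowed(COMMITMENT, "Approved", "open_work_item"))  # False
```

States with no entry permit nothing, so a newly added state is closed by default until someone decides what the agent may do in it.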

If the word ontology still sounds heavier than that checklist, it helps to know it has a thirty-year-old engineering definition: an "explicit specification of a conceptualization", as Tom Gruber put it in 1993. W3C standards like RDF and OWL turned that idea into heavy machinery, and most teams can ship their first agent ontology without any of it. The discipline is what matters: make the domain explicit enough for software to use.

What changes after the ontology exists

Before	After
The agent retrieves five similar notes.	The agent updates the relevant Account, Commitment, Risk, or Decision.
The prompt says to be careful with renewals.	Renewal Window, Commercial Risk, Approval Requirement, and Review Gate are defined objects.
Every exception goes to a human.	Only exceptions crossing authority, risk, or evidence thresholds require review.
The agent explains itself in prose.	The system shows the source evidence, object state, and allowed action.
Each agent needs its own pile of instructions.	Agents share the same domain model and use different tools around it.
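
The third row, review only when a threshold is crossed, can be sketched as a small function. Treat it as an illustration with assumed values: the `ProposedAction` fields, the 0 to 3 risk scale, the thresholds, and the sample actions are hypothetical, and a real gate would take its thresholds from the domain model.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    name: str
    risk_level: int         # assumed scale: 0 (low) to 3 (high)
    evidence_count: int     # pieces of source evidence supporting the action
    reversible: bool
    within_authority: bool  # whether the agent is permitted to act alone


def review_reasons(action: ProposedAction) -> list[str]:
    """Return the reasons a human must review; an empty list means none."""
    reasons = []
    if not action.within_authority:
        reasons.append("outside agent authority")
    if action.risk_level >= 2:
        reasons.append("risk above threshold")
    if action.evidence_count < 1:
        reasons.append("insufficient evidence")
    if not action.reversible:
        reasons.append("action is irreversible")
    return reasons


draft = ProposedAction("draft_response", 0, 2, True, True)
credit = ProposedAction("issue_credit", 2, 1, False, False)
print(review_reasons(draft))
print(review_reasons(credit))
```

Returning reasons instead of a bare yes or no tells the reviewer which judgment they are being asked to make.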

This is what the semantic layer has to become for agents: not metric definitions in a warehouse, but runtime structure between the model and the business systems, giving the model the terms, constraints, and relationships it needs to act. The deeper argument, that one shared substrate can compile into many valid views for many consumers, is the subject of Polymantic Systems.

The market is arriving at the same conclusion from the data side. Salesforce now argues that "Agents need clear and trusted semantic definitions to translate intent and to produce accurate and relevant outputs," and the catalog vendors warn that agents without a governed context layer "routinely hallucinate, misinterpret business metrics, and violate governance rules in production." The demand side of this argument is closing fast. The open question is what the layer underneath should be.

How Penumbra builds the model behind the agent

Penumbra starts with the workflow where AI is already close to useful: proposal response, customer escalation, account intelligence, expert research, partner onboarding, RFI review, implementation handoff. The expert describes the work in business language. Penumbra captures the objects, rules, relationships, review standards, and evidence model around that workflow.

1. Capture the expert's domain commitments
What exists in this workflow? Which states matter? Which distinctions separate a good decision from a bad one?

2. Turn the commitments into a domain model
The model defines the objects, relationships, evidence, action rules, and review gates the agent will use.

3. Compile the model into working surfaces
One model compiles into every surface the work touches: extraction, memory, agent tools, APIs, guardrails, review, provenance.

4. Let people and agents share context
Humans inspect and improve the model. Agents act through it. The workflow stops depending on private tribal knowledge.
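
To give a feel for what "one model compiles into every surface" could mean, here is a toy sketch that derives two surfaces from one definition. It is not Penumbra's implementation. The `MODEL` dictionary, its keys, and both helper functions are hypothetical, chosen only to show a tool definition and a guardrail drawing on the same source.

```python
import json

# Hypothetical domain model for one object type.
MODEL = {
    "object": "Risk",
    "states": ["Open", "Elevated", "Resolved"],
    "review_required_in": ["Elevated"],
}


def to_tool_schema(model: dict) -> dict:
    """Derive a JSON-Schema-style parameter definition for an update tool."""
    return {
        "name": f"update_{model['object'].lower()}",
        "parameters": {
            "type": "object",
            "properties": {
                "state": {"type": "string", "enum": model["states"]},
            },
            "required": ["state"],
        },
    }


def needs_review(model: dict, state: str) -> bool:
    """Derive a guardrail check from the same model."""
    return state in model["review_required_in"]


print(json.dumps(to_tool_schema(MODEL), indent=2))
print(needs_review(MODEL, "Elevated"))
```

Because both outputs read from `MODEL`, adding a state changes the tool's accepted values and the guardrail's inputs in one place.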

So the title question has a short answer. Agents need ontology because the model needs a governed representation of the work before it can act on the work. Without one, every agent re-derives the business from scraps, on every task, forever.

Build the model behind the work. Then give the agent tools.
