field_note // 02 / vector V / 02 / policy · safety / 14 min read

Guardrails for agent behaviour.

vector  V / 02
status  in progress
updated  2026.05.14
length  ~2,800 words

An agent is a piece of software with discretion. Once you give one access to a wallet, an inbox, an internal API, a payment processor — the question stops being "what can it do?" and becomes "what is it allowed to do, and how do we know?" This note describes the architectural shape of a policy and audit layer that sits between agent runtimes and the systems they touch.

The problem

Agents are getting capable enough that the rate-limiter on what they can do for a company is no longer the model — it is the trust the company is willing to extend. Today that trust is established by hand: an engineer reviews the prompts, a finance lead reviews the tool list, a security lead reviews the integrations, everyone reviews the agent's outputs for a few weeks, and then someone, eventually, takes a deep breath and turns the rate-limit up.

That process doesn't scale. As soon as a company has more than a handful of agents — some on engineers' machines, some embedded in product, some running marketing or finance ops — the answer to "what is this allowed to do?" stops being knowable. The agents drift, the prompts evolve, the tool list grows, and the audit trail is whatever someone happened to log.

Three things are missing at once: a policy expressive enough that a non-engineer can write it, an enforcement layer that runs between the agent and the systems it touches (not inside the model, which can be talked out of it), and an audit trail trustworthy enough that a finance lead, an auditor, or a regulator can read it without taking the platform's word for it.

The architectural shape

A policy and audit layer for autonomous agents has three load-bearing pieces. None of them, on their own, solves the problem. Together, they make the difference between "we have an agent in production" and "we can defend the agent's behaviour to anyone who asks."

Crucially, the policy is not enforced inside the model. Models can be talked out of their own rules — this is a property of the training procedure, not a bug to be patched. The enforcement layer lives outside the agent, on the path between the agent and the world. If the agent never even gets the tool descriptor for an action it is not allowed to take, the question of whether it would have called it correctly never arises.

design rule: The agent does not enforce its own rules. Policy is evaluated between the agent and the world, never inside the prompt. The audit trail is a property of the gateway, not a property of the model's good behaviour.
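
To make the placement concrete, here is a minimal Python sketch; Policy, Decision, and Gateway are names invented for this note, not an existing API. The gateway filters which tool descriptors the agent ever sees, evaluates every call before it reaches the real system, and records the decision whether or not the call runs.

    from dataclasses import dataclass, field

    @dataclass
    class Decision:
        allowed: bool
        reason: str

    @dataclass
    class Policy:
        grants: dict                                   # tool name -> set of agent ids allowed to call it

        def may_see(self, agent_id: str, tool: str) -> bool:
            return agent_id in self.grants.get(tool, set())

        def evaluate(self, agent_id: str, tool: str, args: dict) -> Decision:
            if not self.may_see(agent_id, tool):
                return Decision(False, f"no grant for {tool}")
            return Decision(True, "explicit grant")

    @dataclass
    class Gateway:
        policy: Policy
        tools: dict                                    # tool name -> callable that hits the real system
        audit: list = field(default_factory=list)

        def visible_tools(self, agent_id: str) -> list:
            # The agent never receives descriptors for tools it may not call.
            return [t for t in self.tools if self.policy.may_see(agent_id, t)]

        def execute(self, agent_id: str, tool: str, args: dict):
            decision = self.policy.evaluate(agent_id, tool, args)
            self.audit.append((agent_id, tool, args, decision))   # logged whether or not it runs
            if not decision.allowed:
                raise PermissionError(decision.reason)
            return self.tools[tool](**args)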

Multiple chokepoints, one source of truth

The same policy needs to apply to a few very different kinds of agent: the coding sessions running on an engineer's laptop, the agents embedded inside a company's own product, and the autonomous workflows running on a server. Each has a different host runtime, but the question they generate is identical: "is this tool call allowed?"

The implication is that the policy layer has to ship as several integration paths over a single source of truth — close to the agent for low-stakes tool calls, out-of-process for higher-stakes ones, and identical in policy semantics no matter which path is in use. Same policy, same audit ledger, different physical chokepoints depending on how and where the agent is deployed.
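
A hedged sketch of what that looks like in practice, with PolicyEngine standing in for the single source of truth (the name and shape are assumptions): an in-process wrapper guards low-stakes calls on the agent's own machine, an out-of-process handler guards the higher-stakes path, and both defer to the same evaluator, so the verdict cannot differ by chokepoint.

    import json

    class PolicyEngine:
        """Single source of truth; every chokepoint defers to evaluate()."""
        def __init__(self, grants):
            self.grants = grants                       # tool name -> set of allowed agent ids

        def evaluate(self, agent_id, tool, args):
            return agent_id in self.grants.get(tool, set())

    ENGINE = PolicyEngine({"send_email": {"support-agent"}})

    def in_process_guard(agent_id, tool, args, call_tool):
        # Path 1: linked into the agent runtime, e.g. a coding session on a laptop.
        if not ENGINE.evaluate(agent_id, tool, args):
            raise PermissionError(f"{tool} denied for {agent_id}")
        return call_tool(tool, args)

    def gateway_handler(request_body: str) -> str:
        # Path 2: an out-of-process proxy in front of production systems.
        req = json.loads(request_body)
        allowed = ENGINE.evaluate(req["agent_id"], req["tool"], req["args"])
        return json.dumps({"allowed": allowed})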

Why this matters now

The current state of the art for "agent governance" is a shared document. Companies are deploying agents into production with informal rules — "don't pay anything over a threshold without checking with me first" — and no way to verify that the rules are being respected, let alone produce an audit trail an external party would accept.
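
For illustration only, here is one way that informal rule might be made explicit; the tool name, field names, and the 500 threshold are all invented. The point is that the rule is data a reviewer can read, and the violation path routes to a named human instead of silently proceeding.

    PAYMENT_RULE = {
        "tool": "payments.create",                     # hypothetical tool name
        "max_amount": 500,                             # invented threshold, in the account currency
        "on_violation": "escalate",                    # do not execute; route to a named human
        "escalate_to": "finance-lead",
    }

    def check_payment(rule, args):
        # Returns "allow" or "escalate"; the escalation itself becomes an audit event.
        if args.get("amount", 0) > rule["max_amount"]:
            return rule["on_violation"]
        return "allow"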

The cost of doing nothing is going up quickly: every new framework lowers the bar for shipping an agent into a production environment, while the cost of a single mis-aimed action has not changed. The economics here are asymmetric, and the asymmetry is widening.

Our bet is that the policy and audit layer becomes a horizontal piece of infrastructure, in the same way that observability and feature flags became horizontal. The interesting research questions are about policy expressiveness (how do you describe acceptable behaviour without writing code?), intent inference (can the system propose policies from observed agent runs?), and graceful override (when an operator needs to grant a one-time exception, how does that get captured?).
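
On the override question, one plausible shape (an assumption of this note, not a settled design) is that the exception is itself a small, expiring policy object written to the same ledger as ordinary decisions, so the one-time grant is as auditable as the rule it bends.

    from datetime import datetime, timedelta, timezone

    def grant_override(operator, agent_id, tool, reason, ttl_minutes=30):
        # The exception is data, not a prompt edit: scoped, expiring, and written
        # to the same ledger as ordinary policy decisions.
        now = datetime.now(timezone.utc)
        return {
            "kind": "override",
            "granted_by": operator,                    # a named human, not "the system"
            "agent_id": agent_id,
            "tool": tool,
            "reason": reason,                          # free text, but mandatory
            "uses_remaining": 1,                       # one-time by default
            "expires_at": (now + timedelta(minutes=ttl_minutes)).isoformat(),
        }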

Applied within the ecosystem

The first applications of this work live inside the Aventus ecosystem and a small group of design-partner companies. The settlement, identity, and attestation primitives in the ecosystem map cleanly to the three things the enforcement layer needs to do its job — verify who is calling, verify what is allowed, and commit the outcome to an audit trail that survives outside our infrastructure.
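
Here is a sketch of what a single entry in that trail might carry, mapped to the three jobs above. The field names and the hash chaining are assumptions for this note, not the ecosystem's actual attestation format.

    import hashlib, json

    def audit_entry(agent_identity, policy_decision, outcome, prev_hash):
        body = {
            "who": agent_identity,                     # verified caller identity
            "decision": policy_decision,               # policy id, verdict, matched rule
            "outcome": outcome,                        # what actually happened downstream
            "prev": prev_hash,                         # chains entries so gaps are detectable
        }
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return {**body, "hash": digest}                # the hash can be anchored outside the platform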

This isn't a finished product. It's a research direction with a few load-bearing prototypes. The open questions are the interesting ones: how expressive can a policy language be before non-engineers stop writing it; how do you keep audit cost low enough that every tool call can be logged in production; and what does the recovery shape look like when an agent does something that, in retrospect, the policy should have caught.

// vector V / 02 · guardrails for agent behaviour
// partners  aventus ecosystem, et al.
// open questions  policy IR · intent inference · override
// next note  verifiability of agent runs
// open a channel with the lab →