System Manual

Trust & Governance

How autonomy is earned and lost: the five trust levels, the approval gates each one unlocks, the rules that never relax, and the audit trail behind every decision.

Autonomy is earned, not granted

Each agent carries a trust profile per organization, scored on a 1–5 ladder. New agents start at level 1 (Supervised). A streak of correct decisions promotes an agent up the ladder; a human override knocks its progress back, and an error drops it a level outright. Trust is the dial that decides how much an agent can do without asking — and it is per-org, so an agent earns its standing separately in every workspace.

Promotion thresholds

Promotion is gated on a run of consecutive correct decisions, with the bar rising at each level. The harness records a correct outcome on a clean run and an error outcome on a failure or open circuit; the trust manager applies the ladder.

Level 1 → 2: 50 consecutive correct decisions
Level 2 → 3: 100 consecutive correct decisions
Level 3 → 4: 250 consecutive correct decisions
Level 4 → 5: 500 consecutive correct decisions
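The ladder can be read as a small state machine kept per (agent, org) pair. This is a minimal sketch, not the trust manager's actual code. The names TrustProfile, profile_for and outcome_from_run are hypothetical, and so are the run statuses "clean", "failure" and "circuit_open". Two behaviours are also assumptions: a human override resets the streak without changing the level, and every promotion or demotion starts a fresh streak.

```python
from dataclasses import dataclass

# Consecutive correct decisions needed to leave each level (from the thresholds above).
PROMOTION_THRESHOLDS = {1: 50, 2: 100, 3: 250, 4: 500}
MIN_LEVEL = 1


@dataclass
class TrustProfile:
    level: int = MIN_LEVEL  # new agents start at level 1 (Supervised)
    streak: int = 0         # consecutive correct decisions at the current level

    def record(self, outcome: str) -> None:
        if outcome == "correct":
            self.streak += 1
            needed = PROMOTION_THRESHOLDS.get(self.level)  # None at level 5
            if needed is not None and self.streak >= needed:
                self.level += 1
                self.streak = 0  # assumption: each level starts a fresh streak
        elif outcome == "override":
            self.streak = 0  # assumption: an override resets progress but keeps the level
        elif outcome == "error":
            self.level = max(MIN_LEVEL, self.level - 1)  # an error drops one level
            self.streak = 0
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")


# Trust is per organization: one profile per (agent, org) pair (hypothetical store).
_profiles: dict[tuple[str, str], TrustProfile] = {}


def profile_for(agent_id: str, org_id: str) -> TrustProfile:
    return _profiles.setdefault((agent_id, org_id), TrustProfile())


def outcome_from_run(status: str) -> str:
    """Map a (hypothetical) run status to a ladder outcome."""
    if status == "clean":
        return "correct"
    if status in ("failure", "circuit_open"):
        return "error"
    raise ValueError(f"unknown run status: {status!r}")


ap = profile_for("ap-agent", "acme")
for _ in range(50):
    ap.record(outcome_from_run("clean"))
print(ap.level, ap.streak)                      # 2 0
ap.record(outcome_from_run("failure"))
print(ap.level)                                 # 1
print(profile_for("ap-agent", "globex").level)  # 1: trust is earned separately per org
```

Climbing from level 1 to level 5 takes 50 + 100 + 250 + 500 = 900 correct decisions in this sketch, provided no override or error interrupts the streak.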

Autonomy levels & approval gates

Each trust level maps to a set of approval gates. The higher the level, the fewer human approvals an agent needs before it can fill forms, submit them, or take destructive actions. The gates are defined centrally in AUTONOMY_GATES.

1 · Supervised (Fills: approval · Submits: approval · Destructive: approval)
2 · Assisted (Fills: auto · Submits: approval · Destructive: approval)
3 · Verified (Fills: auto · Submits: auto below a value threshold, default $10,000 · Destructive: approval)
4 · Trusted (Fills: auto · Submits: auto · Destructive: approval)
5 · Autonomous (Fills: auto · Submits: auto · Destructive: auto)
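One way to hold the gate table as data is shown below. It includes a check for two properties stated in this section: destructive actions run without approval only at level 5, and autonomy never shrinks as the level rises. The Gates record and the check_invariants helper are hypothetical, and the real shape of AUTONOMY_GATES may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gates:
    fills: str        # "auto" or "approval"
    submits: str      # "auto" or "approval"
    destructive: str  # "auto" or "approval"
    # Submits auto-approve only below this value; None means no ceiling.
    auto_approve_below: Optional[float] = None


# Hypothetical encoding of the gate table above.
AUTONOMY_GATES: dict[int, Gates] = {
    1: Gates("approval", "approval", "approval"),                     # Supervised
    2: Gates("auto", "approval", "approval"),                         # Assisted
    3: Gates("auto", "auto", "approval", auto_approve_below=10_000),  # Verified
    4: Gates("auto", "auto", "approval"),                             # Trusted
    5: Gates("auto", "auto", "auto"),                                 # Autonomous
}

_RANK = {"approval": 0, "auto": 1}  # higher rank means less human involvement


def check_invariants(gates: dict[int, Gates]) -> None:
    """Raise ValueError if the table breaks a rule stated in this section."""
    for level, g in gates.items():
        # Destructive actions run without approval at level 5 only.
        if (g.destructive == "auto") != (level == 5):
            raise ValueError(f"level {level}: destructive gate must be auto only at level 5")
    levels = sorted(gates)
    for lower, higher in zip(levels, levels[1:]):
        for name in ("fills", "submits", "destructive"):
            # Autonomy never shrinks as the level rises.
            if _RANK[getattr(gates[lower], name)] > _RANK[getattr(gates[higher], name)]:
                raise ValueError(f"{name} gate is stricter at level {higher} than at level {lower}")


check_invariants(AUTONOMY_GATES)  # passes silently for the table above
```

Running the check when the table is loaded means that an edit that accidentally opens the destructive gate below level 5 fails immediately rather than in production.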

Figure: the five-rung trust ladder. Autonomy rises with each promotion (50, 100, 250, and 500 consecutive correct decisions to reach levels 2 through 5), but destructive actions stay gated until level 5, and error paths always re-enter human review.
Note: At level 3 (Verified), the auto_approve_below gate is set to 10,000 by default. Submits auto-approve only below that value, and above it a human still signs off. Levels 4 and 5 carry no value ceiling.

What trust never relaxes

Trust changes how much an agent can do — never what it is allowed to attempt. Two rules hold at every level:

  • Destructive actions require approval at every level except 5. Only a fully Autonomous agent acts destructively without a human, and even then within its scope.
  • Destructive and error paths always re-enter human review. The harness will suppress a triage flag for a trusted agent only when the reason is not destructive and not an error, and only when that level's fill and submit gates are both already auto. Anything flagged destructive or as an error is never auto-suppressed.
Important: A highly trusted agent can move faster, but it can never bypass the destructive gate or silence an error. Promotion buys speed on routine work, not a waiver on the dangerous edges.
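The suppression rule can be written as a short predicate. In this hypothetical sketch, may_suppress_triage and the per-level table are illustrative names. The sketch treats level 3's value-capped submit gate as not fully auto, so only levels 4 and 5 qualify. That is a conservative assumption, because the text does not say how the value ceiling interacts with suppression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FillSubmitGates:
    fills: str    # "auto" or "approval"
    submits: str  # "auto", "auto_below" (value-capped) or "approval"


# Hypothetical fill/submit gates per level, following the gate table.
FILL_SUBMIT_GATES = {
    1: FillSubmitGates("approval", "approval"),
    2: FillSubmitGates("auto", "approval"),
    3: FillSubmitGates("auto", "auto_below"),  # auto only below the value threshold
    4: FillSubmitGates("auto", "auto"),
    5: FillSubmitGates("auto", "auto"),
}


def may_suppress_triage(level: int, destructive: bool, is_error: bool) -> bool:
    """Return True only if a triage flag may be auto-suppressed for this agent."""
    if destructive or is_error:
        return False  # destructive and error paths always re-enter human review
    gates = FILL_SUBMIT_GATES[level]
    # Both gates must already be fully auto (assumption: value-capped does not count).
    return gates.fills == "auto" and gates.submits == "auto"
```

The destructive and error checks run first, so no gate configuration can make the predicate suppress those flags.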

How an action gets gated

When an agent proposes an action, the harness checks the run against that agent's current gates. If the relevant gate (fill, submit, or destructive) requires approval, or if a submit at level 3 exceeds the Verified value threshold, the run is held as a triage item instead of acting. The human sees the proposed action, the gate that stopped it, and the supporting context.

Example · A $12,400 payment gated at Verified

An AP agent at level 3 (Verified) prepares to submit a vendor payment of $12,400. At level 3, fills are automatic and submits auto-approve only below the $10,000 threshold. Because $12,400 exceeds it, the submit gate fires. The run is held in triage, showing the payment detail and the rule that fired ("exceeds Verified auto-approve threshold"). A human approves, and the payment posts. Had the same agent submitted a $400 invoice, it would have proceeded automatically.
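The gating step and the $12,400 example can be sketched together. Everything in the block is hypothetical: Proposal, TriageItem, evaluate, the tuple-based gate table, and the gate messages are illustrative names. The sketch also assumes two things the text leaves open. A value exactly at the ceiling is held, since the text only says submits auto-approve below it. And a submit checked against a ceiling must carry a value.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical gate table: level -> (fills, submits, destructive, auto_approve_below).
GATES = {
    1: ("approval", "approval", "approval", None),
    2: ("auto", "approval", "approval", None),
    3: ("auto", "auto", "approval", 10_000),
    4: ("auto", "auto", "approval", None),
    5: ("auto", "auto", "auto", None),
}
GATE_COLUMN = {"fill": 0, "submit": 1, "destructive": 2}


@dataclass
class Proposal:
    agent: str
    level: int
    kind: str                     # "fill", "submit" or "destructive"
    description: str
    value: Optional[float] = None
    context: dict = field(default_factory=dict)


@dataclass
class TriageItem:
    proposal: Proposal
    gate: str  # the gate that stopped the run, shown to the human


def evaluate(p: Proposal) -> Optional[TriageItem]:
    """Return a TriageItem when a human must decide, or None when the agent may act."""
    row = GATES[p.level]
    mode, ceiling = row[GATE_COLUMN[p.kind]], row[3]
    if mode == "approval":
        return TriageItem(p, f"{p.kind.capitalize()} · requires approval at L{p.level}")
    if p.kind == "submit" and ceiling is not None:
        if p.value is None:
            raise ValueError("a value-capped submit needs a value")
        if p.value >= ceiling:  # assumption: a value exactly at the ceiling is held
            return TriageItem(p, f"Submit · at or above ${ceiling:,} threshold")
    return None


payment = Proposal("ap-agent", 3, "submit", "Submit payment $12,400 to Vendor X", value=12_400)
invoice = Proposal("ap-agent", 3, "submit", "Submit invoice $400", value=400)
print(evaluate(payment).gate)  # Submit · at or above $10,000 threshold
print(evaluate(invoice))       # None
```

Returning None instead of a "proceed" object keeps the common case cheap. Anything non-None goes to the triage queue together with the original proposal and its context.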

Screenshot: an action being gated. A card shows the agent, its trust level badge (e.g. 'Verified · L3'), the proposed action ('Submit payment $12,400 to Vendor X'), the gate that fired ('Submit · exceeds $10,000 threshold'), and Approve / Reject / Redirect buttons.

The audit log

Every decision — agent and human — is written to an audit log: which agent ran, the model used, the confidence, the gate or rule that fired, who approved or rejected, and the final outcome. Retention follows the plan, from 7 days on Starter up to 365 days on Enterprise. Combined with the per-org isolation of all trust state, the audit log makes every autonomous action reconstructable after the fact.
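The sketch below shows an audit record that carries the fields listed above, along with a retention check and a per-agent history query. The field names, the is_retained and history helpers, and the inclusive retention boundary are assumptions. Only the Starter (7 days) and Enterprise (365 days) windows come from the text, and other plans are left out rather than guessed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention windows in days. Only the two plans named in the text are listed.
RETENTION_DAYS = {"Starter": 7, "Enterprise": 365}


@dataclass(frozen=True)
class AuditRecord:
    org_id: str
    agent_id: str
    model: str
    confidence: float
    gate: Optional[str]      # gate or rule that fired, if any
    reviewer: Optional[str]  # human who approved or rejected, if any
    decision: Optional[str]  # e.g. "approved" or "rejected"; None if no human step
    outcome: str
    at: datetime


def is_retained(record: AuditRecord, plan: str, now: Optional[datetime] = None) -> bool:
    """True while the record is inside the plan's retention window (inclusive)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - record.at <= timedelta(days=RETENTION_DAYS[plan])


def history(log: list[AuditRecord], org_id: str, agent_id: str) -> list[AuditRecord]:
    """One agent's decisions within one org, oldest first."""
    mine = [r for r in log if r.org_id == org_id and r.agent_id == agent_id]
    return sorted(mine, key=lambda r: r.at)
```

Filtering on org_id before agent_id mirrors the per-org isolation of trust state. A reconstruction for one workspace never mixes in decisions the same agent made elsewhere.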