System Manual
Trust & Governance
How autonomy is earned and lost: the five trust levels, the approval gates each one unlocks, the rules that never relax, and the audit trail behind every decision.
Autonomy is earned, not granted
Each agent carries a trust profile per organization, scored on a 1–5 ladder. New agents start at level 1 (Supervised). A streak of correct decisions promotes an agent up the ladder; a human override knocks its progress back, and an error drops it a level outright. Trust is the dial that decides how much an agent can do without asking — and it is per-org, so an agent earns its standing separately in every workspace.
Promotion thresholds
Promotion is gated on a run of consecutive correct decisions, with the bar rising at each level. The harness records a correct outcome on a clean run and an error outcome on a failure or open circuit; the trust manager applies the ladder.
| Promotion | Requirement |
| --- | --- |
| Level 1 → 2 | 50 consecutive correct decisions |
| Level 2 → 3 | 100 consecutive correct decisions |
| Level 3 → 4 | 250 consecutive correct decisions |
| Level 4 → 5 | 500 consecutive correct decisions |
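The ladder above can be sketched as a small state machine. This is a minimal illustration, not the trust manager's actual code: `TrustProfile` and `record_outcome` are hypothetical names, and the sketch assumes that an override resets the streak to zero and that the streak restarts after a promotion or an error. The manual only says an override "knocks progress back", without stating by how much.

```python
from dataclasses import dataclass

# Consecutive correct decisions needed to leave each level (from the table above).
PROMOTION_THRESHOLDS = {1: 50, 2: 100, 3: 250, 4: 500}
MIN_LEVEL, MAX_LEVEL = 1, 5


@dataclass
class TrustProfile:
    """Hypothetical per-agent, per-org trust state."""
    level: int = MIN_LEVEL  # new agents start at level 1 (Supervised)
    streak: int = 0         # consecutive correct decisions at the current level


def record_outcome(profile: TrustProfile, outcome: str) -> TrustProfile:
    """Apply one outcome: 'correct', 'override', or 'error'."""
    if outcome == "correct":
        profile.streak += 1
        needed = PROMOTION_THRESHOLDS.get(profile.level)  # None at level 5
        if needed is not None and profile.streak >= needed:
            profile.level += 1
            profile.streak = 0  # assumption: the streak restarts after promotion
    elif outcome == "override":
        profile.streak = 0      # assumption: an override resets progress entirely
    elif outcome == "error":
        profile.level = max(MIN_LEVEL, profile.level - 1)  # drop one level, floor at 1
        profile.streak = 0      # assumption: the streak also resets on an error
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return profile
```

Because trust is per-org, a real system would presumably keep one such profile per (agent, organization) pair.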
Autonomy levels & approval gates
Each trust level maps to a set of approval gates. The higher the level, the fewer human approvals an agent needs before it can fill forms, submit them, or take destructive actions. The gates are defined centrally in AUTONOMY_GATES.
| Level | Fills | Submits | Destructive |
| --- | --- | --- | --- |
| 1 · Supervised | approval | approval | approval |
| 2 · Assisted | auto | approval | approval |
| 3 · Verified | auto | auto below a value threshold (default $10,000) | approval |
| 4 · Trusted | auto | auto | approval |
| 5 · Autonomous | auto | auto | auto |
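As an illustration, the gate table could be expressed as a mapping like the one below. The name `AUTONOMY_GATES` and the `auto_approve_below` key come from this manual; the dictionary layout, the `Gates` type, and the `unattended_actions` helper are assumptions made for the sketch.

```python
from typing import Optional, TypedDict


class Gates(TypedDict):
    fill: str                          # "auto" or "approval"
    submit: str                        # "auto" or "approval"
    destructive: str                   # "auto" or "approval"
    auto_approve_below: Optional[int]  # value ceiling for auto submits, if any


# Assumed layout: one entry per trust level, mirroring the table above.
AUTONOMY_GATES: dict[int, Gates] = {
    1: {"fill": "approval", "submit": "approval", "destructive": "approval", "auto_approve_below": None},
    2: {"fill": "auto", "submit": "approval", "destructive": "approval", "auto_approve_below": None},
    3: {"fill": "auto", "submit": "auto", "destructive": "approval", "auto_approve_below": 10_000},
    4: {"fill": "auto", "submit": "auto", "destructive": "approval", "auto_approve_below": None},
    5: {"fill": "auto", "submit": "auto", "destructive": "auto", "auto_approve_below": None},
}


def unattended_actions(level: int) -> list[str]:
    """List the action types a level may perform without a human (ceiling aside)."""
    gates = AUTONOMY_GATES[level]
    return [action for action in ("fill", "submit", "destructive") if gates[action] == "auto"]
```

Keeping the ceiling as a separate key lets level 3 share the same "auto" submit gate as levels 4 and 5 while still capping the value.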
Destructive actions are gated at levels 1 through 4; only level 5 acts destructively without a human. Errors are never auto-suppressed.
At level 3, the auto_approve_below gate is set to 10,000: submits auto-approve only under that value; above it, a human still signs off. Levels 4 and 5 carry no value ceiling.
What trust never relaxes
Trust changes how much an agent can do — never what it is allowed to attempt. Two rules hold at every level:
- Destructive actions require approval at every level except 5. Only a fully Autonomous agent acts destructively without a human, and even then within its scope.
- Destructive and error paths always re-enter human review. The harness will suppress a triage flag for a trusted agent only when the reason is not destructive and not an error, and only when that level's fill and submit gates are both already auto. Anything flagged destructive or as an error is never auto-suppressed.
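The suppression rule in the second bullet can be read as a two-step check. The function below is a sketch under assumptions: `can_suppress_triage_flag` is a hypothetical name, and the reason labels `"destructive"` and `"error"` stand in for whatever values the harness actually uses.

```python
# Reasons that always re-enter human review (labels are assumed for illustration).
NEVER_SUPPRESSED = {"destructive", "error"}


def can_suppress_triage_flag(fill_gate: str, submit_gate: str, reason: str) -> bool:
    """Return True only when a triage flag may be suppressed for this agent."""
    if reason in NEVER_SUPPRESSED:
        return False  # destructive and error paths are never auto-suppressed
    # Otherwise both the fill and submit gates must already be automatic.
    return fill_gate == "auto" and submit_gate == "auto"


# A level 4 agent (fill auto, submit auto) flagged for a non-destructive reason:
print(can_suppress_triage_flag("auto", "auto", "low_confidence"))
# The same agent flagged for an error:
print(can_suppress_triage_flag("auto", "auto", "error"))
```

The sketch takes the two gates as arguments and leaves open how level 3's value ceiling interacts with suppression, since the manual does not say.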
How an action gets gated
When an agent proposes an action, the harness checks the run against that agent's current gates. If the relevant gate (fill, submit, or destructive) requires approval, or the proposal exceeds the Verified value threshold, the run is held as a triage item instead of being executed. The proposed action, the gate that stopped it, and the supporting context are all presented to the human.
Example · A $12,400 payment gated at Verified
An AP agent at level 3 (Verified) prepares to submit a vendor payment of $12,400. At level 3, fills are automatic and submits auto-approve only below the $10,000 threshold. Because $12,400 exceeds it, the submit gate fires: the run is held in triage with the payment detail and the rule ("exceeds Verified auto-approve threshold") shown. Once a human approves, the payment posts. The same agent submitting a $400 invoice would have proceeded automatically.
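The gate check behind this example might look like the following sketch. `check_gate` and the shape of the `verified` dictionary are assumptions. The sketch also assumes a submit of exactly $10,000 is held, since the manual says auto-approval applies only below the threshold but does not spell out the boundary case.

```python
from typing import Optional


def check_gate(gates: dict, action: str, value: Optional[float] = None) -> Optional[str]:
    """Return the reason a run must be held in triage, or None if it may proceed."""
    if gates[action] == "approval":
        return f"{action} requires approval"
    ceiling = gates.get("auto_approve_below")
    # Assumption: auto-approval needs a value strictly below the ceiling.
    if action == "submit" and ceiling is not None and value is not None and value >= ceiling:
        return "exceeds Verified auto-approve threshold"
    return None


# Level 3 (Verified) gates, as described in the gate table.
verified = {"fill": "auto", "submit": "auto", "destructive": "approval", "auto_approve_below": 10_000}

print(check_gate(verified, "submit", 12_400))  # held: the $12,400 payment
print(check_gate(verified, "submit", 400))     # None: the $400 invoice proceeds
```

Returning the reason rather than a bare boolean gives the triage card something to display as the gate that fired.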
Screenshot: An action being gated. A card shows the agent, its trust level badge (e.g. 'Verified · L3'), the proposed action ('Submit payment $12,400 to Vendor X'), the gate that fired ('Submit · exceeds $10,000 threshold'), and Approve / Reject / Redirect buttons.
The audit log
Every decision — agent and human — is written to an audit log: which agent ran, the model used, the confidence, the gate or rule that fired, who approved or rejected, and the final outcome. Retention follows the plan, from 7 days on Starter up to 365 days on Enterprise. Combined with the per-org isolation of all trust state, the audit log makes every autonomous action reconstructable after the fact.
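To make the shape of an audit entry concrete, here is a hypothetical record carrying the fields listed above, plus a retention check. The class name, field names, and plan keys are assumptions. Only the two retention figures named in this manual (7 days on Starter, 365 on Enterprise) are included, and intermediate plans are left out rather than guessed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Only the two plans named in the manual; other tiers are not specified here.
RETENTION_DAYS = {"starter": 7, "enterprise": 365}


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record; field names are illustrative."""
    org_id: str              # trust state and audit data are isolated per org
    agent: str               # which agent ran
    model: str               # the model used
    confidence: float        # the agent's confidence in its decision
    gate: Optional[str]      # the gate or rule that fired, if any
    reviewer: Optional[str]  # who approved or rejected, if a human was involved
    outcome: str             # the final outcome
    recorded_at: datetime    # timezone-aware timestamp


def is_retained(entry: AuditEntry, plan: str, now: datetime) -> bool:
    """True while the entry is still inside the plan's retention window."""
    return now - entry.recorded_at <= timedelta(days=RETENTION_DAYS[plan])
```

Marking the record as frozen reflects the expectation that audit entries are written once and never edited.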