The requirement. A common enterprise security control for agentic systems: "The gateway must enforce per-session/goal budgets: max turns, max tool calls, max token-in/out, max chaining depth, and repetition detection (same tool + same args, A2A ping-pong). It must cut off deterministically with a controlled outcome."
Five limits in one sentence, but they fall into two camps. Tokens
are a billing-shaped budget — they belong with cost rate-limiting, and
the sibling agentic-budgets-kind
lab covers that today with RateLimitConfig type:TOKEN over
usage.total_tokens. The other four are
behavioural — they catch agents that have entered a loop, gone off
the rails, or are quietly retrying the same call. Those four are this
lab's job.
The trick is making the cut-off controlled. A generic 429 is
not useful to an agent — it doesn't tell it which budget tripped, by
how much, or what to do next. Every deny here returns a structured
JSON body with reason_code, limit,
observed, session, tool, and
controlled_cutoff: true. The agent gets a signal it can
read, parse, and react to without re-implementing the budget logic.
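A minimal sketch of how an agent might consume such a deny body. The field names come from the list above. The reason_code strings, the sample values, and the next_action helper are hypothetical, since the lab's exact strings are not shown here.

```python
import json

# Hypothetical deny body: the field names are from the text above,
# the values (including the reason_code string) are illustrative only.
SAMPLE_DENY = json.dumps({
    "reason_code": "MAX_TOOL_CALLS",
    "limit": 10,
    "observed": 11,
    "session": "sess-123",
    "tool": "search",
    "controlled_cutoff": True,
})


def next_action(body: str) -> str:
    """Map a deny body to a coarse agent reaction (assumed policy)."""
    try:
        deny = json.loads(body)
    except json.JSONDecodeError:
        return "retry_later"  # unstructured body, e.g. a generic 429
    if not isinstance(deny, dict) or not deny.get("controlled_cutoff"):
        return "retry_later"
    reason = deny.get("reason_code")
    if reason == "REPETITION":
        return "change_arguments"
    if reason == "MAX_CHAIN_DEPTH":
        return "start_new_turn"
    return "ask_user"
```

The point of the sketch is the branch on reason_code: the agent reacts to which budget tripped without knowing how the gateway counts.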
The four budget counters
Counter 1 · max tool calls
Total tools/call per session
Simple INCR tool_calls:<sid> on every
tools/call. Compared against
maxToolCalls (default 10). Catches the
stuck-in-a-loop failure mode where an agent keeps
calling tools without making progress.
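The counter can be sketched as follows, assuming a call is denied once the incremented value exceeds the limit. FakeRedis is a hypothetical in-memory stand-in for the single Redis command involved, and check_tool_calls is an illustrative name, not the lab's code.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in for the one Redis command used here (INCR)."""

    def __init__(self):
        self._counters = defaultdict(int)

    def incr(self, key: str) -> int:
        self._counters[key] += 1
        return self._counters[key]


def check_tool_calls(store, sid: str, max_tool_calls: int = 10):
    """Count one tools/call for session sid; return (allowed, observed)."""
    observed = store.incr(f"tool_calls:{sid}")
    return observed <= max_tool_calls, observed
```

With the default limit of 10, the first ten calls in a session pass and the eleventh is denied.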
Counter 2 · max turns
Distinct X-Goal-Turn values per session
The orchestrator marks each goal-turn boundary with a header. The
ext-auth bumps the turn counter when the value changes and resets
chain_depth at the same time. Catches
goal-level flapping — an agent that loops between
goals without finishing any.
Counter 3 · max chain depth
Consecutive tools/call within one turn
chain_depth:<sid> is reset on each new turn
header and incremented on every call in between. Compared against
maxChainDepth (default 4). Catches agents that chain
tools too deeply within a single goal — distinct from
total call count.
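A rough sketch of how counters 2 and 3 interact, assuming state is kept in a plain object rather than Redis and that a turn is counted each time the X-Goal-Turn value changes. SessionState, on_tool_call, and the reason strings are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionState:
    last_turn: Optional[str] = None
    turns: int = 0
    chain_depth: int = 0


def on_tool_call(state: SessionState, goal_turn: str,
                 max_turns: int = 5, max_chain_depth: int = 4) -> Optional[str]:
    """Update counters for one tools/call; return a deny reason or None."""
    if goal_turn != state.last_turn:
        # New turn header: bump the turn counter and reset the chain.
        state.last_turn = goal_turn
        state.turns += 1
        state.chain_depth = 0
    state.chain_depth += 1
    if state.turns > max_turns:
        return "MAX_TURNS"
    if state.chain_depth > max_chain_depth:
        return "MAX_CHAIN_DEPTH"
    return None
```

Under these assumptions, a fifth consecutive call within one turn is denied for chain depth, while the same call sent under a new turn value passes because the chain counter restarts at 1.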
Counter 4 · repetition
Duplicate (tool, args-hash) in last N calls
The args are SHA-256-hashed (keys sorted for canonical serialisation) and the (tool, args-hash) pair is pushed onto a bounded list of recent calls (LPUSH + LTRIM on recent_calls:<sid>). Each new call is checked for duplicates within that window. Catches both stuck-on-the-same-call agents and amplification attacks where one query gets fired repeatedly.
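The hashing and window check might look like the sketch below. It assumes JSON with sorted keys as the canonical form and checks for duplicates before pushing the new entry. A deque with maxlen stands in for the Redis list, and the function names are illustrative.

```python
import hashlib
import json
from collections import deque


def args_hash(args: dict) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, no extra spaces)."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_repetition(recent: deque, tool: str, args: dict, max_dups: int = 1) -> bool:
    """Record the call and report whether it repeats one in the window."""
    entry = f"{tool}:{args_hash(args)}"
    dups = sum(1 for seen in recent if seen == entry)
    recent.appendleft(entry)  # with maxlen set, the oldest entry falls off
    return dups >= max_dups
```

Sorting the keys means {"q": "a", "n": 1} and {"n": 1, "q": "a"} hash identically, so reordering arguments does not evade the check. A window created as deque(maxlen=3) mirrors repetitionWindow: 3.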
The budget config
This ConfigMap is the source of truth. The ext-auth reloads it on every
Check call, so editing the ConfigMap takes effect on the
very next request — no pod restart.
yaml/budgets/budgets-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: budgets
  namespace: runaway-containment
data:
  budgets.yaml: |
    maxToolCalls: 10        # total tools/call per session
    maxTurns: 5             # distinct X-Goal-Turn values per session
    maxChainDepth: 4        # consecutive calls within one turn
    repetitionWindow: 3     # how many recent calls to remember
    repetitionMaxDups: 1    # deny on the first duplicate in the window
    sessionTTLSec: 600      # Redis keys expire after this many idle seconds
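Reloading on every Check can be sketched as re-reading the mounted file each time. This is an illustration only: parse_budgets is a hypothetical parser for the flat "key: integer" lines shown above, not a general YAML parser, and the fallback values simply mirror this ConfigMap.

```python
from pathlib import Path

# Fallback values mirroring the ConfigMap above (assumed, for the sketch).
DEFAULTS = {
    "maxToolCalls": 10, "maxTurns": 5, "maxChainDepth": 4,
    "repetitionWindow": 3, "repetitionMaxDups": 1, "sessionTTLSec": 600,
}


def parse_budgets(text: str) -> dict:
    """Parse flat 'key: int  # comment' lines; ignore anything else."""
    budgets = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in budgets and value.isdigit():
            budgets[key] = int(value)
    return budgets


def load_budgets(path: str = "/etc/budgets/budgets.yaml") -> dict:
    """Re-read the mounted file; fall back to defaults if it is missing."""
    try:
        return parse_budgets(Path(path).read_text(encoding="utf-8"))
    except OSError:
        return dict(DEFAULTS)
```

Calling load_budgets() at the start of each check keeps the file authoritative at the cost of one small read per request. Kubernetes refreshes mounted ConfigMap volumes periodically rather than instantly, so the new limits apply from the first request after the file on disk changes.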
Topology
runaway-inspector-ui (localhost:8090)
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Enterprise agentgateway │
│ jwt-auth (Strict — validates iss/aud) │
│ budget-extauth (extAuth.grpc, forwardBody.maxSize=8192) │
└──────────────────────────────────────────────────────────────┘
│ MCP (Streamable HTTP)
▼
runaway-mcp — 4 cheap tools: search, fetch, calculate, summarize
budget-extauth (gRPC :9001) ←── AGW forwards JSON-RPC body
│
├── reads /etc/budgets/budgets.yaml (mounted ConfigMap)
└── reads/writes redis:
      INCR tool_calls:<sid>
      INCR turns:<sid>            (on X-Goal-Turn change)
      INCR chain_depth:<sid>
      LPUSH+LTRIM recent_calls:<sid>
Build it
export AGENTGATEWAY_LICENSE_KEY=... # or use SECRETS_FILE
./scripts/quick.sh up # ~8-12 min first time
./scripts/port-forward.sh # leave running
# Inspector UI → http://localhost:8090
The four scenarios
The inspector UI has four scenario buttons. Click them in order from a fresh session (click Reset session first if you've been poking around) — each one isolates a different budget firing.
S1 · well-behaved task
S2 · max tool calls cap
The agent can read reason_code and limit, and decide to back off, ask the user, or abandon the goal. Compare with a generic 429 that gives the agent no actionable signal.
S3 · max chain depth cap
S4 · repetition
Live edit — registry as authority
The budgets file is just a ConfigMap. Edit it; the next call sees the new limits.
# Drop maxToolCalls from 10 to 5
kubectl -n runaway-containment edit configmap budgets
# Re-run S2 — it now cuts off on call 6 instead of call 11.
# No restart, no deploy, no policy CR edit.
Same pattern as the sibling agentic-tool-curation-kind lab: one source of truth, hot-reloaded by the ext-auth on every Check.
Notes on the demo
The requirement as commonly worded by enterprise security teams, quoted verbatim, mapped row by row to where each part lives in the demo:
Loop / runaway containment — The gateway must enforce per-session / goal budgets: max turns, max tool calls, max token-in/out, max chaining depth, and repetition detection (same tool + same args, A2A ping-pong). It must cut off deterministically with a controlled outcome.
- Max tool calls: total tools/call per session via INCR tool_calls:<sid>. Default limit 10. See: Scenario S2.
- Max turns: distinct X-Goal-Turn header values per session. Default limit 5. The header is the orchestrator's signal; see CLAUDE.md for the trust-boundary note. Visible in the live counter card and in S1 / S2's turn bump.
- Max chaining depth: consecutive tools/call requests within a single turn. Resets when a new turn header arrives. See: Scenario S3, which is distinct from total tool-call count.
- Controlled outcome: every deny returns a structured JSON body with reason_code, limit, observed, session, tool, and controlled_cutoff: true. The agent parses it and decides what to do next: back off, ask the user, or abandon the goal. See: any of the S2, S3, S4 deny bodies above.
- Max token-in/out: covered by the sibling agentic-budgets-kind lab with RateLimitConfig type:TOKEN over the LLM's usage.total_tokens field. Pair the two labs to cover all five sub-asks.
- A2A ping-pong: not built in this lab. The repetition approach would extend to (caller, callee, args-hash) in a session-scoped LRU, but the build is bigger. Flagged for a follow-up lab.
Teardown
./scripts/quick.sh teardown
See also
- Sibling — agentic-budgets-kind — covers the token-in/out half of this requirement with per-team Solo Enterprise rate limits.
- Sibling — agentic-tool-curation-kind — same ext-auth + Redis pattern, different policy (curated tool allow-list).
- Reference — securing-mcp-agentic-systems — field guide mapping common agentic security controls to OWASP Agentic / NIST AI RMF / MITRE ATLAS.
Versions
Built and verified on both editions:
v1.3.0, v1.5.1, v2.3.4, v1.4.0