MastertheMesh
Solo · agentgateway · MCP · ext-auth · Redis · runaway containment · kind
Built · Runs on kind

Loop and Runaway Containment at the Gateway

max turns + max tool calls + max chain depth + repetition detection · gateway-enforced, deterministic cut-off
Tom O'Rourke
EMEA Field CTO · Solo.io

Four counters per MCP session, kept in Redis, enforced at every tools/call by a tiny gRPC ext-auth wired to Solo Enterprise agentgateway via traffic.extAuth. When any counter hits its limit the gateway returns a structured JSON body with reason_code, limit, observed, and controlled_cutoff: true — a deterministic signal the agent can parse and back off on.

ext-auth · forwardBody · Redis · max turns · max tool calls · max chain depth · repetition · controlled cut-off · structured reason_code · kind

The requirement. A common enterprise security control for agentic systems: "The gateway must enforce per-session/goal budgets: max turns, max tool calls, max token-in/out, max chaining depth, and repetition detection (same tool + same args, A2A ping-pong). It must cut off deterministically with a controlled outcome."

Five limits in one sentence, but they fall into two camps. Tokens are a billing-shaped budget — they belong with cost rate-limiting, and the sibling agentic-budgets-kind lab covers that today with RateLimitConfig type:TOKEN over usage.total_tokens. The other four are behavioural — they catch agents that have entered a loop, gone off the rails, or are quietly retrying the same call. Those four are this lab's job.

The trick is making the cut-off controlled. A generic 429 is not useful to an agent — it doesn't tell it which budget tripped, by how much, or what to do next. Every deny here returns a structured JSON body with reason_code, limit, observed, session, tool, and controlled_cutoff: true. The agent gets a signal it can read, parse, and react to without re-implementing the budget logic.
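A minimal sketch of how such a body could be assembled, assuming the ext-auth is written in Python. The helper name build_deny_body is hypothetical and not the lab's code. The field names follow the deny bodies shown in the scenarios, and the optional detail field matches the one in the repetition deny.

```python
import json
from typing import Optional


def build_deny_body(reason_code: str, limit: int, observed: int,
                    session: str, tool: str,
                    detail: Optional[str] = None) -> str:
    """Serialise a controlled cut-off as JSON (hypothetical helper)."""
    body = {
        "reason_code": reason_code,   # which budget tripped
        "limit": limit,               # the configured ceiling
        "observed": observed,         # the value that crossed it
        "session": session,
        "tool": tool,
        "controlled_cutoff": True,    # always true: this deny is deliberate
    }
    if detail is not None:
        body["detail"] = detail       # optional human-readable explanation
    return json.dumps(body)


if __name__ == "__main__":
    print(build_deny_body("max_tool_calls_exceeded", 10, 11, "sess-1", "search"))
```

Keeping the body flat and the field set fixed is what lets an agent parse every deny the same way, whichever counter fired.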

The four budget counters

Counter 1 · max tool calls

Total tools/call per session

Simple INCR tool_calls:<sid> on every tools/call. Compared against maxToolCalls (default 10). Catches the stuck-in-a-loop failure mode where an agent keeps calling tools without making progress.
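A minimal sketch of the counter check, assuming a Python service. A plain dict stands in for Redis, and incr and check_tool_calls are hypothetical names. In Redis, INCR is atomic and creates a missing key at 0 before incrementing, which matters when one session's calls arrive concurrently. The dict version below is single-threaded only.

```python
from typing import Dict, Tuple

# Stand-in for Redis: INCR tool_calls:<sid> becomes incr("tool_calls:<sid>").
store: Dict[str, int] = {}


def incr(key: str) -> int:
    """Mimic Redis INCR: a missing key starts at 0, then is incremented."""
    store[key] = store.get(key, 0) + 1
    return store[key]


def check_tool_calls(sid: str, max_tool_calls: int = 10) -> Tuple[bool, int]:
    """Count this tools/call and report (allowed, observed)."""
    observed = incr(f"tool_calls:{sid}")
    return observed <= max_tool_calls, observed


if __name__ == "__main__":
    # Replay S2: twelve calls against the default limit of 10.
    for n in range(1, 13):
        allowed, observed = check_tool_calls("demo")
        print(n, allowed, observed)
```

Calls 1 to 10 come back allowed; call 11 returns (False, 11), which is the limit/observed pair that appears in the S2 deny body.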

Counter 2 · max turns

Distinct X-Goal-Turn values per session

The orchestrator marks each goal-turn boundary with an X-Goal-Turn header. The ext-auth bumps the turn counter when the value changes and resets chain_depth at the same time. Catches goal-level flapping: an agent that loops between goals without finishing any.

Counter 3 · max chain depth

Consecutive tools/call within one turn

chain_depth:<sid> is reset on each new turn header and incremented on every call in between. Compared against maxChainDepth (default 4). Catches agents that chain tools too deeply within a single goal — distinct from total call count.
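Counters 2 and 3 interact, because a new turn resets chain depth. The sketch below models both in one hypothetical SessionState object. It makes two assumptions. First, the ext-auth remembers the last X-Goal-Turn value it saw, stored here as last_turn; the lab's Redis key for this is not shown. Second, the turn limit uses the reason_code max_turns_exceeded; none of the scenarios trips it, so that name is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionState:
    """Per-session counters; in the lab these live in Redis keyed by <sid>."""
    last_turn: Optional[str] = None  # assumption: last X-Goal-Turn value seen
    turns: int = 0
    chain_depth: int = 0


def on_tools_call(state: SessionState, goal_turn: Optional[str],
                  max_turns: int = 5, max_chain_depth: int = 4) -> Optional[str]:
    """Update turn and chain-depth counters; return a reason_code on deny, else None."""
    if goal_turn is not None and goal_turn != state.last_turn:
        state.last_turn = goal_turn
        state.turns += 1          # a new goal-turn boundary
        state.chain_depth = 0     # chain depth restarts with every turn
        if state.turns > max_turns:
            return "max_turns_exceeded"  # hypothetical reason_code
    state.chain_depth += 1        # every call counts toward the current chain
    if state.chain_depth > max_chain_depth:
        return "max_chain_depth_exceeded"
    return None
```

Replaying S3 (six calls with no new header) allows calls 1 to 4 and denies from call 5 onward. Replaying S1 (one call on turn 1, then four on turn 2) ends with turns=2 and chain_depth=4, with every call allowed.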

Counter 4 · repetition

Duplicate (tool, args-hash) in last N calls

The args are hashed with SHA-256 over a canonical serialisation (sorted keys). The (tool, args-hash) pair is then pushed onto a bounded list of the most recent calls (LPUSH + LTRIM), and every new call is checked against that window for duplicates. Catches both agents stuck on the same call and amplification attacks where one query is fired repeatedly.
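A sketch of the repetition check, assuming the Redis list holds tool|args-hash strings newest-first. A Python list stands in for recent_calls:<sid>, and check_repetition is a hypothetical name. The lab does not specify whether a denied call is itself pushed into the window; this version pushes it.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def args_hash(args: Dict[str, Any]) -> str:
    """SHA-256 over canonical JSON: sorted keys, no insignificant whitespace."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_repetition(recent: List[str], tool: str, args: Dict[str, Any],
                     window: int = 3, max_dups: int = 1) -> Optional[int]:
    """Return the observed occurrence count if this call repeats too often, else None.

    `recent` mimics the Redis list: newest entry first (LPUSH), trimmed to
    `window` entries (LTRIM 0 window-1).
    """
    entry = f"{tool}|{args_hash(args)}"
    dups = recent[:window].count(entry)  # duplicates already in the window
    recent.insert(0, entry)              # LPUSH
    del recent[window:]                  # LTRIM to the window size
    if dups >= max_dups:
        return dups + 1                  # occurrences including this call
    return None
```

Replaying S4, the first {"q":"same"} call returns None and the second returns 2, which matches observed: 2 in the deny body. Sorting keys means {"a":1,"b":2} and {"b":2,"a":1} hash identically, so key order cannot be used to slip past the check.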

The budget config

This ConfigMap is the source of truth. The ext-auth reloads it on every Check call, so editing the ConfigMap takes effect on the very next request — no pod restart.

yaml/budgets/budgets-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: budgets
  namespace: runaway-containment
data:
  budgets.yaml: |
    maxToolCalls: 10            # total tools/call per session
    maxTurns: 5                 # distinct X-Goal-Turn values per session
    maxChainDepth: 4            # consecutive calls within one turn
    repetitionWindow: 3         # how many recent calls to remember
    repetitionMaxDups: 1        # deny on the first duplicate in the window
    sessionTTLSec: 600          # Redis keys expire after this many idle seconds
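The reload-on-every-Check pattern takes only a few lines. The following is a sketch assuming a Python ext-auth that reads the mounted file at /etc/budgets/budgets.yaml. load_budgets and DEFAULTS are hypothetical. The parser is deliberately tiny and handles only the flat key: integer lines used above; a real service would use a YAML library.

```python
from pathlib import Path
from typing import Dict

# Fallbacks mirror the ConfigMap values above (assumption: used when a key is missing).
DEFAULTS: Dict[str, int] = {
    "maxToolCalls": 10,
    "maxTurns": 5,
    "maxChainDepth": 4,
    "repetitionWindow": 3,
    "repetitionMaxDups": 1,
    "sessionTTLSec": 600,
}


def load_budgets(path: str = "/etc/budgets/budgets.yaml") -> Dict[str, int]:
    """Re-read the mounted file; meant to be called on every Check.

    Only understands flat `key: integer  # comment` lines; unknown keys are ignored.
    """
    budgets = dict(DEFAULTS)
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition(":")
        if key.strip() in budgets:
            budgets[key.strip()] = int(value.strip())
    return budgets
```

Because nothing is cached between calls, the budgets in force are always the ones in the file as the pod currently sees it. The only delay is the kubelet's refresh of the mount.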

Topology

   runaway-inspector-ui (localhost:8090)
        │
        ▼
   ┌──────────────────────────────────────────────────────────────┐
   │ Enterprise agentgateway                                      │
   │   jwt-auth      (Strict — validates iss/aud)                 │
   │   budget-extauth (extAuth.grpc, forwardBody.maxSize=8192)    │
   └──────────────────────────────────────────────────────────────┘
        │ MCP (Streamable HTTP)
        ▼
   runaway-mcp — 4 cheap tools: search, fetch, calculate, summarize

   budget-extauth (gRPC :9001) ←── AGW forwards JSON-RPC body
       │
       ├── reads /etc/budgets/budgets.yaml (mounted ConfigMap)
       └── reads/writes redis:
             INCR tool_calls:<sid>
             INCR turns:<sid>        (on X-Goal-Turn change)
             INCR chain_depth:<sid>
             LPUSH+LTRIM recent_calls:<sid>

Build it

export AGENTGATEWAY_LICENSE_KEY=...        # or use SECRETS_FILE
./scripts/quick.sh up                       # ~8-12 min first time
./scripts/port-forward.sh                   # leave running
# Inspector UI → http://localhost:8090

The four scenarios

The inspector UI has four scenario buttons. Click them in order from a fresh session (click Reset session first if you've been poking around) — each one isolates a different budget firing.

S1 · well-behaved task

Click the green S1 · well-behaved task button.

UI sends
tools/call search     args={"q":"kubernetes pods crashlooping"}
X-Goal-Turn: 2
tools/call fetch      args={"url":"https://docs.example.com/kube-debug"}
tools/call calculate  args={"expr":"restart_count * 5"}
tools/call summarize  args={"text":"Kubelet OOM-killed three pods..."}
tools/call search     args={"q":"OOM kill remediation"}
Result
all 5 calls → 200 / allowed
counters at end: tool_calls=5, turns=2, chain_depth=4
The gateway is selective, not blanket-deny. Five distinct calls across two turns, each well under every limit, pass cleanly. The audience sees the live counter cards fill up to 5/10 calls, 2/5 turns, 5 recent, none in the red zone.

S2 · max tool calls cap

Click the red S2 · max tool calls cap button.

UI sends
12 × tools/call search args={"q":"loop-N"} (N = 0..11)
ext-auth
calls 1..10 → INCR tool_calls:<sid> → allowed
call 11 → INCR → 11, exceeds maxToolCalls=10 → DENY (controlled cut-off)
Deny body
{ "reason_code": "max_tool_calls_exceeded", "limit": 10, "observed": 11, "session": "...", "tool": "search", "controlled_cutoff": true }
This is the controlled cut-off. The agent receives structured JSON — it can read reason_code, limit, and decide to back off, ask the user, or abandon the goal. Compare with a generic 429 that gives the agent no actionable signal.
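Acting on the body takes only a few lines on the agent side. This is a hypothetical policy: the mapping from reason_code to action is illustrative and not something the gateway prescribes. It assumes the agent receives the deny body as a string.

```python
import json
from typing import Dict

# Hypothetical agent policy: one reaction per reason_code.
POLICY: Dict[str, str] = {
    "max_tool_calls_exceeded": "abandon_goal",
    "max_chain_depth_exceeded": "ask_user",
    "repetition_detected": "back_off",
}


def react_to_deny(body: str) -> str:
    """Map a gateway deny body to an agent action."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unknown_failure"  # e.g. a bare 429 page: no structured signal
    if not isinstance(payload, dict) or payload.get("controlled_cutoff") is not True:
        return "unknown_failure"
    # Unrecognised but controlled cut-offs fall back to asking the user.
    return POLICY.get(payload.get("reason_code"), "ask_user")
```

The contrast with a generic 429 shows up directly: a plain-text error body can only map to unknown_failure, while every structured deny maps to a specific, deliberate action.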

S3 · max chain depth cap

Click the red S3 · max chain depth cap button — note this scenario does NOT bump the turn at start.

UI sends
6 × tools/call (no new X-Goal-Turn header sent) search, fetch, calculate, search, fetch, summarize
ext-auth
no new turn → chain_depth is NOT reset
calls 1..4 → INCR chain_depth → allowed
call 5 → chain_depth=5, exceeds maxChainDepth=4 → DENY
Deny body
{ "reason_code": "max_chain_depth_exceeded", "limit": 4, "observed": 5, "controlled_cutoff": true }
Distinct from total call count. S2 fired on call 11 of 12. This one fires on call 5 of 6, far fewer calls, because they are all chained inside one turn with no boundary marker. The two counters catch different failure modes.

S4 · repetition

Click the red S4 · repetition button.

UI sends
tools/call search args={"q":"same"}
tools/call search args={"q":"same"}   ← exact duplicate
ext-auth
call 1 → push (search|hash) to recent_calls → allow
call 2 → scan recent_calls (window=3) → 1 duplicate found → DENY (repetitionMaxDups=1)
Deny body
{ "reason_code": "repetition_detected", "limit": 1, "observed": 2, "tool": "search", "detail": "same call (tool=search, args-hash=4f3a2b9c) repeated within last 3 calls", "controlled_cutoff": true }
Catches stuck loops AND amplification. An agent might call the same tool with the same args because it's confused about whether the last call succeeded. An attacker might do the same to amplify a single query into many backend hits. The same rule catches both — the gateway doesn't have to know which.

Live edit — registry as authority

The budgets file is just a ConfigMap. Edit it; once the kubelet has refreshed the mounted file, the next call sees the new limits.

# Drop maxToolCalls from 10 to 5
kubectl -n runaway-containment edit configmap budgets

# Re-run S2 — it now cuts off on call 6 instead of call 11.
# No restart, no deploy, no policy CR edit.

Same pattern as the sibling agentic-tool-curation-kind lab: one source of truth, hot-reloaded by the ext-auth on every Check.

Notes on the demo

The requirement as commonly worded by enterprise security teams, quoted verbatim, mapped row by row to where each part lives in the demo:

Loop / runaway containment — The gateway must enforce per-session / goal budgets: max turns, max tool calls, max token-in/out, max chaining depth, and repetition detection (same tool + same args, A2A ping-pong). It must cut off deterministically with a controlled outcome.
MAX TOOL CALLS · "max tool calls"
Counter 1. Total tools/call per session via INCR tool_calls:<sid>. Default limit 10. See: Scenario S2.
MAX TURNS · "max turns"
Counter 2. Bumped on every distinct X-Goal-Turn header value. Default limit 5. The header is the orchestrator's signal; see CLAUDE.md for the trust-boundary note. Visible in the live counter card and in S1 / S2's turn bump.
MAX CHAIN DEPTH · "max chaining depth"
Counter 3. Consecutive tools/call requests within a single turn. Resets when a new turn header arrives. See: Scenario S3, distinct from total tool-call count.
REPETITION · "repetition detection (same tool + same args)"
Counter 4. SHA-256 hash over sorted args; bounded window of recent calls per session (LPUSH + LTRIM); deny on first duplicate in the window. See: Scenario S4.
CONTROLLED OUTCOME · "cut off deterministically with a controlled outcome"
Structured JSON deny. Every deny carries reason_code, limit, observed, session, tool, and controlled_cutoff: true. The agent parses it and decides what to do next: back off, ask the user, or abandon the goal. See: any of S2, S3, S4 deny bodies above.
MAX TOKEN-IN/OUT · "max token-in/out"
Out of scope here; covered by the sibling lab. agentic-budgets-kind implements per-team token budgets using Solo Enterprise's RateLimitConfig type:TOKEN over the LLM's usage.total_tokens field. Pair the two labs to cover all five sub-asks.
A2A PING-PONG · "A2A ping-pong"
Future work. Catching two agents calling each other in a loop requires a second agent and an A2A backend. The pattern would be the same as repetition, tracking (caller, callee, args-hash) tuples in a session-scoped window of recent calls, but the build is bigger. Flagged for a follow-up lab.
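Because this is future work, the following is only a sketch of the pattern described above. The PingPongDetector class, the agent names and the window size are all hypothetical and not part of the lab. It reuses the repetition idea: hash the args canonically and look for the same (caller, callee, args-hash) tuple in a bounded window.

```python
import hashlib
import json
from collections import deque
from typing import Any, Deque, Dict, Tuple

Key = Tuple[str, str, str]  # (caller, callee, args-hash)


class PingPongDetector:
    """Hypothetical session-scoped window of recent A2A calls."""

    def __init__(self, window: int = 3) -> None:
        self.recent: Deque[Key] = deque(maxlen=window)  # oldest entries fall off

    def check(self, caller: str, callee: str, args: Dict[str, Any]) -> bool:
        """Return True if this exact (caller, callee, args) is already in the window."""
        canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
        key = (caller, callee, hashlib.sha256(canonical.encode("utf-8")).hexdigest())
        seen = key in self.recent
        self.recent.append(key)
        return seen
```

With a window of 3, the sequence A→B, B→A, A→B carrying the same payload is flagged on the third call, because the first A→B tuple is still in the window.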

Teardown

./scripts/quick.sh teardown


Versions

Built and verified on both editions:

OSS
agentgateway (OSS): v1.3.0
Gateway API: v1.5.1
Enterprise
Solo Enterprise for agentgateway: v2.3.4
Gateway API: v1.4.0