Loop and Runaway Containment at the Gateway, by Tom O'Rourke

The requirement. A common enterprise security control for agentic systems: "The gateway must enforce per-session/goal budgets: max turns, max tool calls, max token-in/out, max chaining depth, and repetition detection (same tool + same args, A2A ping-pong). It must cut off deterministically with a controlled outcome."

Five limits in one sentence, but they fall into two camps. Tokens are a billing-shaped budget — they belong with cost rate-limiting, and the sibling agentic-budgets-kind lab covers that today with RateLimitConfig type:TOKEN over usage.total_tokens. The other four are behavioural — they catch agents that have entered a loop, gone off the rails, or are quietly retrying the same call. Those four are this lab's job.

The trick is making the cut-off controlled. A generic 429 is not useful to an agent — it doesn't tell it which budget tripped, by how much, or what to do next. Every deny here returns a structured JSON body with reason_code, limit, observed, session, tool, and controlled_cutoff: true. The agent gets a signal it can read, parse, and react to without re-implementing the budget logic.
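
As a rough illustration of that deny payload, the sketch below serialises the six fields named above. The helper name `deny_body` and the sample session id are hypothetical; only the field names come from the lab.

```python
import json
from typing import Optional


def deny_body(reason_code: str, limit: int, observed: int,
              session: str, tool: Optional[str] = None) -> str:
    """Serialise a structured deny payload (hypothetical helper)."""
    body = {
        "reason_code": reason_code,    # which budget tripped
        "limit": limit,                # the configured ceiling
        "observed": observed,          # the value that crossed it
        "session": session,
        "tool": tool,
        "controlled_cutoff": True,     # marks a deliberate budget deny
    }
    return json.dumps(body)


example = deny_body("max_tool_calls_exceeded", limit=10, observed=11,
                    session="demo-session", tool="search")
print(example)
```

Because the body is plain JSON, an agent can recover "which budget, by how much" with one `json.loads` call rather than guessing from a status code.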

The four budget counters

Counter 1 · max tool calls
Total tools/call per session

A simple INCR tool_calls:<sid> on every tools/call, compared against maxToolCalls (default 10). Catches the stuck-in-a-loop failure mode where an agent keeps calling tools without making progress.
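
A minimal sketch of the increment-then-compare step, assuming an in-memory stand-in for Redis so that it runs anywhere. `FakeCounterStore` and `check_tool_calls` are hypothetical names, not the lab's code.

```python
from collections import defaultdict
from typing import Tuple


class FakeCounterStore:
    """In-memory stand-in for Redis INCR (illustration only)."""

    def __init__(self) -> None:
        self._values = defaultdict(int)

    def incr(self, key: str) -> int:
        self._values[key] += 1
        return self._values[key]


def check_tool_calls(store: FakeCounterStore, sid: str,
                     max_tool_calls: int = 10) -> Tuple[bool, int]:
    """Count this call, then compare the new total against the cap."""
    observed = store.incr(f"tool_calls:{sid}")
    return observed <= max_tool_calls, observed


store = FakeCounterStore()
results = [check_tool_calls(store, "s1") for _ in range(12)]
first_denied = next(i for i, (ok, _) in enumerate(results, start=1) if not ok)
print(first_denied)  # 11
```

Incrementing before comparing means the denied call is itself counted, which is why the observed value is 11 when the limit is 10.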
    
Counter 2 · max turns
Distinct X-Goal-Turn values per session

The orchestrator marks each goal-turn boundary with a header. The ext-auth bumps the turn counter when the value changes and resets chain_depth at the same time. Compared against maxTurns (default 5). Catches goal-level flapping: an agent that loops between goals without finishing any.
    
Counter 3 · max chain depth
Consecutive tools/call within one turn

chain_depth:<sid> is reset on each new turn header and incremented on every call in between. Compared against maxChainDepth (default 4). Catches agents that chain tools too deeply within a single goal, which is distinct from total call count.
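
The interaction between the turn counter and chain depth can be sketched as below. This assumes per-session state held in a plain object rather than Redis, treats a missing header as "same turn", and uses a hypothetical `max_turns_exceeded` reason code that the lab does not show.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SessionState:
    last_turn: Optional[str] = None
    turns: int = 0
    chain_depth: int = 0


def on_tool_call(state: SessionState, goal_turn: Optional[str],
                 max_turns: int = 5,
                 max_chain_depth: int = 4) -> Tuple[bool, Optional[str]]:
    """goal_turn is the X-Goal-Turn value, or None when the header is absent."""
    if goal_turn is not None and goal_turn != state.last_turn:
        state.last_turn = goal_turn
        state.turns += 1          # new turn boundary
        state.chain_depth = 0     # chain depth restarts with the turn
    state.chain_depth += 1
    if state.turns > max_turns:
        return False, "max_turns_exceeded"       # assumed reason code
    if state.chain_depth > max_chain_depth:
        return False, "max_chain_depth_exceeded"
    return True, None


# Six calls with no turn header: the fifth exceeds maxChainDepth=4.
state = SessionState()
outcomes = [on_tool_call(state, None) for _ in range(6)]
first_denied = next(i for i, (ok, _) in enumerate(outcomes, start=1) if not ok)
print(first_denied)  # 5

# A header change bumps the turn and resets the chain.
fresh = SessionState()
for turn in ("1", "1", "2"):
    on_tool_call(fresh, turn)
print(fresh.turns, fresh.chain_depth)  # 2 1
```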
    
Counter 4 · repetition
Duplicate (tool, args-hash) in last N calls

The args are SHA-256-hashed (sorted keys for canonical serialisation), the (tool, hash) pair is pushed to an LRU list, and the next call checks for duplicates within the window. Catches both stuck-on-the-same-call agents and amplification attacks where one query gets fired repeatedly.
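
A sketch of the canonical hash and window check, assuming a bounded `deque` in place of the Redis LPUSH+LTRIM list. Whether a denied call is still recorded in the window is not stated in the lab; this version records it, as an assumption.

```python
import hashlib
import json
from collections import deque


def args_hash(args: dict) -> str:
    """SHA-256 over a canonical (sorted-key, compact) JSON encoding."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RepetitionDetector:
    def __init__(self, window: int = 3, max_dups: int = 1) -> None:
        self.recent = deque(maxlen=window)   # oldest entry falls off
        self.max_dups = max_dups

    def allow(self, tool: str, args: dict) -> bool:
        entry = (tool, args_hash(args))
        dups = sum(1 for seen in self.recent if seen == entry)
        self.recent.append(entry)            # assumption: denied calls are recorded too
        return dups < self.max_dups


detector = RepetitionDetector()
first = detector.allow("search", {"q": "same"})
second = detector.allow("search", {"q": "same"})
print(first, second)  # True False
```

Sorting the keys is what makes `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` hash identically, so an agent cannot dodge the check by reordering arguments.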
    

The budget config

This ConfigMap is the source of truth. The ext-auth re-reads the mounted file on every Check call, so an edit needs no pod restart: the first request after the kubelet has synced the updated ConfigMap into the volume sees the new limits.

YAML: yaml/budgets/budgets-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: budgets
  namespace: runaway-containment
data:
  budgets.yaml: |
    maxToolCalls: 10            # total tools/call per session
    maxTurns: 5                 # distinct X-Goal-Turn values per session
    maxChainDepth: 4            # consecutive calls within one turn
    repetitionWindow: 3         # how many recent calls to remember
    repetitionMaxDups: 1        # deny on the first duplicate in the window
    sessionTTLSec: 600          # Redis keys expire after this many idle seconds
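
For illustration, the sketch below turns the flat budgets.yaml content into a typed object. It is an assumption-heavy stand-in: the tiny line parser replaces a real YAML library, the `Budgets` class is hypothetical, and treating every value from the ConfigMap above as a default is a guess (the lab only states defaults for maxToolCalls, maxTurns and maxChainDepth).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Budgets:
    maxToolCalls: int = 10
    maxTurns: int = 5
    maxChainDepth: int = 4
    repetitionWindow: int = 3
    repetitionMaxDups: int = 1
    sessionTTLSec: int = 600


def parse_budgets(text: str) -> Budgets:
    """Parse flat 'key: value  # comment' lines; unknown keys are ignored."""
    known = {f.name for f in fields(Budgets)}
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in known and value:
            values[key] = int(value)
    return Budgets(**values)


sample = """
maxToolCalls: 5     # lowered from 10
maxChainDepth: 4
"""
budgets = parse_budgets(sample)
print(budgets.maxToolCalls, budgets.maxTurns)  # 5 5
```

Calling a parser like this at the start of each Check is one simple way to get the hot-reload behaviour described above, at the cost of a file read per request.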

Topology

   runaway-inspector-ui (localhost:8090)
        │
        ▼
   ┌──────────────────────────────────────────────────────────────┐
   │ Enterprise agentgateway                                      │
   │   jwt-auth      (Strict — validates iss/aud)                 │
   │   budget-extauth (extAuth.grpc, forwardBody.maxSize=8192)    │
   └──────────────────────────────────────────────────────────────┘
        │ MCP (Streamable HTTP)
        ▼
   runaway-mcp — 4 cheap tools: search, fetch, calculate, summarize

   budget-extauth (gRPC :9001) ←── AGW forwards JSON-RPC body
       │
       ├── reads /etc/budgets/budgets.yaml (mounted ConfigMap)
       └── reads/writes redis:
             INCR tool_calls:<sid>
             INCR turns:<sid>        (on X-Goal-Turn change)
             INCR chain_depth:<sid>
             LPUSH+LTRIM recent_calls:<sid>

Build it

export AGENTGATEWAY_LICENSE_KEY=...        # or use SECRETS_FILE
./scripts/quick.sh up                       # ~8-12 min first time
./scripts/port-forward.sh                   # leave running
# Inspector UI → http://localhost:8090

The four scenarios

The inspector UI has four scenario buttons. Click them in order from a fresh session (click Reset session first if you've been poking around) — each one isolates a different budget firing.

S1 · well-behaved task

Click the green S1 · well-behaved task button.

UI sends

tools/call search     args={"q":"kubernetes pods crashlooping"}
X-Goal-Turn: 2
tools/call fetch      args={"url":"https://docs.example.com/kube-debug"}
tools/call calculate  args={"expr":"restart_count * 5"}
tools/call summarize  args={"text":"Kubelet OOM-killed three pods..."}
tools/call search     args={"q":"OOM kill remediation"}

Result

all 5 calls → 200 / allowed
counters at end: tool_calls=5, turns=2, chain_depth=5

The gateway is selective, not blanket-deny. Five distinct calls across two turns pass cleanly because no budget is exceeded. The audience sees the live counter cards fill up to 5/10 calls, 2/5 turns, 5 recent, with none in the red zone.

S2 · max tool calls cap

Click the red S2 · max tool calls cap button.

UI sends

12 × tools/call search args={"q":"loop-N"} (N = 0..11)

ext-auth

calls 1..10 → INCR tool_calls:<sid> → allowed
call 11     → INCR → 11, exceeds maxToolCalls=10 → DENY (controlled cut-off)

Deny body

{
  "reason_code": "max_tool_calls_exceeded",
  "limit": 10,
  "observed": 11,
  "session": "...",
  "tool": "search",
  "controlled_cutoff": true
}

This is the controlled cut-off. The agent receives structured JSON: it can read reason_code and limit, then decide to back off, ask the user, or abandon the goal. Compare with a generic 429 that gives the agent no actionable signal.
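
On the agent side, reacting to such a body could look like the sketch below. The mapping from reason code to action is purely illustrative; the lab names the three possible reactions but does not prescribe which budget should trigger which.

```python
import json


def next_action(deny_json: str) -> str:
    """Pick a follow-up from a structured deny body (illustrative policy)."""
    body = json.loads(deny_json)
    if not body.get("controlled_cutoff"):
        return "back_off"             # not a budget deny: treat as transient
    reason = body.get("reason_code")
    if reason == "repetition_detected":
        return "back_off"             # stop re-sending the identical call
    if reason == "max_chain_depth_exceeded":
        return "ask_user"             # the goal may need to be re-scoped
    return "abandon_goal"             # session-wide caps and unknown reasons


s2_deny = json.dumps({
    "reason_code": "max_tool_calls_exceeded",
    "limit": 10,
    "observed": 11,
    "tool": "search",
    "controlled_cutoff": True,
})
print(next_action(s2_deny))  # abandon_goal
```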

S3 · max chain depth cap

Click the red S3 · max chain depth cap button — note this scenario does NOT bump the turn at start.

UI sends

6 × tools/call (no new X-Goal-Turn header sent)
search, fetch, calculate, search, fetch, summarize

ext-auth

no new turn → chain_depth is NOT reset
calls 1..4  → INCR chain_depth → allowed
call 5      → chain_depth=5, exceeds maxChainDepth=4 → DENY

Deny body

{
  "reason_code": "max_chain_depth_exceeded",
  "limit": 4,
  "observed": 5,
  "controlled_cutoff": true
}

Distinct from total call count. S2 fired on call 11 of 12. This one fires on call 5 of 6 — way fewer calls, but they're all chained inside one turn with no boundary marker. The two counters catch different failure modes.

S4 · repetition

Click the red S4 · repetition button.

UI sends

tools/call search args={"q":"same"}
tools/call search args={"q":"same"}   ← exact duplicate

ext-auth

call 1 → push (search|hash) to recent_calls → allow
call 2 → scan recent_calls window=3 → 1 duplicate found → DENY (repetitionMaxDups=1)

Deny body

{
  "reason_code": "repetition_detected",
  "limit": 1,
  "observed": 2,
  "tool": "search",
  "detail": "same call (tool=search, args-hash=4f3a2b9c) repeated within last 3 calls",
  "controlled_cutoff": true
}

Catches stuck loops AND amplification. An agent might call the same tool with the same args because it's confused about whether the last call succeeded. An attacker might do the same to amplify a single query into many backend hits. The same rule catches both — the gateway doesn't have to know which.

Live edit — registry as authority

The budgets file is just a ConfigMap. Edit it; the next call sees the new limits.

# Drop maxToolCalls from 10 to 5
kubectl -n runaway-containment edit configmap budgets

# Re-run S2 — it now cuts off on call 6 instead of call 11.
# No restart, no deploy, no policy CR edit.

Same pattern as the sibling agentic-tool-curation-kind lab: one source of truth, hot-reloaded by the ext-auth on every Check.

Notes on the demo

The requirement as commonly worded by enterprise security teams, quoted verbatim, mapped row by row to where each part lives in the demo:

Loop / runaway containment — The gateway must enforce per-session / goal budgets: max turns, max tool calls, max token-in/out, max chaining depth, and repetition detection (same tool + same args, A2A ping-pong). It must cut off deterministically with a controlled outcome.

MAX TOOL CALLS

"max tool calls"

Counter 1. Total tools/call per session via INCR tool_calls:<sid>. Default limit 10. See: Scenario S2.

MAX TURNS

"max turns"

Counter 2. Bumped on every distinct X-Goal-Turn header value. Default limit 5. The header is the orchestrator's signal — see CLAUDE.md for the trust-boundary note. Visible in the live counter card and in S1 / S2's turn bump.

MAX CHAIN DEPTH

"max chaining depth"

Counter 3. Consecutive tools/call requests within a single turn. Resets when a new turn header arrives. See: Scenario S3 — distinct from total tool-call count.

REPETITION

"repetition detection (same tool + same args)"

Counter 4. SHA-256 hash over sorted args; LRU of recent calls per session; deny on first duplicate in the window. See: Scenario S4.

CONTROLLED OUTCOME

"cut off deterministically with a controlled outcome"

Structured JSON deny. Every deny carries reason_code, limit, observed, session, tool, and controlled_cutoff: true. The agent parses it and decides what to do next — back off, ask the user, abandon the goal. See: any of S2, S3, S4 deny bodies above.

MAX TOKEN-IN/OUT

"max token-in/out"

Out of scope here — covered by the sibling lab. agentic-budgets-kind implements per-team token budgets using Solo Enterprise's RateLimitConfig type:TOKEN over the LLM's usage.total_tokens field. Pair the two labs to cover all five sub-asks.

A2A PING-PONG

"A2A ping-pong"

Future work. Catching two agents calling each other in a loop requires a second agent and an A2A backend. The pattern would be the same as repetition — track (caller, callee, args-hash) in a session-scoped LRU — but the build is bigger. Flagged for a follow-up lab.
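
A speculative sketch of that follow-up, reusing the repetition idea with a (caller, callee, args-hash) key. Nothing here exists in the lab: the class name, the agent names, and the window size of 6 are all assumptions.

```python
import hashlib
import json
from collections import deque
from typing import Tuple


def hop_key(caller: str, callee: str, args: dict) -> Tuple[str, str, str]:
    """Identify an agent-to-agent hop by direction plus canonical args hash."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return (caller, callee, digest)


class PingPongDetector:
    def __init__(self, window: int = 6, max_dups: int = 1) -> None:
        self.recent = deque(maxlen=window)   # session-scoped recent hops
        self.max_dups = max_dups

    def allow(self, caller: str, callee: str, args: dict) -> bool:
        key = hop_key(caller, callee, args)
        dups = sum(1 for seen in self.recent if seen == key)
        self.recent.append(key)
        return dups < self.max_dups


detector = PingPongDetector()
hops = [
    ("planner", "researcher", {"task": "x"}),
    ("researcher", "planner", {"task": "x"}),
    ("planner", "researcher", {"task": "x"}),   # the loop closes here
]
verdicts = [detector.allow(*hop) for hop in hops]
print(verdicts)  # [True, True, False]
```

Keying on direction means the first A-to-B and B-to-A hops both pass; only the repeat of an already-seen hop is denied.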

Teardown

./scripts/quick.sh teardown

Versions

Built and verified on both editions:

OSS (validated 2026-06-18)

agentgateway (OSS): v1.3.0

Gateway API: v1.5.1

Enterprise (validated 2026-06-18)

Solo Enterprise for agentgateway: v2.3.4

Gateway API: v1.4.0
