Point an MCP client at a big API and it gets a big list of tools, one per
operation. The whole catalogue is pasted into the model's context every turn,
and a job that touches five operations becomes five separate tool calls with the
model copying data between them by hand. Code mode turns that
around. The same OpenAPI backend is exposed as a single run_code
tool whose description is a generated TypeScript API. The client (the
model or agent calling the gateway, not the end user) writes one small JavaScript
program against that API, the gateway runs it in a sandbox, makes the upstream
REST calls, and returns only what the program decided to return. The end user
just asks a question in natural language; the model writes the code. This lab puts a
petstore behind agentgateway in toolMode: Code on kind and shows the
whole thing live: the single tool and its generated API, a raw
run_code call you drive by hand, and Claude reading the API and
writing the JavaScript itself.
The problem code mode solves
An MCP server that fronts a real API exposes a tool per operation. That is fine for three tools and painful for thirty: every tool schema is in the model's context on every turn, and a task that lists, filters, looks up detail and aggregates becomes a back-and-forth of one tool call per step, with every intermediate result making the full round trip into the model and back out again. The model ends up being the glue code, paying tokens to shuttle JSON it never needed to see.
agentgateway exposes an MCP backend in one of three toolModes. The
same petstore OpenAPI looks completely different to the client depending on which
one you pick:
- default: Every operation is its own tool (addPet, findPetsByStatus,
  getPetById, deletePet, and so on). Simple, but the whole catalogue sits in
  context and each step is a round trip.
- progressive disclosure: The client gets get_tool and invoke_tool instead of
  the full list, and discovers operations on demand. Keeps context small when
  the tool count is large.
- run_code: One tool, run_code. Its description is a generated TypeScript
  API, one async function per operation. The client writes JavaScript; the
  gateway runs it and makes the calls. This lab.
The flow
Standard mode would put all nineteen of the petstore's operations in the client's context as separate tools and make the client orchestrate a round trip per step. Code mode sends one program, fans the REST calls out inside the gateway, and returns only the answer.
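To make the difference concrete, here is a small sketch that compares how many characters of JSON would cross into the model's context in each mode. The pet records are synthetic and the character count is only a rough stand-in for tokens; nothing here calls the gateway.

```javascript
// Synthetic data: 134 fake pets (the size of the list captured later in this lab).
const pets = Array.from({ length: 134 }, (_, i) => ({
  id: i + 1,
  name: `pet-${i + 1}`,
  category: i % 3 === 0 ? { name: "Dogs" } : undefined,
  status: "available",
}));

// Standard mode: the tool result (the whole array) is handed to the model.
const standardModeChars = JSON.stringify(pets).length;

// Code mode: the program aggregates in the sandbox and returns only a summary.
const summary = {
  availableCount: pets.length,
  byCategory: pets.reduce((acc, p) => {
    const c = p.category?.name ?? "uncategorised";
    acc[c] = (acc[c] || 0) + 1;
    return acc;
  }, {}),
};
const codeModeChars = JSON.stringify(summary).length;

console.log({ standardModeChars, codeModeChars });
```

The gap widens with the size of the list, because the summary stays roughly constant while the raw array grows.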
The setup
Four objects on a kind cluster running Solo Enterprise for agentgateway. The
OpenAPI document goes in a ConfigMap; the backend turns it into MCP
tools and collapses them with toolMode: Code; a Gateway and an
HTTPRoute expose the MCP endpoint at /mcp.
yaml/backend.yaml: the code-mode backend
apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayBackend
metadata:
name: petstore-codemode
namespace: agentgateway-system
spec:
entMcp:
toolMode: Code # one run_code tool instead of one tool per operation
codeMode:
timeout: 10s # how long a single run_code program may run
sessionRouting: Stateless
failureMode: FailClosed
targets:
- name: petstore
static:
host: petstore3.swagger.io
port: 443
protocol: OpenAPI
policies:
tls: {} # the petstore is HTTPS; without this every call 400s
openAPI:
schemaRef:
name: petstore-openapi # ConfigMap built from the published spec
The backend's schemaRef points at a ConfigMap whose
data.schema key holds the API's OpenAPI 3.0 document. You do not
write that document. The API publishes its own, and you load the published spec
into the ConfigMap as-is. The petstore serves its own at
/api/v3/openapi.json, so the whole step is one command:
build the ConfigMap from the published spec
kubectl create configmap petstore-openapi -n agentgateway-system \
  --from-file=schema=<(curl -s https://petstore3.swagger.io/api/v3/openapi.json) \
  --dry-run=client -o yaml | kubectl apply -f -
This is config-time setup, owned by whoever owns the gateway config (a platform
team, or the API's owner), on the same lifecycle as the Backend and the Route,
and it belongs in git / GitOps. The MCP client never sees it.
For an internal API the spec usually comes straight from the framework that
serves it (a /openapi.json on a FastAPI or Spring service, say), and
a pipeline loads each published version into the ConfigMap; for a third-party API
you take the vendor's published spec. Nobody hand-edits the JSON.
03-backend-route.sh runs exactly the command above, falling back to a
pinned yaml/petstore-openapi.json when the URL is unreachable
(airgap).
Every operation in the spec becomes one function in the generated API. The
petstore's published spec has nineteen, so a standard-mode
client would see nineteen separate tools; code mode turns all of them into the
single run_code tool. Here is a trimmed look at the document that
lands in data.schema:
excerpt of the published petstore openapi.json
{
  "openapi": "3.0.4",
  "info": { "title": "Swagger Petstore - OpenAPI 3.0", "version": "1.0.27" },
  "servers": [ { "url": "/api/v3" } ],
  "paths": {
    "/pet/findByStatus": {
      "get": {
        "operationId": "findPetsByStatus",
        "parameters": [
          { "name": "status", "in": "query",
            "schema": { "type": "string", "default": "available",
                        "enum": ["available", "pending", "sold"] } }
        ]
      }
    },
    "/pet/{petId}": { "get": { "operationId": "getPetById" } }
    // ... 17 more operations: pets, store orders, users ...
  },
  "components": { "schemas": { "Pet": {}, "Order": {}, "User": {} } }
}
yaml/gateway.yaml + yaml/httproute.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: code-mode-gateway
  namespace: agentgateway-system
spec:
  gatewayClassName: enterprise-agentgateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes: { namespaces: { from: All } }
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: petstore-mcp
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: code-mode-gateway
  rules:
    - matches:
        - path: { type: PathPrefix, value: /mcp }
      backendRefs:
        - group: enterpriseagentgateway.solo.io
          kind: EnterpriseAgentgatewayBackend
          name: petstore-codemode
What the client actually sees
List the tools on the MCP endpoint and there is exactly one, even though the spec has nineteen operations. The tool's description is the contract: the rules for writing the JavaScript, a couple of worked examples, and then the Available API, which is the whole petstore turned into typed async functions. This is what a model reads before it writes anything.
./scripts/show-tools.sh (captured live)
The gateway exposes 1 MCP tool(s):
• run_code
input: { code }
run_code description — the generated TypeScript API the client writes against:
Execute code to achieve a goal.
Write JavaScript. The code runs as a top-level script, not inside a function.
Top-level await is available. The final expression becomes the result.
Do not use `return` at top level. ...
Available API:
```js
// Add a new pet to the store.
// type Input = { body: { id?: number, name: string, category?: { id?: number, name?: string }, photoUrls: Array<string>, tags?: Array<({ id?: number, name?: string })>, status?: "available" | "pending" | "sold" } }
async function addPet(input: Input): Promise<unknown>;
// Multiple status values can be provided with comma separated strings.
// type Input = { query: { status: "available" | "pending" | "sold" } }
async function findPetsByStatus(input: Input): Promise<unknown>;
// Returns a single pet.
// type Input = { path: { petId: number } }
async function getPetById(input: Input): Promise<unknown>;
// ... 16 more: findPetsByTags, updatePet, deletePet, getInventory,
// placeOrder, getOrderById, createUser, loginUser, getUserByName, ...
```
Two things to notice. The parameters are grouped by where they live in the HTTP
request, so a path parameter is getPetById({ path: { petId } }) and a
body is addPet({ body: { … } }). And the enums survive the round trip
from OpenAPI into TypeScript, so the model knows status is one of
three strings without guessing. All nineteen functions are in this one tool's
description; none of them is a separate tool in the client's context.
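One way to picture the grouping is as a direct mapping onto the HTTP request. The sketch below is illustrative only and is not the gateway's implementation: `buildRequest` and its arguments are hypothetical names, and the `/api/v3` base path is taken from the spec's `servers` entry.

```javascript
// Hypothetical helper: turn a grouped input ({ path, query, body }) into the
// pieces of an HTTP request. Illustrative only; the gateway does this for you.
function buildRequest(method, template, input = {}, basePath = "/api/v3") {
  // Substitute {name} placeholders in the path template from input.path.
  const path = template.replace(/\{(\w+)\}/g, (_, name) => {
    const value = input.path?.[name];
    if (value === undefined) throw new Error(`missing path parameter: ${name}`);
    return encodeURIComponent(String(value));
  });
  // Append input.query as a query string, if there is one.
  const query = new URLSearchParams(input.query ?? {}).toString();
  return {
    method,
    url: basePath + path + (query ? `?${query}` : ""),
    body: input.body === undefined ? undefined : JSON.stringify(input.body),
  };
}

const byId = buildRequest("GET", "/pet/{petId}", { path: { petId: 4334 } });
const byStatus = buildRequest("GET", "/pet/findByStatus", {
  query: { status: "available" },
});
const add = buildRequest("POST", "/pet", {
  body: { name: "Biscuit", photoUrls: [] },
});
console.log(byId.url, byStatus.url, add.body);
```

Keeping `path`, `query` and `body` as separate keys means two parameters with the same name in different places can never collide.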
Calling run_code directly (no model)
This section shows what the run_code tool receives and returns with nothing
else in the way. It's a plumbing check, not the customer experience.
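On the wire, a direct call is an ordinary MCP `tools/call` request whose only argument is `code`. The sketch below builds that JSON-RPC body. `runCodeRequest` is a hypothetical helper name, and a real client also has to perform the MCP initialize handshake and set the transport headers first, which is left out here.

```javascript
// Hypothetical helper: build the JSON-RPC body for an MCP tools/call request
// that invokes run_code with a program as its single `code` argument.
function runCodeRequest(code, id = 1) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "run_code", arguments: { code } },
  };
}

const request = runCodeRequest(
  'const r = await findPetsByStatus({ query: { status: "sold" } }); (r.data ?? r).length'
);
console.log(JSON.stringify(request, null, 2));
```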
run-code.sh sends a JavaScript program as the tool's code
argument. The program below lists the available pets, groups them by category,
fetches detail for the first few in parallel, and returns a small summary. Every
await is a REST call the gateway makes to the petstore; the counting
and shaping happen inside the sandbox, so only the summary comes back.
the program sent to run_code
// (OpenAPI list responses come back wrapped as { data: [...] } - unwrap it.)
const res = await findPetsByStatus({ query: { status: "available" } });
const pets = res.data ?? res;
// Fetch full detail for the first few, in parallel, in the same call.
const sample = pets.filter((p) => Number.isSafeInteger(p.id)).slice(0, 3);
const detailed = await Promise.all(sample.map((p) => getPetById({ path: { petId: p.id } })));
({
  availableCount: pets.length,
  byCategory: pets.reduce((acc, p) => {
    const c = (p.category && p.category.name) || "uncategorised";
    acc[c] = (acc[c] || 0) + 1;
    return acc;
  }, {}),
  sampleDetail: detailed.map((d) => {
    const p = d.data ?? d;
    return { id: p.id, name: p.name, category: (p.category && p.category.name) || null };
  }),
})
run_code returned (captured live)
{
  "success": {
    "availableCount": 134,
    "byCategory": {
      "Dogs": 31,
      "uncategorised": 91,
      "Cats": 1,
      "gen": 1
    },
    "sampleDetail": [
      { "id": 4334, "name": "Biscuit", "category": "Dogs" },
      { "id": 295, "name": "dens", "category": null },
      { "id": 233, "name": "dog", "category": null }
    ]
  }
}
One list call returning 134 pets and three detail lookups went out from the
gateway; one short object came back. A standard-mode client would have pulled
the whole 134-pet array into the model's context just to count it. run_code
always answers with { "success": … } or { "error": { "message": … } }, so a
caller can branch on which key is present.
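A caller can fold that branch into one small helper. This is a sketch under the assumption that the envelope has already been parsed out of the tool result into a plain object; `unwrapRunCode` is a hypothetical name.

```javascript
// Hypothetical helper: return the value on success, throw on error.
// Uses `in` rather than truthiness so a result of 0 or "" still counts.
function unwrapRunCode(envelope) {
  if (envelope && typeof envelope === "object") {
    if ("success" in envelope) return envelope.success;
    if ("error" in envelope) {
      throw new Error(envelope.error?.message ?? "run_code failed");
    }
  }
  throw new Error("unexpected run_code result shape");
}

console.log(unwrapRunCode({ success: { availableCount: 134 } }).availableCount);

try {
  unwrapRunCode({ error: { message: "Error: value is not iterable" } });
} catch (err) {
  console.log(err.message);
}
```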
Letting Claude drive it
Now hand the whole thing to a model. You run one command with a question in natural language:
what you type
./scripts/ask-llm.sh "How many pets are available, broken down by category?
Show me three example available pets with their category."
and you get one answer back:
what you get back (captured live, claude-sonnet-4-6)
Available pets by category:
Uncategorized 91
Dogs 31
狗 (Dogs, zh) 8
Cats 1
…
Total 134
Three examples: Biscuit (Dogs), dens (Uncategorized), zcqAtJBiMX (tcwLeEooaR).
The odd entries are leftovers in the shared demo's data. 狗 and 犬类 are just
"dog" and "canines" written in Chinese by some other tester, and a pet named
zcqAtJBiMX in a category called tcwLeEooaR is random junk someone left behind.
The three examples line reads as name (category): Biscuit is in the Dogs
category, dens has no category, and zcqAtJBiMX is the junk one. That mess is
the point: the model filtered and grouped it inside the gateway and handed back
a clean summary, instead of dumping 134 raw records for you to sort out.
That is the whole experience for whoever asks: one question, one answer. They
never see run_code, the JavaScript, or the petstore. Everything below
is what happened in between, which the script also prints so you can watch it.
What happens in between
ask-llm.sh gives Claude exactly one tool, run_code, with
the generated API as its description, and lets it work in a loop. Each
step is the same exchange: Claude writes a small JavaScript
program and calls run_code with it, the gateway runs the program and
returns the result, and Claude reads that result and decides what to do next. It
repeats until it can answer, then writes the natural-language reply above. The generated
API tells it to inspect an unfamiliar response before trusting it, so here the
first few steps are Claude probing the shape, and the last is the real program.
Step 1 · Claude → run_code: assumes a plain array
const pets = await findPetsByStatus({ query: { status: "available" } });
pets.slice(0, 3).map(p => ({ id: p.id, name: p.name, category: p.category }));
Step 1 · run_code → Claude: the result is not an array
{"error":{"message":"Error: not a function\n at <eval> (eval_script:3:6)"}}
Step 2 · Claude → run_code: tries the full program anyway
const pets = await findPetsByStatus({ query: { status: "available" } });
const categoryCount = {};
for (const pet of pets) { /* ...group by category... */ }
({ total: pets.length, categoryCount });
Step 2 · run_code → Claude: still wrong, pets is not iterable
{"error":{"message":"Error: value is not iterable\n at <eval> (eval_script:2:20)"}}
Step 3 · Claude → run_code: stops guessing and inspects the shape
const response = await findPetsByStatus({ query: { status: "available" } });
JSON.stringify(response).slice(0, 500);
Step 3 · run_code → Claude: the list is wrapped in { data: [...] }
{"success":"{\"data\":[{\"id\":4334,\"category\":{\"name\":\"Dogs\"},\"name\":\"Biscuit\", ..."}
Step 4 · Claude → run_code: now the real program (unwrap, group, sample)
const response = await findPetsByStatus({ query: { status: "available" } });
const pets = response.data; // <- the fix it just learned
const categoryCount = {};
for (const pet of pets) {
  const c = pet.category?.name || "Uncategorized";
  categoryCount[c] = (categoryCount[c] || 0) + 1;
}
const sorted = Object.entries(categoryCount)
  .sort((a, b) => b[1] - a[1])
  .map(([category, count]) => ({ category, count }));
const examples = pets.slice(0, 3).map(p => ({
  name: p.name, category: p.category?.name || "Uncategorized",
}));
({ total: pets.length, sorted, examples });
Step 4 · run_code → Claude: one small summary (Claude turns this into the answer above)
{"success":{"total":134,
  "sorted":[{"category":"Uncategorized","count":91},{"category":"Dogs","count":31},
            {"category":"狗","count":8},{"category":"Cats","count":1}, ...],
  "examples":[{"name":"Biscuit","category":"Dogs"},
              {"name":"dens","category":"Uncategorized"}, ...]}}
Four steps, each one run_code call, and the heavy data never left the
gateway: the 134-pet list was counted and grouped inside the sandbox, and what
crossed into the model was a 500-character sample to learn the shape and then the
small summary. In standard mode the same task is a round trip per
list-then-detail step with the full array landing in the model's context each
time. The wrong guesses in steps 1 and 2 are the honest part: the model recovers
in the same loop, because each result comes straight back to it.
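The loop itself is small. The sketch below shows its shape with stand-ins for both ends: `fakeModel` replays a fixed script instead of calling Claude, and `fakeRunCode` returns canned envelopes instead of calling the gateway. All names are hypothetical, and this is not the code in ask-llm.sh.

```javascript
// Generic loop: ask the model for the next step, run its program, feed the
// result back, and stop when the model produces a final answer.
async function agentLoop(question, nextStep, runCode, maxSteps = 8) {
  const history = [{ role: "user", content: question }];
  for (let step = 1; step <= maxSteps; step++) {
    const action = await nextStep(history);
    if (action.type === "answer") {
      return { answer: action.text, steps: step - 1 }; // steps = run_code calls made
    }
    const result = await runCode(action.code);
    history.push({ role: "tool_call", content: action.code });
    history.push({ role: "tool_result", content: JSON.stringify(result) });
  }
  throw new Error(`no answer after ${maxSteps} steps`);
}

// Stand-in for the model: a wrong guess, then an inspection, then an answer.
// Each round adds two history entries, so the script index is derived from it.
const script = [
  { type: "code", code: "pets.slice(0, 3)" },
  { type: "code", code: "JSON.stringify(response).slice(0, 500)" },
  { type: "answer", text: "The list is wrapped in { data: [...] }." },
];
const fakeModel = async (history) => script[(history.length - 1) / 2];

// Stand-in for the gateway: the first program fails, later ones succeed.
let calls = 0;
const fakeRunCode = async () =>
  ++calls === 1
    ? { error: { message: "Error: not a function" } }
    : { success: '{"data":[...]}' };

agentLoop("How many pets are available?", fakeModel, fakeRunCode).then((out) =>
  console.log(out)
);
```

Errors are passed back as ordinary tool results rather than thrown, which is what lets the model recover from a wrong guess on the next iteration.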
How it runs the code
The JavaScript runs in a sandbox inside the gateway, not in the client and not in
the petstore. A program is a top-level script: top-level await is
available, the final expression becomes the result, and there is no top-level
return. The functions in the generated API are the only way out to
the network; a program cannot reach anything the backend did not expose. Each run
is bounded by codeMode.timeout from the backend spec, with a memory
ceiling and a cap on how many upstream calls one program may make, so a runaway
or abusive program fails closed instead of hammering the upstream.
Where the program actually executes: only in the gateway's sandbox. The client writes it and the petstore serves the REST calls, but neither runs the code — so a program can reach nothing the backend did not expose.
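The timeout and the call cap can be pictured with ordinary JavaScript. This sketch is not the gateway's sandbox: `withCallCap` and `withTimeout` are hypothetical helpers and the numbers are arbitrary (the lab's real timeout is the `codeMode.timeout` of `10s`; the actual call cap and memory ceiling are not shown in this lab).

```javascript
// Hypothetical: wrap an API object so a program may make at most `maxCalls`
// calls in total; the next call throws instead of going out.
function withCallCap(api, maxCalls) {
  let used = 0;
  const capped = {};
  for (const [name, fn] of Object.entries(api)) {
    capped[name] = (...args) => {
      if (++used > maxCalls) throw new Error(`call cap of ${maxCalls} exceeded`);
      return fn(...args);
    };
  }
  return capped;
}

// Hypothetical: reject if `promise` has not settled within `ms` milliseconds.
// The timer is cleared either way so it cannot keep the process alive.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with a stand-in API function in place of the generated one.
const api = withCallCap(
  { getPetById: async ({ path }) => ({ id: path.petId }) },
  2
);
withTimeout(
  Promise.all([1, 2].map((petId) => api.getPetById({ path: { petId } }))),
  1000
).then((pets) => console.log(pets));
```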
Each call is compiled and run in a fresh sandbox: the gateway keeps no cache of
programs and no memory of the last one, so the code is generated dynamically every
time (any reuse would be the client's own doing). The gateway does not log the
program itself, but the generated code is visible at the client where it's
written; for example, ask-llm.sh prints every program the model sends.
petstore3.swagger.io is a
shared demo and its write path is flaky (addPet was returning
500 while this was captured), so the lab leans on the read and
aggregate operations, which is where code mode earns its keep anyway. Two real
details show through and are worth keeping: the upstream is HTTPS, so the target
needs policies.tls or every call returns 400; and the
OpenAPI list response comes back wrapped as { data: [...] }, which is
exactly the kind of shape the model is told to inspect before trusting.
Run it yourself
You need Docker, kind, kubectl, helm and
uv (for the Python MCP client), a Solo Enterprise for agentgateway
license, and an ANTHROPIC_API_KEY for the Claude step. There are two
ways to drive the tool, and only the second is what a real user does:
run-code.sh lets you hand a JavaScript program to the tool to
see the plumbing, and ask-llm.sh is the real flow where you ask in
natural language and the model writes and runs the JavaScript for you.
quickstart
export AGENTGATEWAY_LICENSE_KEY=... # Solo Enterprise for agentgateway
export ANTHROPIC_API_KEY=... # for ask-llm.sh
# bring up kind + agentgateway + the code-mode backend
./scripts/quick.sh up
# what an MCP client sees: one run_code tool + its generated TypeScript API
./scripts/show-tools.sh
# plumbing check (no model): YOU hand a JS program to the tool, get a summary back
./scripts/run-code.sh
./scripts/run-code.sh 'const r = await findPetsByStatus({ query: { status: "sold" } }); (r.data ?? r).length'
# the real flow: you ask in natural language, the MODEL writes + runs the JavaScript
./scripts/ask-llm.sh "which categories have the most available pets?"
./scripts/quick.sh teardown
Observing it
From the operator's side the gateway's logs show the call coming in and the REST
calls going out, with more detail as you turn the level up. At the default
info level the access log already records every inbound
run_code call:
access log (info): the inbound run_code call
request route=agentgateway-system/petstore-mcp http.path=/mcp http.status=200
protocol=mcp mcp.method.name=tools/call mcp.target=code_mode
gen_ai.tool.name=run_code mcp.session.id=… duration=956ms
That tells you run_code ran and how long it took, but not the calls it
fanned out to the petstore. Turn the data plane up to debug at runtime
through its admin endpoint (no restart) and each upstream REST call the sandbox
makes shows up as its own line. observe.sh does the port-forward, sets
the level, tails the logs, and resets to info when you stop it:
turn the level up and watch
# one terminal: raise the level and tail (Ctrl-C resets it to info)
./scripts/observe.sh debug
# another terminal: make a call
./scripts/run-code.sh
# or by hand against the admin endpoint:
kubectl -n agentgateway-system port-forward <gateway-pod> 15900:15000 &
curl -X POST "http://127.0.0.1:15900/logging?level=debug" # reset with level=info
debug: the call from the gateway to the petstore (one per await)
upstream request target=petstore3.swagger.io:443 endpoint=32.196.215.190:443
transport=tls http.method=GET http.host=petstore3.swagger.io
http.path=/api/v3/pet/findByStatus http.version=HTTP/2.0 http.status=200 duration=785ms
At trace you get the full outbound request, query string and headers
included (uri: …/api/v3/pet/findByStatus?status=pending). The access
log also carries trace.id / span.id, so with
OpenTelemetry enabled (the gateway reads the standard OTEL_* env vars)
one run_code call becomes a single trace with its petstore calls as
child spans, and Prometheus metrics are exposed on the pod's metrics port.
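None of these log levels records the JavaScript itself; the program is visible
only at the client, which run-code.sh and ask-llm.sh already print. So the
operator sees that run_code ran and every REST call it caused, but not the code
itself.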
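Because the info and debug lines are flat `key=value` pairs, they are easy to post-process. A minimal sketch, assuming values contain no spaces (true of the lines shown above, but not guaranteed for every field); `parseLogLine` is a hypothetical helper and not part of the lab's scripts.

```javascript
// Hypothetical helper: split a log line into its key=value fields.
// Tokens without "=" (such as the leading "upstream request") are ignored.
function parseLogLine(line) {
  const fields = {};
  for (const token of line.trim().split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq > 0) fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return fields;
}

const upstream = parseLogLine(
  "upstream request target=petstore3.swagger.io:443 transport=tls " +
    "http.method=GET http.path=/api/v3/pet/findByStatus http.status=200 duration=785ms"
);
console.log(upstream["http.path"], parseInt(upstream.duration, 10));
```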
Extending it
- See the contrast. Apply yaml/backend-standard.yaml (the same petstore with
  no toolMode) and re-run show-tools.sh against it: nineteen separate tools
  instead of one run_code.
- Swap the upstream. The backend is just an OpenAPI target. Point schemaRef at
  a different document and the generated API changes to match, no client
  changes needed.
- Add a real MCP server. A target can speak StreamableHTTP instead of OpenAPI;
  code mode then wraps that server's tools as the generated functions.
- Put policy in front of it. The MCP endpoint is an ordinary HTTPRoute, so JWT
  auth, rate limits and the rest of the agentgateway policy surface apply to
  run_code the same as any route.
See also
- The Model Context Protocol
- The OpenAPI Specification
- Sibling lab — agent-to-agent delegation in kagent, captured live
Versions
Built and verified on:
- v2026.5.2
- v1.4.0