The ask. "We want our team to use Claude Code, but the model has to be one we run, not Anthropic's. Our model is served behind an OpenAI-style API. We also want the gateway to hold the credentials, decide who is allowed to call it, and give us per-request visibility, rather than handing the raw key to every developer."
This is the same idea as pointing Claude Code at a non-Anthropic model the way a translation library such as LiteLLM does it. The difference here is where the translation and the controls live. Instead of a per-developer proxy with the model key baked in, agentgateway does the protocol translation in the data plane, holds the key in the cluster, and enforces identity and authorization on the way in. The same gateway that swaps the model also issues the access policy.
What you'll build
- client: Claude Code. Sends the Anthropic Messages API at /v1/messages. Its ANTHROPIC_BASE_URL points at the gateway, and its ANTHROPIC_AUTH_TOKEN is a JWT this lab mints, not an Anthropic key.
- gateway (agentgateway-system): agentgateway. Validates the JWT, runs the CEL authorization rule, then translates the Anthropic request to OpenAI and the OpenAI reply back to Anthropic. Holds the OpenAI key in a Secret.
- model: OpenAI. Reached as an AgentgatewayBackend with provider: openai. Swap the provider block or the host/port to point at an on-prem OpenAI-compatible server instead.
One kind cluster. The Anthropic Messages API goes in the front door, an OpenAI model answers at the back, and the client gets an Anthropic-shaped reply. The OpenAI key lives only in a Kubernetes Secret that the backend references, so it is never on a developer laptop and never in the request.
The client holds only a JWT carrying its claims (org, team); it never holds
the model key. The gateway validates the JWT, authorizes on the claims,
translates Anthropic to OpenAI, and attaches the OpenAI key from a cluster
Secret on the upstream hop alone. The reply is translated back to an
Anthropic message.
How the translation works
agentgateway decides what to do with an LLM request from two things: the API
format it sees coming in, and the provider configured on the backend it routes
to. The incoming format is set per path with the ai.routes map on
the backend. Mapping the Anthropic path to the Messages format, on a backend
whose provider is OpenAI, is what turns on the two-way translation: the
Anthropic Messages request becomes an OpenAI chat-completions call on the way
in, and the OpenAI chat-completions reply becomes an Anthropic Messages
response on the way out.
Without the ai.routes entry, the gateway falls back to passing the body
through, and the client gets raw OpenAI JSON that Claude Code cannot read. I
hit exactly that on the first run: a clean 200, but a
chat.completion object instead of an Anthropic message. Adding
"/v1/messages": Messages to the backend fixed it, and the reply
came back in Anthropic shape.
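Because that failure still returns a 200, it is easier to catch by looking at the shape of the body than at the status code. The following is a minimal sketch, not part of the lab scripts: `reply_shape` is a hypothetical helper name, and it relies only on the two markers mentioned above, the `chat.completion` object type for raw OpenAI output and `type: message` for an Anthropic reply.

```shell
# Hypothetical helper (not part of the lab scripts): classify a reply body
# read from stdin as "anthropic", "openai", or "unknown".
reply_shape() {
  body=$(cat)
  # Raw OpenAI pass-through carries "object": "chat.completion".
  if printf '%s' "$body" | grep -Eq '"object"[[:space:]]*:[[:space:]]*"chat\.completion"'; then
    echo openai
  # A translated reply carries "type": "message".
  # This is a plain text match, not a JSON parse.
  elif printf '%s' "$body" | grep -Eq '"type"[[:space:]]*:[[:space:]]*"message"'; then
    echo anthropic
  else
    echo unknown
  fi
}
```

Piping the output of a `curl -s` call against `/v1/messages` into `reply_shape` should print `anthropic` once the `ai.routes` entry is in place.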
Prerequisites
- kind, kubectl, helm, docker, openssl, xxd, jq, and an authenticated gcloud for the public chart registry.
- A Solo Enterprise agentgateway license key in AGENTGATEWAY_LICENSE_KEY (export it, or point SECRETS_FILE at a sourceable file).
- An OpenAI API key in OPENAI_API_KEY, or in a file the scripts read. It only ever becomes a cluster Secret.
Steps
Step 1 — Bring it up
One command creates the cluster, installs agentgateway, wires the OpenAI backend and route, and applies the JWT plus authorization policy. It is idempotent.
bash scripts/quick.sh up
export AGENTGATEWAY_LICENSE_KEY="your-license-key"
export OPENAI_API_KEY="sk-..." # or drop it in a file the scripts read
./scripts/quick.sh up
# then, in one shell:
./scripts/quick.sh demo # port-forward to localhost:8080
# and in another:
./scripts/quick.sh test # the three scenarios below
Step 2 — The cluster
A single kind cluster with the Gateway API CRDs applied. Nothing agentgateway-specific yet.
bash scripts/01-cluster.sh (the core)
kind create cluster --config ./kind/cluster.yaml
kubectl apply -f \
https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.5.1/standard-install.yaml
Step 3 — Install agentgateway
Two charts from the public OCI registry. agentgateway chart versions carry a
v prefix. Creating the Gateway with
gatewayClassName: enterprise-agentgateway auto-provisions the
proxy.
bash scripts/02-agentgateway.sh (the helm calls)
REG=oci://us-docker.pkg.dev/solo-public/enterprise-agentgateway/charts
helm upgrade --install enterprise-agentgateway-crds \
$REG/enterprise-agentgateway-crds --version v2.3.4 \
--namespace agentgateway-system --create-namespace --wait
helm upgrade --install enterprise-agentgateway \
$REG/enterprise-agentgateway --version v2.3.4 \
--namespace agentgateway-system --create-namespace \
--set licensing.licenseKey="$AGENTGATEWAY_LICENSE_KEY" --wait
yaml yaml/gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway-proxy
  namespace: agentgateway-system
spec:
  gatewayClassName: enterprise-agentgateway
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All
Step 4 — The model behind the API
The backend names the provider and the model, and references the Secret that
holds the OpenAI key. The ai.routes entry is what reads the
Anthropic path as Messages input, so the gateway translates in both
directions. To run an on-prem model instead, keep this shape and point
host and port at your own OpenAI-compatible server,
or change the provider block.
yaml yaml/backend.yaml
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-4o-mini
  policies:
    auth:
      secretRef:
        name: openai-secret   # holds "Authorization: Bearer sk-..."
    ai:
      routes:
        "/v1/messages": Messages   # read this path as Anthropic Messages -> translate
bash scripts/03-backend.sh (the Secret)
kubectl -n agentgateway-system create secret generic openai-secret \
--from-literal="Authorization=Bearer ${OPENAI_API_KEY}" \
--dry-run=client -o yaml | kubectl apply -f -
Step 5 — The route
Send the Anthropic path to the OpenAI backend. Routing
/v1/messages to a backend whose provider is OpenAI is what pairs
with the ai.routes entry above.
yaml yaml/httproute.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: claude-to-openai
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: agentgateway-proxy
    namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/messages
    backendRefs:
    - name: openai
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
Step 6 — Identity, RBAC and observability (optional)
This step is optional. Steps 1 to 5 are the whole translation: Claude Code already talks to the model and gets Anthropic-shaped replies. You can stop there. This step is the layer a data platform team asked for on top of it, and it is where agentgateway earns its place over a plain translation proxy.
What is happening. The team wants to let people use Claude Code against a model the platform runs, without handing the model credential to anyone, while controlling who is allowed and seeing how it is used. On the same gateway, three things now happen to every request before the model is ever reached: the caller's identity is verified from a JWT, that identity is checked against an authorization rule, and the call is recorded. The model key stays in the gateway throughout, so the question "who can use the model" is answered by policy, not by who has a copy of the key.
Why this pattern is worth it.
- The model key lives in one place. Rotate it once on the gateway instead of on every laptop, and a leaked client token never exposes it. Developers hold only a short-lived JWT.
- Access is by identity, not a shared secret. Add or remove a team by changing a claim rule or revoking a token, with no change to the model or to Claude Code.
- Entitlement can reach down to the model. The same gateway can let one team reach gpt-4o-mini and another a larger model, decided from the claims in their token.
- Usage is attributable. Every call logs the model and token counts against the caller, so cost attribution and audit come from the gateway, with nothing to instrument in the client.
One EnterpriseAgentgatewayPolicy on the Gateway carries all of it.
- Credentials: set jwtAuthentication to mode: Strict with a provider (issuer, audience, and a JWKS, inline here or a real OIDC endpoint). Callers present a short-lived JWT, never the model key.
- RBAC: set authorization with a CEL matchExpressions rule over the claims (here jwt.org and jwt.team; extend it to the llms entitlement claim to gate which models a team may call).
- Observability: nothing to configure. The proxy already logs the model, token counts and outcome of every call, as the access log further down shows.
The gateway issues the access contract. A single
EnterpriseAgentgatewayPolicy on the Gateway requires a valid JWT
(mode Strict, validated against an inline JWKS) and then runs a
CEL rule over the token claims. Only org=acme and
team=data-platform get through. The OpenAI key is not part of
this at all: the client only ever holds its own JWT.
yaml yaml/rbac-policy.yaml
apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayPolicy
metadata:
  name: claude-code-rbac
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: agentgateway-proxy
  traffic:
    jwtAuthentication:
      mode: Strict
      providers:
      - issuer: claude-code-lab
        audiences:
        - anthropic-api
        jwks:
          inline: |
            { ... public JWKS, filled in by 04-rbac.sh ... }
    authorization:
      action: Allow
      policy:
        matchExpressions:
        - 'jwt.org == "acme" && jwt.team == "data-platform"'
Two keys, and a token. 04-rbac.sh generates one
RSA keypair. The private key never leaves .gen/ and is
used only to sign tokens. The public half is the inline JWKS on the
policy, and the gateway uses it only to check signatures. A public key can
verify a token but cannot mint one, so the JWKS is safe in config: it is not a
secret, and it is not handed to clients. Clients never hold a key at all. They
hold a signed JWT, and that JWT is the Bearer they send.
How a minted token lines up with the policy. The mint script and the policy are the two ends of that one keypair. Minting signs with the private key; the gateway verifies with the public JWKS. The token is accepted only when three things agree, and they are hard-coded to match on both sides:
| Gateway checks | scripts/mint-token.sh | EnterpriseAgentgatewayPolicy |
|---|---|---|
| Signature | signs with .gen/jwt-private.pem, header kid: claude-code-key | verifies against the public jwks.inline (same kid) |
| Issuer | iss: claude-code-lab | issuer: claude-code-lab |
| Audience | aud: anthropic-api | audiences: [anthropic-api] |
That is authentication: is the token genuine, from the right issuer, for this
audience. The team argument touches none of it.
./scripts/mint-token.sh marketing signs with the same key and the
same iss / aud, so it is a perfectly valid token; it
only changes the team claim. It clears authentication and is then
turned away by authorization, the separate CEL rule
jwt.team == "data-platform". That split is why a missing token is
a 401 (authentication) while the wrong team is a 403 (authorization). The
llms claim rides along the same way, logged on every call and
ready for a finer-grained rule on the same CEL surface.
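To see for yourself that the two tokens differ only in the team claim, you can decode the payload segment of each. This is a sketch under assumptions: `jwt_claims` is a hypothetical helper, not one of the lab scripts, and it assumes a `base64` that accepts `-d` (GNU coreutils and recent macOS). It only decodes; it does not verify anything.

```shell
# Hypothetical helper: print the claims (payload) of a JWT without verifying it.
# Decoding is not validation; the gateway still checks signature, iss and aud.
jwt_claims() {
  # The payload is the second dot-separated segment, in base64url.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it to a multiple of 4 characters.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | base64 -d
}
```

Running `jwt_claims "$(./scripts/mint-token.sh marketing)"` next to the default token should show the same iss, aud and org, with only team changed.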
The mint script is a mock IdP. It holds the private key and
stamps the claims, which is exactly what an identity provider does when a user
logs in. It just keeps the lab self-contained, with no IdP to stand up. In
your own environment you would not mint by hand: point the provider's
jwks at your IdP's OIDC endpoint (Okta, Keycloak, Entra,
Frontegg) instead of inline, and the user's normal login issues the token. The
gateway still checks the same signature, issuer and audience, so nothing else
in the lab changes. For wiring real identity providers at the gateway and
validating their tokens, see
RFC 8693 token exchange across
identity providers.
bash scripts/mint-token.sh (the claims)
# default: org=acme, team=data-platform (authorized)
# ./scripts/mint-token.sh
# wrong team:
# ./scripts/mint-token.sh marketing
payload='{"iss":"claude-code-lab","aud":"anthropic-api",
"org":"acme","team":"data-platform",
"llms":{"openai":["gpt-4o-mini"]},
"iat":<issued-at>,"exp":<expiry>}'
# signed RS256 with the key whose public JWKS is on the policy
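The signing step itself can be sketched with openssl. This is not the lab's actual mint-token.sh, only an illustration of RS256 over the claims shown: `b64url` and `mint_jwt` are hypothetical names, the one-hour lifetime is an assumption, and the kid is the one listed in the table above.

```shell
# Sketch of RS256 signing with openssl; not the lab's actual mint-token.sh.
# base64url: standard base64 with '+/' swapped for '-_' and padding removed.
b64url() { base64 | tr -d '=\n' | tr '+/' '-_'; }

# mint_jwt KEY_FILE [TEAM] -> prints a signed JWT (hypothetical helper).
mint_jwt() {
  key=$1
  team=${2:-data-platform}
  now=$(date +%s)
  exp=$((now + 3600))   # assumption: one-hour lifetime
  header='{"alg":"RS256","typ":"JWT","kid":"claude-code-key"}'
  payload=$(printf '{"iss":"claude-code-lab","aud":"anthropic-api","org":"acme","team":"%s","llms":{"openai":["gpt-4o-mini"]},"iat":%s,"exp":%s}' "$team" "$now" "$exp")
  # The signing input is base64url(header) "." base64url(payload).
  signing_input="$(printf '%s' "$header" | b64url).$(printf '%s' "$payload" | b64url)"
  # RS256 = RSA PKCS#1 v1.5 signature over the SHA-256 digest of the input.
  sig=$(printf '%s' "$signing_input" | openssl dgst -sha256 -sign "$key" -binary | b64url)
  printf '%s.%s\n' "$signing_input" "$sig"
}
```

With the lab's key in place, `mint_jwt .gen/jwt-private.pem marketing` would produce a token of the same shape as the wrong-team case.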
Run it
Port-forward the gateway to localhost:8080 and send the same
Anthropic Messages body Claude Code would send. Three cases, all run live on
this lab.
| Request | Result | Why |
|---|---|---|
| No Authorization header | 401 | authentication failure: no bearer token found. A JWT is required (mode Strict). |
| JWT with team=marketing | 403 | authorization failed. The CEL rule allows only team=data-platform. |
| JWT with org=acme, team=data-platform | 200 | Translated to OpenAI, answered by gpt-4o-mini, returned in Anthropic format. |
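The three cases in the table can be checked in one pass by comparing status codes. The sketch below is an assumption about how such a check might look, not the lab's own test script: `status_for`, `expect_status` and `run_cases` are hypothetical names, and the short request body is chosen only to keep the call cheap. It assumes the port-forward on localhost:8080 and the lab's ./scripts/mint-token.sh.

```shell
# Hypothetical check for the three cases; assumes the port-forward on
# localhost:8080 and the lab's ./scripts/mint-token.sh are available.
BODY='{"model":"claude-3-5-sonnet-20241022","max_tokens":16,"messages":[{"role":"user","content":"ping"}]}'

# status_for [TOKEN] -> prints only the HTTP status code of a Messages call.
status_for() {
  if [ -n "${1:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/v1/messages \
      -H 'content-type: application/json' -H 'anthropic-version: 2023-06-01' \
      -H "Authorization: Bearer $1" -d "$BODY"
  else
    curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/v1/messages \
      -H 'content-type: application/json' -H 'anthropic-version: 2023-06-01' \
      -d "$BODY"
  fi
}

# expect_status NAME WANT GOT -> prints PASS or FAIL, non-zero on mismatch.
expect_status() {
  if [ "$2" = "$3" ]; then
    echo "PASS $1 ($3)"
  else
    echo "FAIL $1: want $2, got $3"
    return 1
  fi
}

# Runs the three cases against the live gateway.
run_cases() {
  expect_status "no token"      401 "$(status_for)"
  expect_status "wrong team"    403 "$(status_for "$(./scripts/mint-token.sh marketing)")"
  expect_status "data-platform" 200 "$(status_for "$(./scripts/mint-token.sh)")"
}
```

Calling `run_cases` with the port-forward running should print three PASS lines if the policy is applied as described.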
bash the authorized call
TOKEN=$(./scripts/mint-token.sh) # org=acme, team=data-platform
curl -s http://localhost:8080/v1/messages \
-H 'content-type: application/json' \
-H 'anthropic-version: 2023-06-01' \
-H "Authorization: Bearer $TOKEN" \
-d '{"model":"claude-3-5-sonnet-20241022","max_tokens":128,
"messages":[{"role":"user","content":"In one sentence, what is an API gateway?"}]}'
The body asked for claude-3-5-sonnet. What came back is an
Anthropic message, with type: message, a content
block array, stop_reason, and Anthropic-style
usage. The model field gives the game away: it was
gpt-4o-mini that actually answered. Claude Code reads this as a
normal Anthropic response.
json the 200 response (real output)
{
"id": "chatcmpl-Drj14BgFrnw6vszvHCynSypQlsoww",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "An API gateway is a server that acts as an intermediary between clients and backend services, managing requests, routing them to the appropriate services, handling authentication, and aggregating results."
}
],
"model": "gpt-4o-mini-2024-07-18",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 17,
"output_tokens": 35,
"service_tier": "default"
}
}
What the gateway sees
Every call lands in the proxy access log as a structured line. The denials carry the reason, and the authorized call carries the provider, the model that actually served it, and the token counts, as OpenTelemetry GenAI attributes. This is the per-request visibility the team asked for, without instrumenting Claude Code or the model.
text agentgateway proxy access log (trimmed, real output)
route=claude-to-openai http.path=/v1/messages http.status=401
error="authentication failure: no bearer token found" reason=JwtAuth
route=claude-to-openai http.path=/v1/messages http.status=403
error="authorization failed" reason=Authorization
route=claude-to-openai endpoint=api.openai.com:443 http.path=/v1/messages http.status=200
protocol=llm gen_ai.provider.name=openai
gen_ai.request.model=gpt-4o-mini gen_ai.response.model=gpt-4o-mini-2024-07-18
gen_ai.usage.input_tokens=17 gen_ai.usage.output_tokens=33
gen_ai.request.max_tokens=128 duration=2776ms
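Because the fields are plain key=value pairs, a usage summary needs nothing more than awk. The sketch below is a hypothetical helper, `usage_by_model`, and it assumes each request is logged as a single line of space-separated key=value fields (the trimmed sample above wraps them for readability). It totals tokens per responding model and ignores lines without a `gen_ai.response.model` field, such as the 401 and 403 denials.

```shell
# Hypothetical summary: total input/output tokens per responding model,
# read from access-log lines on stdin (one request per line, key=value fields).
usage_by_model() {
  awk '
    {
      model = ""; tin = 0; tout = 0
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "gen_ai.response.model")      model = kv[2]
        if (kv[1] == "gen_ai.usage.input_tokens")  tin   = kv[2]
        if (kv[1] == "gen_ai.usage.output_tokens") tout  = kv[2]
      }
      # Denied requests never reach the model, so they carry no model field.
      if (model != "") { calls[model]++; in_sum[model] += tin; out_sum[model] += tout }
    }
    END {
      for (m in calls)
        printf "%s calls=%d input_tokens=%d output_tokens=%d\n", m, calls[m], in_sum[m], out_sum[m]
    }'
}
```

Piping the proxy's log output into `usage_by_model` gives one line per model; with several models the line order is not guaranteed, so sort the output if you need it stable.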
Point Claude Code at it
Claude Code reads ANTHROPIC_BASE_URL and
ANTHROPIC_AUTH_TOKEN. Set the base URL to the gateway and the
token to a minted JWT. ANTHROPIC_AUTH_TOKEN is sent as
Authorization: Bearer, which is what the gateway's JWT policy
validates. ANTHROPIC_API_KEY would instead go out as x-api-key,
which the JWT policy never reads, so in Strict mode the request would be
treated like one with no bearer token. From there Claude Code is talking to
gpt-4o-mini while believing it is talking to Anthropic.
bash client environment
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=$(./scripts/mint-token.sh) # the gateway JWT, sent as Authorization: Bearer
Swap in your own model
Real OpenAI is the backend in this lab because it is the quickest thing to
prove against. The shape is the same for a model you host. Keep the backend,
the route, and the policy as they are, and change only the provider block: set
host and port to your in-cluster
OpenAI-compatible server, such as vLLM or Ollama, and the
ai.routes translation keeps working. The client contract does not
change, and the credential still lives in the Secret rather than on the
client.
Why put this in the gateway
A translation library gets Claude Code talking to another model. Doing it in agentgateway gets you three more things in the same place. The model credential stays in the cluster and is handed to no one; access is gated by an identity you issue and a CEL rule you control, so the answer to "who can use the model" is policy, not a shared key; and every call is visible with the provider, model, and token counts already in the log. It is one control plane for the AI traffic, and the same gateway that swaps the model also governs who reaches it.
Versions
Built and verified on both editions:
- agentgateway v1.3.0, Gateway API v1.5.1
- Enterprise agentgateway v2.3.4, Gateway API v1.5.1