Claude Code on a non-Anthropic model, through agentgateway

By Tom O'Rourke

The ask. "We want our team to use Claude Code, but the model has to be one we run, not Anthropic's. Our model is served behind an OpenAI-style API. We also want the gateway to hold the credentials, decide who is allowed to call it, and give us per-request visibility, rather than handing the raw key to every developer."

This is the same idea as using a translation library such as LiteLLM to point Claude Code at a non-Anthropic model. The difference here is where the translation and the controls live. Instead of a per-developer proxy with the model key baked in, agentgateway does the protocol translation in the data plane, holds the key in the cluster, and enforces identity and authorization on the way in. The same gateway that swaps the model also enforces the access policy.

What you'll build

Client: Claude Code. Sends the Anthropic Messages API at /v1/messages. Its ANTHROPIC_BASE_URL points at the gateway, and its ANTHROPIC_AUTH_TOKEN is a JWT this lab mints, not an Anthropic key.

Gateway (namespace agentgateway-system): agentgateway. Validates the JWT, runs the CEL authorization rule, then translates the Anthropic request to OpenAI and the OpenAI reply back to Anthropic. Holds the OpenAI key in a Secret.

Model: OpenAI. Reached as an AgentgatewayBackend with provider: openai. Swap the provider block or the host/port to point at an on-prem OpenAI-compatible server instead.

One kind cluster. The Anthropic Messages API goes in the front door, an OpenAI model answers at the back, and the client gets an Anthropic-shaped reply. The OpenAI key lives only in a Kubernetes Secret that the backend references, so it is never on a developer laptop and never in the request.

The credential boundary. Claude Code authenticates to the gateway with a short-lived JWT that carries only its identity (org, team); it never holds the model key. The gateway validates the JWT, authorizes on the claims, translates Anthropic to OpenAI, and attaches the OpenAI key from a cluster Secret on the upstream hop alone. The reply is translated back to an Anthropic message.

How the translation works

agentgateway decides what to do with an LLM request from two things: the API format it sees coming in, and the provider configured on the backend it routes to. The incoming format is set per path with the ai.routes map on the backend. Mapping the Anthropic path to the Messages format, on a backend whose provider is OpenAI, is what turns on the two-way translation: the Anthropic Messages request becomes an OpenAI chat-completions call on the way in, and the OpenAI chat-completions reply becomes an Anthropic Messages response on the way out.

This mapping is the whole trick. Without the ai.routes entry the gateway falls back to passing the body through, and the client gets raw OpenAI JSON that Claude Code cannot read. I hit exactly that on the first run: a clean 200, but a chat.completion object instead of an Anthropic message. Adding "/v1/messages": Messages to the backend fixed it, and the reply came back in Anthropic shape.
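To make the two-way mapping concrete, here is a rough jq sketch of the kind of field mapping involved. It is an illustration only and not agentgateway's implementation: the function names are hypothetical, the backend model is passed in by hand, and only plain-text messages are handled (no tools, images, or streaming).

```bash
# Illustrative only: approximates the field mapping, not the gateway's code.

# Anthropic Messages request (stdin) -> OpenAI chat-completions request.
# $1 is the model configured on the backend; it replaces the client's model.
anthropic_to_openai() {
  jq -c --arg model "$1" '
    def text:
      if type == "string" then .
      else map(select(.type == "text") | .text) | join("")
      end;
    {
      model: $model,
      max_tokens: .max_tokens,
      messages: (
        (if .system then [{role: "system", content: (.system | text)}] else [] end)
        + [.messages[] | {role: .role, content: (.content | text)}]
      )
    }'
}

# OpenAI chat-completions reply (stdin) -> Anthropic Messages reply.
openai_to_anthropic() {
  jq -c '
    .choices[0] as $c
    | {
        id: .id,
        type: "message",
        role: "assistant",
        content: [{type: "text", text: ($c.message.content // "")}],
        model: .model,
        stop_reason: (
          if $c.finish_reason == "length" then "max_tokens"
          elif $c.finish_reason == "tool_calls" then "tool_use"
          else "end_turn" end),
        stop_sequence: null,
        usage: {
          input_tokens: .usage.prompt_tokens,
          output_tokens: .usage.completion_tokens
        }
      }'
}

# The request body used later in this lab, mapped for a gpt-4o-mini backend.
printf '%s' '{"model":"claude-3-5-sonnet-20241022","max_tokens":128,"messages":[{"role":"user","content":"In one sentence, what is an API gateway?"}]}' \
  | anthropic_to_openai gpt-4o-mini
```

The sketch also shows why the untranslated reply is unreadable to Claude Code: the text sits under choices[0].message.content and the counts under prompt_tokens and completion_tokens, none of which an Anthropic client looks for.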

Prerequisites

kind, kubectl, helm, docker, openssl, xxd, jq, and an authenticated gcloud for the public chart registry.
A Solo Enterprise agentgateway license key in AGENTGATEWAY_LICENSE_KEY (export it, or point SECRETS_FILE at a sourceable file).
An OpenAI API key in OPENAI_API_KEY, or in a file the scripts read. It only ever becomes a cluster Secret.

Steps

Step 1 — Bring it up

One command creates the cluster, installs agentgateway, wires the OpenAI backend and route, and applies the JWT plus authorization policy. It is idempotent.

bash scripts/quick.sh up

export AGENTGATEWAY_LICENSE_KEY="your-license-key"
export OPENAI_API_KEY="sk-..."          # or drop it in a file the scripts read
./scripts/quick.sh up

# then, in one shell:
./scripts/quick.sh demo                 # port-forward to localhost:8080
# and in another:
./scripts/quick.sh test                 # the three scenarios below

Step 2 — The cluster

A single kind cluster with the Gateway API CRDs applied. Nothing agentgateway-specific yet.

bash scripts/01-cluster.sh (the core)

kind create cluster --config ./kind/cluster.yaml

kubectl apply -f \
  https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.5.1/standard-install.yaml

Step 3 — Install agentgateway

Two charts from the public OCI registry. agentgateway chart versions carry a v prefix. Creating the Gateway with gatewayClassName: enterprise-agentgateway auto-provisions the proxy.

bash scripts/02-agentgateway.sh (the helm calls)

REG=oci://us-docker.pkg.dev/solo-public/enterprise-agentgateway/charts

helm upgrade --install enterprise-agentgateway-crds \
  $REG/enterprise-agentgateway-crds --version v2.3.4 \
  --namespace agentgateway-system --create-namespace --wait

helm upgrade --install enterprise-agentgateway \
  $REG/enterprise-agentgateway --version v2.3.4 \
  --namespace agentgateway-system --create-namespace \
  --set licensing.licenseKey="$AGENTGATEWAY_LICENSE_KEY" --wait

yaml yaml/gateway.yaml

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway-proxy
  namespace: agentgateway-system
spec:
  gatewayClassName: enterprise-agentgateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All

Step 4 — The model behind the API

The backend names the provider and the model, and references the Secret that holds the OpenAI key. The ai.routes entry is what reads the Anthropic path as Messages input, so the gateway translates in both directions. To run an on-prem model instead, keep this shape and point host and port at your own OpenAI-compatible server, or change the provider block.

yaml yaml/backend.yaml

apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-4o-mini
  policies:
    auth:
      secretRef:
        name: openai-secret          # holds "Authorization: Bearer sk-..."
    ai:
      routes:
        "/v1/messages": Messages      # read this path as Anthropic Messages -> translate

bash scripts/03-backend.sh (the Secret)

kubectl -n agentgateway-system create secret generic openai-secret \
  --from-literal="Authorization=Bearer ${OPENAI_API_KEY}" \
  --dry-run=client -o yaml | kubectl apply -f -

Step 5 — The route

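Send the Anthropic path to the OpenAI backend. The route and the ai.routes entry above work as a pair: the route delivers /v1/messages traffic to the backend whose provider is OpenAI, and the ai.routes entry tells that backend to read the traffic as Anthropic Messages.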

yaml yaml/httproute.yaml

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: claude-to-openai
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: agentgateway-proxy
      namespace: agentgateway-system
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/messages
      backendRefs:
        - name: openai
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend

Step 6 — Identity, RBAC and observability (optional)

This step is optional. Steps 1 to 5 are the whole translation: Claude Code already talks to the model and gets Anthropic-shaped replies. You can stop there. This step is the layer a data platform team asked for on top of it, and it is where agentgateway earns its place over a plain translation proxy.

What is happening. The team wants to let people use Claude Code against a model the platform runs, without handing the model credential to anyone, while controlling who is allowed and seeing how it is used. On the same gateway, three things now happen to every request before the model is ever reached: the caller's identity is verified from a JWT, that identity is checked against an authorization rule, and the call is recorded. The model key stays in the gateway throughout, so the question "who can use the model" is answered by policy, not by who has a copy of the key.

Why this pattern is worth it.

The model key lives in one place. Rotate it once on the gateway instead of on every laptop, and a leaked client token never exposes it. Developers hold only a short-lived JWT.
Access is by identity, not a shared secret. Add or remove a team by changing a claim rule or revoking a token, with no change to the model or to Claude Code.
Entitlement can reach down to the model. The same gateway can let one team reach gpt-4o-mini and another a larger model, decided from the claims in their token.
Usage is attributable. Every call logs the model and token counts against the caller, so cost attribution and audit come from the gateway, with nothing to instrument in the client.

What you configure. One EnterpriseAgentgatewayPolicy on the Gateway carries all of it.

Credentials: set jwtAuthentication to mode: Strict with a provider (issuer, audience, and a JWKS, inline here or a real OIDC endpoint). Callers present a short-lived JWT, never the model key.
RBAC: set authorization with a CEL matchExpressions rule over the claims (here jwt.org and jwt.team; extend it to the llms entitlement claim to gate which models a team may call).
Observability: nothing to configure. The proxy already logs the model, token counts and outcome of every call, shown in the next section.

The gateway issues the access contract. A single EnterpriseAgentgatewayPolicy on the Gateway requires a valid JWT (mode Strict, validated against an inline JWKS) and then runs a CEL rule over the token claims. Only org=acme and team=data-platform get through. The OpenAI key is not part of this at all: the client only ever holds its own JWT.

yaml yaml/rbac-policy.yaml

apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayPolicy
metadata:
  name: claude-code-rbac
  namespace: agentgateway-system
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: agentgateway-proxy
  traffic:
    jwtAuthentication:
      mode: Strict
      providers:
        - issuer: claude-code-lab
          audiences:
            - anthropic-api
          jwks:
            inline: |
              { ... public JWKS, filled in by 04-rbac.sh ... }
    authorization:
      action: Allow
      policy:
        matchExpressions:
          - 'jwt.org == "acme" && jwt.team == "data-platform"'

Two keys, and a token. 04-rbac.sh generates one RSA keypair. The private key never leaves .gen/ and is used only to sign tokens. The public half is the inline JWKS on the policy, and the gateway uses it only to check signatures. A public key can verify a token but cannot mint one, so the JWKS is safe in config: it is not a secret, and it is not handed to clients. Clients never hold a key at all. They hold a signed JWT, and that JWT is the Bearer they send.
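For orientation, here is a minimal sketch of how a public JWKS can be derived from an RSA private key with openssl and xxd, both listed in the prerequisites. It is not the lab's 04-rbac.sh: the function names and the throwaway key in a temp directory are assumptions for the example, and the exponent is hard-coded on the assumption that the key uses openssl's default of 65537. The kid is the one this lab uses.

```bash
# Sketch only; not the lab's 04-rbac.sh.

# base64url without padding, the encoding JWK fields use.
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

# make_jwks PRIVATE_PEM KID -> prints a one-key public JWKS on stdout.
make_jwks() {
  # openssl prints "Modulus=<hex>"; turn the hex into raw bytes, then base64url.
  n=$(openssl rsa -in "$1" -noout -modulus | cut -d= -f2 | xxd -r -p | b64url)
  # "AQAB" is base64url for 65537 (assumed: openssl's default public exponent).
  printf '{"keys":[{"kty":"RSA","use":"sig","alg":"RS256","kid":"%s","n":"%s","e":"AQAB"}]}\n' "$2" "$n"
}

# Demo with a throwaway 2048-bit key (the lab keeps its own under .gen/).
tmp=$(mktemp -d)
openssl genrsa -out "$tmp/jwt-private.pem" 2048 2>/dev/null
make_jwks "$tmp/jwt-private.pem" claude-code-key
```

Only the modulus and the public exponent appear in the output, which is why the document can sit in the policy as plain config.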

How a minted token lines up with the policy. The mint script and the policy are the two ends of that one keypair. Minting signs with the private key; the gateway verifies with the public JWKS. The token is accepted only when three things agree, and they are hard-coded to match on both sides:

Gateway checks	scripts/mint-token.sh	EnterpriseAgentgatewayPolicy
Signature	signs with `.gen/jwt-private.pem`, header `kid: claude-code-key`	verifies against the public `jwks.inline` (same `kid`)
Issuer	`iss: claude-code-lab`	`issuer: claude-code-lab`
Audience	`aud: anthropic-api`	`audiences: [anthropic-api]`

That is authentication: is the token genuine, from the right issuer, for this audience. The team argument touches none of it. ./scripts/mint-token.sh marketing signs with the same key and the same iss / aud, so it is a perfectly valid token; it only changes the team claim. It clears authentication and is then turned away by authorization, the separate CEL rule jwt.team == "data-platform". That split is why a missing token is a 401 (authentication) while the wrong team is a 403 (authorization). The llms claim rides along the same way, logged on every call and ready for a finer-grained rule on the same CEL surface.
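To see the split for yourself, it helps to read the claims out of a token and test them against the same condition the CEL rule expresses. The sketch below is a local approximation under stated assumptions: the function names are hypothetical, the signature is not verified (the gateway does that), and the jq expression only mirrors the lab's rule rather than running CEL.

```bash
# Sketch only: decodes claims locally. It does NOT verify the signature.

# jwt_claims TOKEN -> prints the payload JSON.
jwt_claims() {
  p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the padding; restore it before decoding.
  while [ $(( ${#p} % 4 )) -ne 0 ]; do p="$p="; done
  printf '%s' "$p" | openssl base64 -d -A
}

# would_authorize TOKEN -> exit 0 if the claims satisfy the lab's rule
# (org == "acme" and team == "data-platform"), non-zero otherwise.
would_authorize() {
  jwt_claims "$1" | jq -e '.org == "acme" and .team == "data-platform"' >/dev/null
}

# Usage against the lab's script:
#   jwt_claims "$(./scripts/mint-token.sh marketing)" | jq '{org, team, llms}'
#   would_authorize "$(./scripts/mint-token.sh marketing)" || echo "expect a 403"
```

A marketing token decodes cleanly and fails only the second function, which is the local picture of "valid token, wrong team".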

The mint script is a mock IdP. It holds the private key and stamps the claims, which is exactly what an identity provider does when a user logs in. It just keeps the lab self-contained, with no IdP to stand up. In your own environment you would not mint by hand: point the provider's jwks at your IdP's OIDC endpoint (Okta, Keycloak, Entra, Frontegg) instead of inline, and the user's normal login issues the token. The gateway still checks the same signature, issuer and audience, so nothing else in the lab changes. For wiring real identity providers at the gateway and validating their tokens, see RFC 8693 token exchange across identity providers.

bash scripts/mint-token.sh (the claims)

# default: org=acme, team=data-platform  (authorized)
#   ./scripts/mint-token.sh
# wrong team:
#   ./scripts/mint-token.sh marketing

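payload='{"iss":"claude-code-lab","aud":"anthropic-api",
  "org":"acme","team":"data-platform",
  "llms":{"openai":["gpt-4o-mini"]},
  "iat":<issued-at>,"exp":<expiry>}'
# iat and exp are epoch seconds set at mint time; the values are not shown in this excerpt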
# signed RS256 with the key whose public JWKS is on the policy
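The excerpt above shows only the claims. As a hedged illustration of the signing step, here is a minimal sketch of assembling an RS256 token with openssl. It is not the lab's mint-token.sh: the function names, the one-hour lifetime, and the throwaway key are assumptions for the example, while the issuer, audience, kid, and claims are the ones described above.

```bash
# Sketch only; not the lab's mint-token.sh.

# base64url without padding, as JWTs require.
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

# mint_token PRIVATE_PEM [TEAM] -> prints a signed JWT on stdout.
mint_token() {
  key=$1
  team=${2:-data-platform}
  now=$(date +%s)
  exp=$((now + 3600))   # assumed lifetime; the lab's value is not shown
  header='{"alg":"RS256","typ":"JWT","kid":"claude-code-key"}'
  payload=$(printf '{"iss":"claude-code-lab","aud":"anthropic-api","org":"acme","team":"%s","llms":{"openai":["gpt-4o-mini"]},"iat":%s,"exp":%s}' \
    "$team" "$now" "$exp")
  # The signature covers "<header>.<payload>", both base64url-encoded.
  signing_input="$(printf '%s' "$header" | b64url).$(printf '%s' "$payload" | b64url)"
  sig=$(printf '%s' "$signing_input" | openssl dgst -sha256 -sign "$key" -binary | b64url)
  printf '%s.%s\n' "$signing_input" "$sig"
}

# Demo with a throwaway key (the lab signs with .gen/jwt-private.pem).
tmp=$(mktemp -d)
openssl genrsa -out "$tmp/key.pem" 2048 2>/dev/null
mint_token "$tmp/key.pem" marketing
```

Changing the team argument changes only the payload, so the result is still a correctly signed token, which matches the behavior described for ./scripts/mint-token.sh marketing.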

Run it

Port-forward the gateway to localhost:8080 and send the same Anthropic Messages body Claude Code would send. Three cases, all run live on this lab.

Request	Result	Why
No `Authorization` header	401	`authentication failure: no bearer token found` — JWT is required (mode Strict).
JWT with `team=marketing`	403	`authorization failed` — the CEL rule allows only `team=data-platform`.
JWT with `org=acme`, `team=data-platform`	200	Translated to OpenAI, answered by `gpt-4o-mini`, returned in Anthropic format.

bash the authorized call

TOKEN=$(./scripts/mint-token.sh)        # org=acme, team=data-platform

curl -s http://localhost:8080/v1/messages \
  -H 'content-type: application/json' \
  -H 'anthropic-version: 2023-06-01' \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model":"claude-3-5-sonnet-20241022","max_tokens":128,
       "messages":[{"role":"user","content":"In one sentence, what is an API gateway?"}]}'

The body asked for claude-3-5-sonnet. What came back is an Anthropic message, with type: message, a content block array, stop_reason, and Anthropic-style usage. The model field gives the game away: it was gpt-4o-mini that actually answered. Claude Code reads this as a normal Anthropic response.

json the 200 response (real output)

{
  "id": "chatcmpl-Drj14BgFrnw6vszvHCynSypQlsoww",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "An API gateway is a server that acts as an intermediary between clients and backend services, managing requests, routing them to the appropriate services, handling authentication, and aggregating results."
    }
  ],
  "model": "gpt-4o-mini-2024-07-18",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 17,
    "output_tokens": 35,
    "service_tier": "default"
  }
}

What the gateway sees

Every call lands in the proxy access log as a structured line. The denials carry the reason, and the authorized call carries the provider, the model that actually served it, and the token counts, as OpenTelemetry GenAI attributes. This is the per-request visibility the team asked for, without instrumenting Claude Code or the model.

text agentgateway proxy access log (trimmed, real output)

route=claude-to-openai http.path=/v1/messages http.status=401
  error="authentication failure: no bearer token found" reason=JwtAuth

route=claude-to-openai http.path=/v1/messages http.status=403
  error="authorization failed" reason=Authorization

route=claude-to-openai endpoint=api.openai.com:443 http.path=/v1/messages http.status=200
  protocol=llm gen_ai.provider.name=openai
  gen_ai.request.model=gpt-4o-mini gen_ai.response.model=gpt-4o-mini-2024-07-18
  gen_ai.usage.input_tokens=17 gen_ai.usage.output_tokens=33
  gen_ai.request.max_tokens=128 duration=2776ms
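
Because each call is a run of key=value pairs, a first pass at usage attribution needs nothing more than ordinary text tools. A minimal sketch, assuming each request is logged on a single line (the excerpt above is wrapped for reading) and using only the attribute names shown there; the function name is hypothetical.

```bash
# usage_by_model: reads access-log lines on stdin and sums token counts
# per serving model. Lines without gen_ai.response.model (the 401 and 403
# denials, for example) are skipped.
usage_by_model() {
  awk '
    {
      model = ""; inp = 0; out = 0
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "gen_ai.response.model")           model = kv[2]
        else if (kv[1] == "gen_ai.usage.input_tokens")  inp = kv[2]
        else if (kv[1] == "gen_ai.usage.output_tokens") out = kv[2]
      }
      if (model != "") { calls[model]++; in_tok[model] += inp; out_tok[model] += out }
    }
    END {
      for (m in calls)
        printf "%s calls=%d input_tokens=%d output_tokens=%d\n", m, calls[m], in_tok[m], out_tok[m]
    }'
}

# Usage (the deployment name is an assumption; use whatever the Gateway provisioned):
#   kubectl -n agentgateway-system logs deploy/agentgateway-proxy | usage_by_model
```

Quoted values that contain spaces, such as the error text on denials, split into several fields here; that is harmless for this purpose because none of the pieces carry the keys being summed.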

Point Claude Code at it

Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN. Set the base URL to the gateway and the token to a minted JWT. ANTHROPIC_AUTH_TOKEN is sent as Authorization: Bearer, which is the header the gateway's JWT policy validates. ANTHROPIC_API_KEY would instead be sent as x-api-key, which the policy does not read, so under mode Strict the request would be rejected with a 401 for a missing bearer token. With the auth token set, Claude Code is talking to gpt-4o-mini while believing it is talking to Anthropic.

bash client environment

export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=$(./scripts/mint-token.sh)   # the gateway JWT, sent as Authorization: Bearer

Swap in your own model

Real OpenAI is the backend in this lab because it is the quickest thing to prove against. The shape is the same for a model you host. Keep the backend, the route, and the policy as they are, and change only the provider block: set host and port to your in-cluster OpenAI-compatible server, such as vLLM or Ollama, and the ai.routes translation keeps working. The client contract does not change, and the credential still lives in the Secret rather than on the client.

Why put this in the gateway

A translation library gets Claude Code talking to another model. Doing it in agentgateway gets you three more things in the same place. The model credential stays in the cluster and is handed to no one; access is gated by an identity you issue and a CEL rule you control, so the answer to "who can use the model" is policy, not a shared key; and every call is visible with the provider, model, and token counts already in the log. It is one control plane for the AI traffic, and the same gateway that swaps the model also governs who reaches it.

Versions

Built and verified on both editions:

OSS (validated 2026-06-18)
agentgateway (OSS): v1.3.0
Gateway API: v1.5.1

Enterprise (validated 2026-06-18)
Solo Enterprise for agentgateway: v2.3.4
Gateway API: v1.5.1