Virtual keys in agentgateway, by Tom O'Rourke

What a virtual key actually is

In LiteLLM and Portkey, a virtual key is just a token string you give to a person, a team, or an app. They paste it into whatever makes the request (a curl command, Claude Code, an internal service) and send it on every call. The gateway checks the string, works out who it belongs to, and applies that owner's rules: which models they can use, how fast they can call, and how many tokens they get. Then it swaps in your real OpenAI or Anthropic key on the way out, so the person sending requests never sees the real key. One real key sits behind many of these strings, and you can switch any one of them off without touching the real key or anyone else's string.

So a virtual key is really doing three jobs at once: name who is calling, cap what they can spend, and keep the real provider key hidden. Hold on to those three jobs: agentgateway rebuilds the feature around exactly these three.

The two keys people confuse

Most of the confusion comes from the word "key" doing double duty. When a workshop tells you to "set up a key" for an LLM backend, that is the provider credential. It is not a virtual key. They live in different places and do opposite jobs.

The provider credential (gateway → LLM)

The real Anthropic or OpenAI key. It lives in a Secret and is referenced from the backend at spec.policies.auth.secretRef. agentgateway injects it on the upstream hop to the provider, and the route strips any client-supplied Authorization or x-api-key header, so callers can neither see nor set it.

One credential, shared by everything routing through that backend. It carries no caller identity and no budget. It is a secret to be hidden, not a handle to be issued.
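As an illustration of that upstream hop, the sketch below models the header handling as a pure function over a header dict. It is not agentgateway's implementation: the function name rewrite_upstream_headers is hypothetical, and sending the credential as Authorization: Bearer by default is an assumption (Anthropic's API, for example, expects an x-api-key header instead, which the header_name parameter covers).

```python
def rewrite_upstream_headers(client_headers, provider_key, header_name="authorization"):
    """Hypothetical sketch: drop caller-supplied credentials, then inject the provider key.

    HTTP header names are case-insensitive, so comparisons use lowercase.
    """
    caller_credential_headers = {"authorization", "x-api-key"}
    # Forward everything except headers a caller could use to set credentials.
    upstream = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in caller_credential_headers
    }
    # Inject the hidden provider credential on the way out.
    if header_name.lower() == "authorization":
        upstream["authorization"] = f"Bearer {provider_key}"
    else:
        upstream[header_name.lower()] = provider_key
    return upstream


incoming = {
    "Authorization": "Bearer sk-alice-abc123def456",  # the caller's virtual key
    "X-Api-Key": "caller-supplied-value",              # stripped, never forwarded
    "Content-Type": "application/json",
}
upstream = rewrite_upstream_headers(incoming, "sk-real-provider-key")
# upstream == {"Content-Type": "application/json",
#              "authorization": "Bearer sk-real-provider-key"}
```

The client's own credential headers are removed unconditionally rather than only when they look valid, which is what keeps a caller from overriding the provider key.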

The virtual key (client → gateway)

The token string you give to a user, team, or app. They send it in Authorization: Bearer on every request, the gateway validates it, maps it to an identity, and enforces that identity's budget and limits. It resolves to the provider credential above without ever exposing it.

Many of these, one per consumer. Each carries identity and budget, and each is revocable on its own.

	Provider credential	Virtual key
Who it authenticates	Gateway → LLM provider	Client → Gateway
Where it lives	`backend.policies.auth.secretRef`	API-key Secret + `apiKeyAuthentication`
Carries identity?	No	Yes (`metadata.user_id`)
Carries budget / limits?	No	Yes (token-based rate limit, keyed by identity)
How many	One, shared by the backend	Many, one per consumer
Hides the provider key?	It is the provider key	Yes, the client never sees it
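The many-to-one relationship in the table can be sketched in a few lines of Python. The class and method names are hypothetical; in agentgateway, issuing and revoking a virtual key correspond to adding and removing an entry in the API-key Secret, while the provider key stays in its own Secret on the backend.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualKeyRegistry:
    """Hypothetical model: many virtual keys resolve to one hidden provider key."""

    provider_key: str
    keys: dict = field(default_factory=dict)  # virtual key -> user_id

    def issue(self, virtual_key, user_id):
        self.keys[virtual_key] = user_id

    def revoke(self, virtual_key):
        # Dropping one entry leaves the provider key and every other consumer alone.
        self.keys.pop(virtual_key, None)

    def resolve(self, virtual_key):
        """Return (user_id, provider_key) for a live key, or None if unknown or revoked.

        The provider key is for the gateway's upstream call, never for the client.
        """
        user_id = self.keys.get(virtual_key)
        if user_id is None:
            return None
        return user_id, self.provider_key


registry = VirtualKeyRegistry(provider_key="sk-real-provider-key")
registry.issue("sk-alice-abc123def456", "alice")
registry.issue("sk-bob-xyz789uvw012", "bob")
registry.revoke("sk-alice-abc123def456")
print(registry.resolve("sk-alice-abc123def456"))  # None
print(registry.resolve("sk-bob-xyz789uvw012"))    # ('bob', 'sk-real-provider-key')
```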

How agentgateway delivers virtual keys

agentgateway builds virtual keys out of three first-class capabilities it already gives you: API-key authentication, token-based rate limiting, and per-key observability. Each is a standard policy you configure on its own, and together they produce the full virtual-key experience: a per-consumer key that identifies the caller, carries its own budget, and resolves to a hidden provider credential.

Composing it from these three pieces is the strength here. The auth, the budgets, and the metrics are independent, so you tune each one to the policy you actually want, layer them with the rest of agentgateway (JWT, prompt guards, model routing) in the same place, and keep everything in plain Kubernetes resources you already manage with GitOps. You get the virtual-key outcome without inheriting one vendor's fixed, all-in-one shape.

The three capabilities map straight onto the three jobs of a virtual key:

Job of a virtual key	agentgateway capability	Where it is configured
Identify the caller	API-key authentication	`traffic.apiKeyAuthentication` on `EnterpriseAgentgatewayPolicy`, backed by a Secret of per-user keys + metadata
Cap what they spend	Token-based rate limiting	`traffic.rateLimit.global`, descriptor keyed on the API key's metadata
Track per-key usage	Observability metrics	`agentgateway_gen_ai_client_token_usage` in Prometheus, broken down by `user_id`

How a request flows

1. A request arrives with a virtual key in Authorization: Bearer.
2. agentgateway validates the key against the API-key Secret. An invalid key is rejected before anything else runs, so it never touches a budget.
3. The caller's user_id is read from that key's metadata.
4. The request is checked against that user's token budget on the rate-limit server. If the budget is exhausted, the request is rejected with 429 Too Many Requests.
5. If budget remains, the request goes upstream. agentgateway injects the hidden provider credential and strips the client's inbound auth header.
6. Token usage from the response is deducted from the user's budget and recorded in metrics.
7. Budgets refill on the configured interval: daily, hourly, or whatever you set.
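The flow above can be simulated with a small fixed-window budget (limit=100 keeps the numbers readable). This is a hypothetical model, not agentgateway's rate limiter: DailyTokenBudget and handle are illustrative names, the 401 status for a bad key is an assumption (the flow only says an invalid key is rejected), and call_llm stands in for the upstream request, returning the tokens it consumed.

```python
import time


class DailyTokenBudget:
    """Hypothetical fixed-window token budget with one bucket per user_id."""

    def __init__(self, limit, window_seconds=86_400, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.used = {}  # (user_id, window index) -> tokens spent in that window

    def _bucket(self, user_id):
        # A new window index starts an empty bucket, which is the refill.
        return (user_id, int(self.clock() // self.window_seconds))

    def remaining(self, user_id):
        return max(0, self.limit - self.used.get(self._bucket(user_id), 0))

    def deduct(self, user_id, tokens):
        bucket = self._bucket(user_id)
        self.used[bucket] = self.used.get(bucket, 0) + tokens


def handle(virtual_key, keys, budget, call_llm):
    """Walk the request flow for one call and return an HTTP status code."""
    metadata = keys.get(virtual_key)
    if metadata is None:
        return 401                      # invalid key: rejected, budget untouched
    user_id = metadata["user_id"]       # identity comes from the key's metadata
    if budget.remaining(user_id) == 0:
        return 429                      # budget exhausted
    tokens = call_llm()                 # upstream call, provider credential injected
    budget.deduct(user_id, tokens)      # charge the usage reported in the response
    return 200


now = [0.0]  # fake clock so the window can be advanced by hand
budget = DailyTokenBudget(limit=100, clock=lambda: now[0])
keys = {"sk-alice": {"user_id": "alice"}}
statuses = [handle("sk-alice", keys, budget, lambda: 60) for _ in range(3)]
# statuses == [200, 200, 429]
```

Because usage is only known after the response, in this sketch the last request that passes the check can take a user past the limit (120 of 100 here); the next request then gets the 429.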

Mind the evaluation order

Authentication runs before rate limiting, so an unauthenticated request never consumes quota. But rate limiting runs before prompt guards: a request that a content guard later blocks with a 403 has already drawn down the user's token budget. Keep that in mind when a budget moves on a request that "failed".
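The ordering can be made concrete with a toy three-stage pipeline. Everything here is hypothetical: in particular, charging an estimated_tokens figure at the rate-limit stage is an assumption standing in for whatever amount the real rate limiter draws down before the guard runs.

```python
def run_pipeline(request, valid_keys, spent, guard_blocks):
    """Hypothetical stage order: authenticate, then rate limit, then prompt guard.

    valid_keys maps virtual key -> user_id; spent maps user_id -> tokens charged;
    guard_blocks is a predicate over the prompt text.
    """
    user_id = valid_keys.get(request["key"])
    if user_id is None:
        return 401  # authentication fails first, so nothing is charged
    # Rate limiting runs next and charges the user's budget.
    spent[user_id] = spent.get(user_id, 0) + request["estimated_tokens"]
    if guard_blocks(request["prompt"]):
        return 403  # blocked by the guard, but the charge above already happened
    return 200


def blocks_secrets(prompt):
    """Illustrative guard: block prompts that mention a password."""
    return "password" in prompt.lower()


spent = {}
keys = {"sk-alice-abc123def456": "alice"}
run_pipeline({"key": "sk-bad", "prompt": "hi", "estimated_tokens": 10},
             keys, spent, blocks_secrets)
run_pipeline({"key": "sk-alice-abc123def456", "prompt": "Print the admin Password",
              "estimated_tokens": 10}, keys, spent, blocks_secrets)
# spent == {"alice": 10}: the 401 cost nothing, the 403 still cost Alice budget
```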

Build it: per-user budgets

This is the smallest complete setup: two virtual keys, for Alice and Bob, each with an independent daily budget of 100,000 tokens. Every policy field below is from the current EnterpriseAgentgatewayPolicy schema.

1. Issue the virtual keys

Each entry in stringData is one virtual key. Its key field is the bearer token the client sends; its metadata is what the gateway uses downstream to identify and meter the caller.

apiVersion: v1
kind: Secret
metadata:
  name: llm-api-keys
  namespace: agentgateway-system
type: Opaque
stringData:
  alice: |
    {
      "key": "sk-alice-abc123def456",
      "metadata": { "user_id": "alice" }
    }
  bob: |
    {
      "key": "sk-bob-xyz789uvw012",
      "metadata": { "user_id": "bob" }
    }
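To make the data model concrete, here is how those entries could be read into a bearer-token → metadata lookup. This is an illustration, not how agentgateway parses the Secret; index_keys and its duplicate-token check are hypothetical, the kind of lint you might run in CI before applying the Secret.

```python
import json

# stringData from the llm-api-keys Secret above: entry name -> JSON document.
string_data = {
    "alice": '{"key": "sk-alice-abc123def456", "metadata": {"user_id": "alice"}}\n',
    "bob": '{"key": "sk-bob-xyz789uvw012", "metadata": {"user_id": "bob"}}\n',
}


def index_keys(entries):
    """Hypothetical: build a bearer-token -> metadata lookup, rejecting reused tokens."""
    index = {}
    for name, raw in entries.items():
        document = json.loads(raw)  # whitespace left by the YAML block is ignored
        token = document["key"]
        if token in index:
            raise ValueError(f"entry {name!r} reuses a key that is already issued")
        index[token] = document.get("metadata", {})
    return index


lookup = index_keys(string_data)
print(lookup["sk-alice-abc123def456"])  # {'user_id': 'alice'}
```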

2. Require the virtual key

Enterprise feature. Attach API-key authentication to the Gateway so every route demands a valid key. mode: Strict rejects any request without one.

apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayPolicy
metadata:
  name: api-key-auth
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: agentgateway-proxy
  traffic:
    apiKeyAuthentication:
      mode: Strict
      secretRef:
        name: llm-api-keys

3. Attach the per-key budget

Enterprise feature. The descriptor entry pulls user_id out of the API key's metadata with a CEL expression, and unit: Tokens meters tokens rather than requests. Each user gets their own bucket.

apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayPolicy
metadata:
  name: daily-token-budget
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: agentgateway-proxy
  traffic:
    rateLimit:
      global:
        domain: token-budgets
        backendRef:
          kind: Service
          name: rate-limit-server
          namespace: agentgateway-system
          port: 8081
        descriptors:
        - entries:
          - name: user_id
            expression: 'apiKey.metadata.user_id'
          unit: Tokens
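Conceptually, each entry becomes one (name, value) pair in the descriptor sent to the rate-limit server. The sketch below shows that mapping for the simple apiKey.metadata.&lt;field&gt; expressions used in this article. It is hypothetical: agentgateway evaluates real CEL, which this string-prefix check does not attempt.

```python
def build_descriptor(entries, api_key_metadata):
    """Hypothetical: turn descriptor entries into ordered (name, value) pairs.

    Only the plain apiKey.metadata.<field> form used in this article is handled;
    agentgateway itself evaluates full CEL expressions.
    """
    prefix = "apiKey.metadata."
    pairs = []
    for entry in entries:
        expression = entry["expression"]
        if not expression.startswith(prefix):
            raise ValueError(f"unsupported in this sketch: {expression}")
        field_name = expression[len(prefix):]
        # A missing field raises KeyError here; how agentgateway handles a
        # failed expression is outside this sketch.
        pairs.append((entry["name"], str(api_key_metadata[field_name])))
    return tuple(pairs)


entries = [{"name": "user_id", "expression": "apiKey.metadata.user_id"}]
print(build_descriptor(entries, {"user_id": "alice"}))  # (('user_id', 'alice'),)
```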

4. Set the actual limit on the rate-limit server

The policy says "meter per user_id in tokens"; the rate-limit server holds the number. The domain and the descriptor key must match the policy above.

apiVersion: v1
kind: ConfigMap
metadata:
  name: rate-limit-config
  namespace: agentgateway-system
data:
  config.yaml: |
    domain: token-budgets
    descriptors:
    - key: user_id
      rate_limit:
        unit: day
        requests_per_unit: 100000   # 100k tokens/day per user
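Because the two halves live in different resources, a mismatch is easy to miss. A small consistency check could compare them before you apply either. The configs below are Python dicts mirroring the YAML in steps 3 and 4 (to avoid a YAML dependency), and mismatches is a hypothetical lint that only compares the domain and the descriptor key path.

```python
# The two halves from steps 3 and 4, written as Python dicts that mirror the YAML.
policy_global = {
    "domain": "token-budgets",
    "descriptors": [
        {"entries": [{"name": "user_id", "expression": "apiKey.metadata.user_id"}],
         "unit": "Tokens"},
    ],
}
server_config = {
    "domain": "token-budgets",
    "descriptors": [
        {"key": "user_id", "rate_limit": {"unit": "day", "requests_per_unit": 100000}},
    ],
}


def mismatches(policy, server):
    """Hypothetical lint: list ways the policy and rate-limit config disagree."""
    problems = []
    if policy["domain"] != server["domain"]:
        problems.append(f"domain {policy['domain']!r} != {server['domain']!r}")
    for descriptor in policy["descriptors"]:
        names = [entry["name"] for entry in descriptor["entries"]]
        level = server["descriptors"]
        for name in names:
            matches = [node for node in level if node["key"] == name]
            if not matches:
                problems.append(f"no server rule for descriptor path {names}")
                break
            # Descend into the nested rules under every node with this key.
            level = [child for node in matches for child in node.get("descriptors", [])]
    return problems


print(mismatches(policy_global, server_config))  # []
```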

5. The backend, where the provider credential hides

This is the other key. openai-secret holds the real provider credential, which the gateway sends only upstream to OpenAI and never exposes to clients.

apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-3.5-turbo
  policies:
    auth:
      secretRef:
        name: openai-secret

An HTTPRoute sending /openai to that backend finishes the wiring. Then it behaves exactly like a virtual-key system: Alice's key works until her 100,000 tokens are gone and she starts getting 429s, while Bob keeps going on his own untouched budget.

# Alice's virtual key
curl "$INGRESS_GW_ADDRESS/openai" \
  -H "Authorization: Bearer sk-alice-abc123def456" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"Hello!"}]}'

# Once her budget is spent:
# HTTP/1.1 429 Too Many Requests
# x-ratelimit-limit: 100000
# x-ratelimit-remaining: 0

Tiered budgets by user type

Because identity is just metadata on the key, you scale budgets by adding a field. Tag each key with a tier, then add it as a second descriptor entry. Free and premium users now draw from different daily limits.

Add the tier to the key, then to the descriptor

# In the Secret
alice: |
  { "key": "sk-alice-abc123def456",
    "metadata": { "user_id": "alice", "tier": "premium" } }

# In the rateLimit descriptor
descriptors:
- entries:
  - name: tier
    expression: 'apiKey.metadata.tier'
  - name: user_id
    expression: 'apiKey.metadata.user_id'
  unit: Tokens

Tier limits on the rate-limit server

domain: token-budgets
descriptors:
- key: tier
  value: "free"
  descriptors:
  - key: user_id
    rate_limit: { unit: day, requests_per_unit: 10000 }
- key: tier
  value: "premium"
  descriptors:
  - key: user_id
    rate_limit: { unit: day, requests_per_unit: 500000 }
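Assuming the rate-limit server resolves nested descriptors the way the Envoy ratelimit service does (walk the entries in order, prefer a rule whose key and value both match, fall back to a key-only rule, and take the limit from the rule reached by the last entry), the lookup can be sketched as below. The function find_limit is hypothetical.

```python
# Tier rules from the rate-limit server config above, as Python data.
tier_rules = [
    {"key": "tier", "value": "free",
     "descriptors": [{"key": "user_id",
                      "rate_limit": {"unit": "day", "requests_per_unit": 10000}}]},
    {"key": "tier", "value": "premium",
     "descriptors": [{"key": "user_id",
                      "rate_limit": {"unit": "day", "requests_per_unit": 500000}}]},
]


def find_limit(rules, descriptor):
    """Hypothetical lookup: return the rate_limit for a descriptor, or None.

    descriptor is an ordered tuple of (key, value) pairs,
    e.g. (("tier", "premium"), ("user_id", "alice")).
    """
    node = None
    for key, value in descriptor:
        exact = [r for r in rules if r["key"] == key and r.get("value") == value]
        key_only = [r for r in rules if r["key"] == key and "value" not in r]
        candidates = exact or key_only
        if not candidates:
            return None  # no rule on this path: the request is not limited
        node = candidates[0]
        rules = node.get("descriptors", [])
    # Only the rule reached by the last entry supplies the limit.
    return node.get("rate_limit") if node else None


print(find_limit(tier_rules, (("tier", "premium"), ("user_id", "alice"))))
# {'unit': 'day', 'requests_per_unit': 500000}
```

Under that assumption, a key whose tier has no block (for example standard, or a typo in the tier value) matches no rule and is not limited at all, so every tier you issue needs its own entry on the server.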

Multi-tenant virtual keys

The same move scopes keys to a tenant as well as a user. Add a tenant_id to the key metadata and lead the descriptor with it, so every user's budget is nested under their tenant.

# In the rateLimit descriptor
descriptors:
- entries:
  - name: tenant_id
    expression: 'apiKey.metadata.tenant_id'
  - name: user_id
    expression: 'apiKey.metadata.user_id'
  unit: Tokens
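Leading with tenant_id means the full descriptor, not the user_id alone, identifies a bucket, so two users who happen to share a user_id in different tenants never share a budget. A hypothetical sketch of that keying follows; the rate-limit ConfigMap would also need a tenant_id level with user_id nested under it, in the same shape as the tier example.

```python
from collections import Counter


def bucket_key(metadata, entry_names):
    """Hypothetical: the ordered (name, value) descriptor doubles as the bucket identity."""
    return tuple((name, metadata[name]) for name in entry_names)


entry_names = ["tenant_id", "user_id"]  # same order as the descriptor entries
usage = Counter()
usage[bucket_key({"tenant_id": "acme", "user_id": "alice"}, entry_names)] += 1200
usage[bucket_key({"tenant_id": "globex", "user_id": "alice"}, entry_names)] += 300
# Two separate buckets: acme's alice has spent 1200 tokens, globex's alice 300.
```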

Shorter budget windows

The refresh interval lives on the rate-limit server, not the policy. Drop unit from day to hour (or minute, or second) for tighter control: a runaway key can then burn at most one hour's allowance before it is throttled until the next window, rather than a whole day's allowance at once.

# In the rate-limit-config ConfigMap
domain: token-budgets
descriptors:
- key: user_id
  rate_limit:
    unit: hour
    requests_per_unit: 10000   # 10k tokens/hour per user
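The trade-off is easy to quantify. Assuming fixed windows that start at a unit boundary (and ignoring overshoot on the last request that passes the check), a key can spend at most one full allowance per window it touches. The max_spend helper is hypothetical.

```python
import math


def max_spend(limit_per_window, window_hours, horizon_hours):
    """Most tokens a key can spend in horizon_hours, starting at a window boundary.

    Fixed windows: every window the horizon touches grants a fresh allowance.
    Overshoot from the last request that passes the check is ignored.
    """
    windows_touched = math.ceil(horizon_hours / window_hours)
    return windows_touched * limit_per_window


print(max_spend(100_000, 24, 1))   # 100000  daily budget, first hour
print(max_spend(10_000, 1, 1))     # 10000   hourly budget, first hour
print(max_spend(10_000, 1, 24))    # 240000  hourly budget, whole day
```

With these numbers, the hourly setting caps a runaway key at 10,000 tokens in its first hour instead of 100,000, but over a full day it allows up to 240,000 tokens, so choose the per-hour figure with the daily total in mind.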

Track per-key spend

The third job, knowing what each key actually costs, comes from the token-usage metric, broken down by the same user_id. agentgateway records input and output tokens per request, so you can total a user's daily consumption and turn it into money with your provider's pricing.

# Total tokens per user over the last 24h (the regex matcher covers both token types)
sum by (user_id) (
  increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type=~"input|output"}[24h])
)

# Percentage of a 100k daily budget used, per user
sum by (user_id) (
  increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type=~"input|output"}[24h])
) / 100000 * 100

# Cost per user (example: $0.50 / 1M input, $1.50 / 1M output)
# Sum each token type by user_id first so both sides of + carry the same labels.
  sum by (user_id) (increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[24h]))  / 1e6 * 0.50
+ sum by (user_id) (increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="output"}[24h])) / 1e6 * 1.50
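The same arithmetic is easy to check offline against token counts you already know. The prices are the example values from the query, not current provider pricing, and the helper names are hypothetical.

```python
def user_cost(input_tokens, output_tokens, input_per_million=0.50, output_per_million=1.50):
    """Dollar cost for one user's tokens at per-million-token prices."""
    return (input_tokens / 1e6) * input_per_million + (output_tokens / 1e6) * output_per_million


def budget_used_pct(input_tokens, output_tokens, daily_budget=100_000):
    """Share of the daily token budget consumed, as a percentage."""
    return (input_tokens + output_tokens) * 100 / daily_budget


# Example day for one user: 60,000 input and 20,000 output tokens.
print(round(user_cost(60_000, 20_000), 4))   # 0.06
print(budget_used_pct(60_000, 20_000))       # 80.0
```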

One policy area per policy

If two EnterpriseAgentgatewayPolicy resources target the same Gateway or route and set overlapping backend.ai fields, one silently overwrites the other based on creation order, and both still report ACCEPTED and ATTACHED. Give each policy one area (auth, rate limiting, or guards), as the examples above do, so no two policies set the same fields and they compose instead of clobbering each other.
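Since the resource status will not flag the collision, a pre-apply check can. The sketch below is a hypothetical lint over policies written as Python dicts: it reports field paths (truncated to three levels, a deliberately conservative choice) that two policies on the same target both set. It does not predict which policy wins, and the field names under backend.ai in the example data are placeholders rather than the real schema.

```python
from itertools import combinations


def field_paths(tree, depth, prefix=()):
    """Dotted paths to every field a spec sets, truncated at depth levels."""
    paths = set()
    for key, value in tree.items():
        path = prefix + (key,)
        if len(path) < depth and isinstance(value, dict) and value:
            paths |= field_paths(value, depth, path)
        else:
            paths.add(".".join(path))
    return paths


def overlapping_fields(policies, depth=3):
    """Hypothetical lint: {(name_a, name_b): shared paths} for policies on the same target."""
    report = {}
    for a, b in combinations(policies, 2):
        if a["target"] != b["target"]:
            continue
        shared = field_paths(a["spec"], depth) & field_paths(b["spec"], depth)
        if shared:
            report[(a["name"], b["name"])] = sorted(shared)
    return report


# Illustrative policies; the backend.ai field names are placeholders.
gw = "Gateway/agentgateway-proxy"
policies = [
    {"name": "api-key-auth", "target": gw,
     "spec": {"traffic": {"apiKeyAuthentication": {"mode": "Strict"}}}},
    {"name": "daily-token-budget", "target": gw,
     "spec": {"traffic": {"rateLimit": {"global": {"domain": "token-budgets"}}}}},
    {"name": "guard-a", "target": gw,
     "spec": {"backend": {"ai": {"promptGuard": {"request": []}}}}},
    {"name": "guard-b", "target": gw,
     "spec": {"backend": {"ai": {"promptGuard": {"response": []}}}}},
]
print(overlapping_fields(policies))
# {('guard-a', 'guard-b'): ['backend.ai.promptGuard']}
```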

Checklist

Standing up virtual keys

Two different keys: the provider credential hides at backend.policies.auth.secretRef, the virtual key is the client-facing handle in the API-key Secret. Don't conflate them.
Identity rides on key metadata. metadata.user_id is what the rate limiter and the metrics both key on, pulled out with apiKey.metadata.user_id in CEL.
Budgets need both halves: the rateLimit policy declares what to meter (unit: Tokens, keyed by user), the rate-limit-server ConfigMap holds how much. The domain and descriptor key must match across the two.
Use mode: Strict on apiKeyAuthentication so a missing key is rejected, not waved through.
Rate limiting runs before prompt guards, so a request blocked by a guard has still spent budget. Authentication runs before rate limiting, so a bad key spends nothing.
Scale by adding metadata, not new machinery: a tier field gives you tiered budgets, a tenant_id field gives you multi-tenant keys, both as extra descriptor entries.
Split auth, rate limiting, and guards into separate policies. When two policies set overlapping backend.ai fields, one silently overwrites the other based on creation order while both show healthy status.
Keep metric label cardinality sane. user_id is fine for spend tracking; resist adding high-cardinality labels that will overwhelm Prometheus.