MastertheMesh

Virtual keys in agentgateway

Tom O'Rourke
EMEA Field CTO · Solo.io

"Virtual key" is a LiteLLM and Portkey term, so the first question people ask is how agentgateway does it. The answer is that agentgateway delivers virtual keys by composing three capabilities it already gives you: API-key authentication, token-based rate limiting, and per-key observability. This page explains what a virtual key actually is, why the workshop's provider key is a different thing entirely, and how those three pieces produce a per-consumer key with its own budget and a hidden provider credential. All YAML is drop-in.

Virtual keys · API-key auth · Token budgets · Rate limiting · Observability · Multi-tenant LLM

What a virtual key actually is

In LiteLLM and Portkey, a virtual key is just a token string you give to a person, a team, or an app. They paste it into whatever makes the request (a curl command, Claude Code, an internal service) and send it on every call. The gateway checks the string, works out who it belongs to, and applies that owner's rules: which models they can use, how fast they can call, how many tokens they get. Then it swaps in your real OpenAI or Anthropic key on the way out, so the person sending requests never sees the real key. One real key sits behind many of these strings, and you can switch any one of them off without touching the real key or anyone else's virtual key.

So a virtual key is really doing three jobs at once: name who is calling, cap what they can spend, and keep the real provider key hidden. Hold on to those three jobs. They are exactly how agentgateway rebuilds the feature.

The two keys people confuse

Most of the confusion comes from the word "key" doing double duty. When a workshop tells you to "set up a key" for an LLM backend, that is the provider credential. It is not a virtual key. They live in different places and do opposite jobs.

The provider credential (gateway → LLM)

The real Anthropic or OpenAI key. It lives in a Secret and is referenced from the backend at spec.policies.auth.secretRef. agentgateway injects it on the upstream hop to the provider, and the route strips any client-supplied Authorization or x-api-key header, so callers never see or set it.

One credential, shared by everything routing through that backend. It carries no caller identity and no budget. It is a secret to be hidden, not a handle to be issued.

The virtual key (client → gateway)

The token string you give to a user, team, or app. They send it in Authorization: Bearer on every request; the gateway validates it, maps it to an identity, and enforces that identity's budget and limits. It resolves to the provider credential above without ever exposing it.

Many of these, one per consumer. Each carries identity and budget, and each is revocable on its own.

| | Provider credential | Virtual key |
|---|---|---|
| Who it authenticates | Gateway → LLM provider | Client → Gateway |
| Where it lives | backend.policies.auth.secretRef | API-key Secret + apiKeyAuthentication |
| Carries identity? | No | Yes (metadata.user_id) |
| Carries budget / limits? | No | Yes (token-based rate limit, keyed by identity) |
| How many | One, shared by the backend | Many, one per consumer |
| Hides the provider key? | It is the provider key | Yes, the client never sees it |

How agentgateway delivers virtual keys

agentgateway builds virtual keys out of three first-class capabilities it already gives you: API-key authentication, token-based rate limiting, and per-key observability. Each is a standard policy you configure on its own, and together they produce the full virtual-key experience: a per-consumer key that identifies the caller, carries its own budget, and resolves to a hidden provider credential.

Composing it from these three pieces is the strength here. The auth, the budgets, and the metrics are independent, so you tune each one to the policy you actually want, layer them with the rest of agentgateway (JWT, prompt guards, model routing) in the same place, and keep everything in plain Kubernetes resources you already manage with GitOps. You get the virtual-key outcome without inheriting one vendor's fixed, all-in-one shape.

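The three capabilities map onto what a virtual key has to do. Identifying the caller and capping spend are two of the jobs named earlier; the third, hiding the provider key, is handled by the backend's auth.secretRef, and per-key usage tracking is what observability adds on top: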

| Job of a virtual key | agentgateway capability | Where it is configured |
|---|---|---|
| Identify the caller | API-key authentication | traffic.apiKeyAuthentication on EnterpriseAgentgatewayPolicy, backed by a Secret of per-user keys + metadata |
| Cap what they spend | Token-based rate limiting | traffic.rateLimit.global, descriptor keyed on the API key's metadata |
| Track per-key usage | Observability metrics | agentgateway_gen_ai_client_token_usage in Prometheus, broken down by user_id |

How a request flows

  1. A request arrives with a virtual key in Authorization: Bearer.
  2. agentgateway validates the key against the API-key Secret. An invalid key is rejected before anything else runs, so it never touches a budget.
  3. The caller's user_id is read from that key's metadata.
  4. The request is checked against that user's token budget on the rate-limit server.
  5. If budget remains, the request goes upstream. agentgateway injects the hidden provider credential and strips the client's inbound auth header.
  6. Token usage from the response is deducted from the user's budget and recorded in metrics.
  7. If the budget is already exhausted at step 4, the request is rejected with 429 Too Many Requests instead of going upstream.
  8. Budgets refill on the configured interval: daily, hourly, whatever you set.

Mind the evaluation order

Authentication runs before rate limiting, so an unauthenticated request never consumes quota. But rate limiting runs before prompt guards. A request that is later blocked by a content guard with a 403 has already drawn down the user's token budget. Worth knowing before you reason about why a budget moved on a request that "failed".

Build it: per-user budgets

This is the smallest complete setup: two virtual keys, for Alice and Bob, each with an independent daily budget of 100,000 tokens. Every field below is from the current EnterpriseAgentgatewayPolicy schema.

1. Issue the virtual keys

Each entry in stringData is one virtual key. The key field is the bearer token the client sends. The metadata object is what the gateway uses downstream to identify and meter the caller.

apiVersion: v1
kind: Secret
metadata:
  name: llm-api-keys
  namespace: agentgateway-system
type: Opaque
stringData:
  alice: |
    {
      "key": "sk-alice-abc123def456",
      "metadata": { "user_id": "alice" }
    }
  bob: |
    {
      "key": "sk-bob-xyz789uvw012",
      "metadata": { "user_id": "bob" }
    }
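
Revoking a single virtual key should then come down to deleting its entry and re-applying the Secret. The sketch below removes Bob and leaves Alice untouched. It assumes the gateway picks up Secret updates without a restart, which this page does not state, so verify that in your environment.

```yaml
# Sketch: the same Secret with Bob's entry removed (assumed revocation path).
apiVersion: v1
kind: Secret
metadata:
  name: llm-api-keys
  namespace: agentgateway-system
type: Opaque
stringData:
  alice: |
    {
      "key": "sk-alice-abc123def456",
      "metadata": { "user_id": "alice" }
    }
```

The provider credential and Alice's key are unchanged, which is the point of issuing one key per consumer.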

2. Require the virtual key

(Enterprise) Attach API-key authentication to the Gateway so every route demands a valid key. mode: Strict rejects anything without one.

apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayPolicy
metadata:
  name: api-key-auth
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: agentgateway-proxy
  traffic:
    apiKeyAuthentication:
      mode: Strict
      secretRef:
        name: llm-api-keys

3. Attach the per-key budget

(Enterprise) The descriptor entry pulls user_id out of the API key's metadata with a CEL expression, and unit: Tokens meters tokens rather than requests. Each user gets their own bucket.

apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayPolicy
metadata:
  name: daily-token-budget
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: agentgateway-proxy
  traffic:
    rateLimit:
      global:
        domain: token-budgets
        backendRef:
          kind: Service
          name: rate-limit-server
          namespace: agentgateway-system
          port: 8081
        descriptors:
        - entries:
          - name: user_id
            expression: 'apiKey.metadata.user_id'
          unit: Tokens

4. Set the actual limit on the rate-limit server

The policy says "meter per user_id in tokens"; the rate-limit server holds the number. domain and the descriptor key must match the policy above.

apiVersion: v1
kind: ConfigMap
metadata:
  name: rate-limit-config
  namespace: agentgateway-system
data:
  config.yaml: |
    domain: token-budgets
    descriptors:
    - key: user_id
      rate_limit:
        unit: day
        requests_per_unit: 100000   # 100k tokens/day per user

5. The backend, where the provider credential hides

This is the other key. openai-secret holds the real provider credential and never leaves the gateway.

apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-3.5-turbo
  policies:
    auth:
      secretRef:
        name: openai-secret
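
For completeness, here is a sketch of what the openai-secret referenced above might contain. The Authorization data key and the $OPENAI_API_KEY placeholder are assumptions, not taken from this page; check the backend documentation for the key name your version expects.

```yaml
# Sketch only: the data key name "Authorization" is an assumption.
apiVersion: v1
kind: Secret
metadata:
  name: openai-secret
  namespace: agentgateway-system
type: Opaque
stringData:
  Authorization: $OPENAI_API_KEY   # placeholder; substitute the real provider key
```

Nothing about a caller appears here: no user_id, no budget. That is the difference from the llm-api-keys Secret in step 1.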

An HTTPRoute sending /openai to that backend finishes the wiring. Then it behaves exactly like a virtual-key system: Alice's key works until her 100,000 tokens are gone and she starts getting 429s, while Bob keeps going on his own untouched budget.

# Alice's virtual key (-i prints the response status line and headers)
curl -i "$INGRESS_GW_ADDRESS/openai" \
  -H "Authorization: Bearer sk-alice-abc123def456" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"Hello!"}]}'

# Once her budget is spent:
# HTTP/1.1 429 Too Many Requests
# x-ratelimit-limit: 100000
# x-ratelimit-remaining: 0
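
The HTTPRoute mentioned above is not shown on this page. A minimal sketch follows, assuming the route is named openai, lives in the same namespace, and references the backend by the group and kind from its apiVersion; all of these are illustrative choices.

```yaml
# Sketch: route /openai on the Gateway to the AgentgatewayBackend from step 5.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai                      # hypothetical route name
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: agentgateway-proxy        # the Gateway the policies target
    namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /openai
    backendRefs:
    - group: agentgateway.dev       # assumed from the backend's apiVersion
      kind: AgentgatewayBackend
      name: openai
```

Because both policies target the Gateway rather than this route, any further route attached to the same Gateway inherits the same key check and budget.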

Tiered budgets by user type

Because identity is just metadata on the key, you scale budgets by adding a field. Tag each key with a tier, then add it as a second descriptor entry. Free, standard, and premium users now draw from different daily limits.

Add the tier to the key, then to the descriptor

# In the Secret
alice: |
  { "key": "sk-alice-abc123def456",
    "metadata": { "user_id": "alice", "tier": "premium" } }

# In the rateLimit descriptor
descriptors:
- entries:
  - name: tier
    expression: 'apiKey.metadata.tier'
  - name: user_id
    expression: 'apiKey.metadata.user_id'
  unit: Tokens

Tier limits on the rate-limit server

domain: token-budgets
descriptors:
- key: tier
  value: "free"
  descriptors:
  - key: user_id
    rate_limit: { unit: day, requests_per_unit: 10000 }
- key: tier
  value: "premium"
  descriptors:
  - key: user_id
    rate_limit: { unit: day, requests_per_unit: 500000 }
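
The text mentions a standard tier, but the config above defines only free and premium. A sketch of the missing piece follows; the 100,000 figure and Bob's assignment to the tier are assumptions for illustration.

```yaml
# In the Secret: Bob tagged as a hypothetical "standard" user
bob: |
  { "key": "sk-bob-xyz789uvw012",
    "metadata": { "user_id": "bob", "tier": "standard" } }

# In the rate-limit server config, alongside the free and premium entries
- key: tier
  value: "standard"
  descriptors:
  - key: user_id
    rate_limit: { unit: day, requests_per_unit: 100000 }   # assumed figure
```

A key whose tier matches none of the configured values has no matching limit in this layout, so decide deliberately whether untagged keys should be unlimited or should fall into a default tier.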

Multi-tenant virtual keys

The same move scopes keys to a tenant as well as a user. Add a tenant_id to the key metadata and lead the descriptor with it, so every user's budget is nested under their tenant.

# In the rateLimit descriptor
descriptors:
- entries:
  - name: tenant_id
    expression: 'apiKey.metadata.tenant_id'
  - name: user_id
    expression: 'apiKey.metadata.user_id'
  unit: Tokens
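
The descriptor above needs matching key metadata and a matching server-side entry. A sketch of both follows, assuming a hypothetical tenant called acme and the same 100,000-token daily figure used earlier.

```yaml
# In the Secret: tenant_id added to Alice's key ("acme" is a made-up tenant)
alice: |
  { "key": "sk-alice-abc123def456",
    "metadata": { "tenant_id": "acme", "user_id": "alice" } }

# In the rate-limit-config ConfigMap: one bucket per (tenant_id, user_id) pair
domain: token-budgets
descriptors:
- key: tenant_id
  descriptors:
  - key: user_id
    rate_limit: { unit: day, requests_per_unit: 100000 }
```

The nesting order has to mirror the order of the descriptor entries in the policy: tenant_id first, then user_id.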

Shorter budget windows

The refresh interval lives on the rate-limit server, not the policy. Drop unit from day to hour (or minute, second) for tighter control, so a runaway key can only burn its allowance for an hour before it refills rather than a whole day.

# In the rate-limit-config ConfigMap
domain: token-budgets
descriptors:
- key: user_id
  rate_limit:
    unit: hour
    requests_per_unit: 10000   # 10k tokens/hour per user

Track per-key spend

The third capability, per-key observability, tells you what each key actually costs. It comes from the token-usage metric, broken down by the same user_id. agentgateway records input and output tokens per request, so you can total a user's daily consumption and turn it into money with your provider's pricing.

# Total tokens per user over the last 24h
# Input and output are separate series (different gen_ai_token_type), so select
# both with a regex and let sum() combine them; "+" would find no matching labels.
sum by (user_id) (
  increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type=~"input|output"}[24h])
)

# Percentage of a 100k daily budget used, per user
sum by (user_id) (
  increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type=~"input|output"}[24h])
) / 100000 * 100

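# Cost per user over the last 24h (example: $0.50 / 1M input, $1.50 / 1M output)
# increase() gives tokens over the window; rate() would give tokens per second.
# Each side is aggregated to user_id first so the two vectors match on "+".
  sum by (user_id) (increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[24h]))  / 1e6 * 0.50
+ sum by (user_id) (increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="output"}[24h])) / 1e6 * 1.50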

One policy area per policy

If two EnterpriseAgentgatewayPolicy resources target the same Gateway or route with overlapping backend.ai fields, one silently overwrites the other based on creation order, and both still report ACCEPTED and ATTACHED. Keep auth, rate limiting, and guards in separate policies, as the examples above do, so they compose instead of clobbering each other.

Checklist

Standing up virtual keys