ExtAuth answers one question: should this request proceed? ExtProc answers a different one:
what should this request or response actually look like? Same external-service pattern, very
different surface area. This page walks the six processing phases, the
GatewayExtension + traffic-policy wiring on both kgateway and agentgateway, and a small Python
(and Go) server you can deploy in a kind cluster in under a minute.
Companion reference: Solo external auth service, the bouncer at the door. ExtProc is what runs after the bouncer waves the request through.
1 · What ExtProc can do
ExtProc is defined by envoy.service.ext_proc.v3.ExternalProcessor.
Envoy holds a single bidirectional gRPC stream open per HTTP request and sends a
ProcessingRequest message at each enabled phase. Your server replies with a
ProcessingResponse that can carry a header mutation, a body mutation, an immediate response
(terminate the request with a status code), or a dynamic-metadata write. The stream stays open until the
request ends.
| Capability | Direction | Notes |
|---|---|---|
| Add, set, remove, append headers | req & resp | Same shape as ExtAuth's header mutation, but on every phase. |
| Replace the body | req & resp | Buffered or streamed. The buffer cap is configurable on the gateway. |
| Inspect trailers | req & resp | Useful for HTTP/2 gRPC where status arrives in trailers. |
| Immediate response | any phase | Terminate the request from the ExtProc server (status, headers, body). The way to short-circuit on a guardrail hit. |
| Dynamic metadata | any phase | Write structured metadata that downstream filters (auth, rate-limit, telemetry) can read. |
| Dynamic routing decisions | request headers / body | Change the route cluster or override weights before Envoy commits. |
| Async observability | any phase | Acknowledge the message right away and do the work in the background. The stream stays open and Envoy keeps going. |
2 · The processing flow
Envoy can send your server up to six phases per request. You opt into each phase via
processingMode on the GatewayExtension. Skip the phases you don't need: every enabled
phase is a gRPC round-trip.
REQ Request headers requestHeaderMode
Fires after Envoy parses the request line and headers and before the request is forwarded upstream. Your reply can mutate headers, ask Envoy to re-evaluate the route against the mutated headers (clear_route_cache on the CommonResponse), or short-circuit with an immediate response.
REQ Request body requestBodyMode
BUFFERED sends the whole body in one message, STREAMED sends it chunk by chunk,
BUFFERED_PARTIAL sends up to the configured cap. NONE skips the phase.
REQ Request trailers requestTrailerMode
Only fires when the request has trailers (HTTP/2 gRPC, chunked uploads with trailers).
RESP Response headers responseHeaderMode
Fires after the upstream responds, before bytes are sent downstream. Same mutation surface as the request side.
RESP Response body responseBodyMode
This is the LLM-streaming phase. STREAMED lets you act on SSE chunks as they arrive.
BUFFERED defeats streaming: the client sees nothing until the whole response is collected.
RESP Response trailers responseTrailerMode
gRPC status lives here. Inspect to record success/failure metrics.
Each enabled phase is a separate ProcessingRequest / ProcessingResponse exchange on
the same stream. The server can also tell Envoy to skip the remaining phases by setting
mode_override on a ProcessingResponse (the gateway must be configured to honour mode overrides). This is useful when the request-headers phase already told you nothing else
needs inspection.
Default to SKIP / NONE and
turn on only what you need. A response-body filter that defaults to BUFFERED on a streaming LLM
endpoint will silently break SSE and add seconds of latency before anyone notices. Be explicit.
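To make the cost of each setting concrete, here is a minimal sketch that lists which phases a given processingMode would turn on. It assumes Envoy's documented defaults (headers sent, trailers skipped, bodies off) are passed through unchanged by the gateway; `enabled_phases` is a hypothetical helper, not part of any product API.

```python
# Assumed defaults when a header/trailer field is omitted or set to DEFAULT.
HEADER_DEFAULTS = {
    "requestHeaderMode": "SEND",
    "responseHeaderMode": "SEND",
    "requestTrailerMode": "SKIP",
    "responseTrailerMode": "SKIP",
}
BODY_FIELDS = ("requestBodyMode", "responseBodyMode")

# Order in which the phases occur for one request.
PHASE_ORDER = (
    "requestHeaderMode", "requestBodyMode", "requestTrailerMode",
    "responseHeaderMode", "responseBodyMode", "responseTrailerMode",
)


def enabled_phases(processing_mode):
    """Return the phase fields that would produce ExtProc messages."""
    enabled = []
    for field in PHASE_ORDER:
        value = processing_mode.get(field, "DEFAULT")
        if field in BODY_FIELDS:
            # Body phases stay off unless a mode other than NONE is chosen.
            if value not in ("NONE", "DEFAULT"):
                enabled.append(field)
        else:
            if value == "DEFAULT":
                value = HEADER_DEFAULTS[field]
            if value == "SEND":
                enabled.append(field)
    return enabled


# The processingMode used later on this page: only response headers are sent.
mode = {
    "requestHeaderMode": "SKIP",
    "responseHeaderMode": "SEND",
    "responseBodyMode": "NONE",
}
print(enabled_phases(mode))
```

Trailer phases only fire when the message actually carries trailers, so the list is an upper bound on the exchanges per request.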
3 · How it's wired in each product
Both products expose the ExtProc gRPC backend via a traffic policy that attaches to an
HTTPRoute. agentgateway folds everything (backend + scope) into one
AgentgatewayPolicy and ships with sensible streaming defaults; kgateway splits the
wiring across a GatewayExtension + a TrafficPolicy and gives you full
per-phase control via Envoy-native processingMode. The two subsections below cover agentgateway
first, then kgateway; each has its own knobs (or absence of knobs) to be aware of.
agentgateway
One CRD does the whole wiring. AgentgatewayPolicy (group
agentgateway.dev/v1alpha1, the OSS form) attaches to an HTTPRoute via
targetRefs and points at your ExtProc Service with a plain
backendRef. As of v2026.5.1 the
extProc block accepts backendRef and an optional
conditional[] for CEL-based backend switching. There is no per-phase
opt-in/skip field in this release: the binary streams every phase by default. Setup is
minimal and response-body buffering on streaming endpoints cannot occur as a
misconfiguration. The trade-off is that phases you don't need cannot yet be skipped on
the policy — see the "Coming next release" callout below.
apiVersion: v1
kind: Service
metadata:
  name: redact-extproc
  namespace: agentgateway-system
spec:
  selector: { app: redact-extproc }
  ports:
    - { port: 4444, targetPort: 18080, appProtocol: kubernetes.io/h2c }
---
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: redact
  namespace: agentgateway-system
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llm-openai
  traffic:
    extProc:
      backendRef:
        name: redact-extproc
        port: 4444
      # No processingMode equivalent shipping today; see preview below for the next release.
Coming next release. A processingOptions block under
extProc is planned for a future release, providing the agentgateway analogue
of kgateway's processingMode with PascalCase enum values:
traffic:
  extProc:
    backendRef: { name: redact-extproc, port: 4444 }
    processingOptions:
      requestHeaderMode: Skip        # Send (default) | Skip
      responseHeaderMode: Send
      requestBodyMode: None          # FullDuplexStreamed (default) | Buffered | BufferedPartial | None
      responseBodyMode: None         # Buffered modes cap at 8 KB
      requestTrailerMode: Skip
      responseTrailerMode: Skip
      allowModeOverride: false       # honour mode_override responses from the server
Body modes will default to FullDuplexStreamed, so SSE-streaming LLM responses
keep working without explicit configuration. Check the agentgateway release notes for
availability in your target version before relying on these fields.
Enterprise variant. EnterpriseAgentgatewayPolicy (group
enterpriseagentgateway.solo.io/v1alpha1) wraps the same extProc
shape and adds the conditional[] list at the policy level for CEL-gated
backend switching. Field shape is identical to the OSS form on the same release.
kgateway
Three resources, in this order: a Service for the ExtProc backend, a
GatewayExtension that describes the upstream + processingMode +
failure mode, and a TrafficPolicy that wraps the extension and gets attached to
specific routes via an ExtensionRef filter. processingMode here is
Envoy-native and uses UPPERCASE enum values (NONE / STREAMED /
BUFFERED / BUFFERED_PARTIAL). The full surface is configurable,
including the response-body buffering behaviour that needs explicit attention on streaming
endpoints — see §6 for the recommended defaults.
1 / 3 Deploy the ExtProc gRPC service
Plain Deployment + Service. Setting appProtocol: kubernetes.io/h2c makes kgateway speak cleartext HTTP/2 to it, which gRPC requires.
apiVersion: v1
kind: Service
metadata:
  name: redact-extproc
  namespace: kgateway-system
spec:
  selector: { app: redact-extproc }
  ports:
    - port: 4444
      targetPort: 18080
      protocol: TCP
      appProtocol: kubernetes.io/h2c
2 / 3 GatewayExtension
One resource describes the upstream, the processingMode, and the failure mode.
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayExtension
metadata:
  name: redact
  namespace: kgateway-system
spec:
  type: ExtProc
  extProc:
    grpcService:
      backendRef:
        name: redact-extproc
        port: 4444
    # Scope the stream. Default-deny: turn on only what the server needs.
    processingMode:
      requestHeaderMode: SKIP      # UPPERCASE enums (Envoy native)
      responseHeaderMode: SEND
      responseBodyMode: NONE       # NONE | STREAMED | BUFFERED | BUFFERED_PARTIAL
    failOpen: true                 # if ExtProc is down, forward the request unmodified
    messageTimeout: 200ms          # per-message deadline
3 / 3 EnterpriseKgatewayTrafficPolicy & attach
Wrap the extension in a policy, then reference the policy from an HTTPRoute filter so it scopes to specific routes (not the whole Gateway).
apiVersion: enterprisekgateway.solo.io/v1alpha1
kind: EnterpriseKgatewayTrafficPolicy
metadata:
  name: redact
  namespace: kgateway-system
spec:
  extProc:
    extensionRef:
      name: redact
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api
  namespace: kgateway-system
spec:
  parentRefs:
    - name: http
  rules:
    - matches:
        - path: { type: PathPrefix, value: /v1 }
      filters:
        - type: ExtensionRef
          extensionRef:
            group: enterprisekgateway.solo.io
            kind: EnterpriseKgatewayTrafficPolicy
            name: redact
      backendRefs:
        - name: my-app
          port: 80
4 · Sample, a runnable server
Smallest server that does something useful: strip any response header whose name looks like a credential, and surface the redaction as a counter header. Same logic, shown in Python and Go — pick whichever language fits your stack.
Demo What the filter does — before & after
Same upstream, same path. The only difference is whether the ExtProc filter is attached to the route.
# BEFORE — route /v1 with no ExtProc filter attached
$ curl -si http://gateway.local/v1/echo -H 'Host: api.local'
HTTP/1.1 200 OK
content-type: application/json
x-api-key: sk-live-7f9c2a1e4d8b
authorization: Bearer eyJhbGciOiJIUzI1NiIsInR...
x-internal-secret: rotate-me-2026
server: upstream/1.0
{"msg":"hi"}
# AFTER — same route with the redact ExtensionRef filter attached
$ curl -si http://gateway.local/v1/echo -H 'Host: api.local'
HTTP/1.1 200 OK
content-type: application/json
x-redacted-count: 3
server: upstream/1.0
{"msg":"hi"}
The three credential-shaped headers (x-api-key, authorization,
x-internal-secret) are stripped before bytes leave the gateway, replaced with a single
x-redacted-count: 3 so monitoring can alert when an upstream starts leaking secrets.
The body is untouched (responseBodyMode: NONE in the GatewayExtension),
so streaming responses pass through unchanged and no buffering latency is added.
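The matching rule behind the demo can be exercised without gRPC or a cluster. The sketch below is a plain-Python stand-in for the header pass; `redact_headers` is a hypothetical helper, and the pattern includes `authorization`, which is needed for the Authorization header in the demo to match by name.

```python
import re

# Credential-shaped header names. "authorization" is listed explicitly because
# the name itself contains none of the other keywords.
KEY_RE = re.compile(r"(api[-_]?key|token|bearer|secret|authorization)", re.I)


def redact_headers(headers):
    """Split (name, value) pairs into kept headers and removed names."""
    kept, removed = [], []
    for name, value in headers:
        if KEY_RE.search(name):
            removed.append(name)
        else:
            kept.append((name, value))
    if removed:
        # Surface the redaction as a counter header, as the filter does.
        kept.append(("x-redacted-count", str(len(removed))))
    return kept, removed


upstream = [
    ("content-type", "application/json"),
    ("x-api-key", "sk-live-7f9c2a1e4d8b"),
    ("authorization", "Bearer eyJhbGciOiJIUzI1NiIsInR..."),
    ("x-internal-secret", "rotate-me-2026"),
    ("server", "upstream/1.0"),
]
kept, removed = redact_headers(upstream)
print(removed)
print(kept)
```

Matching is on the header name only, so a secret carried in an innocuously named header passes through untouched.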
Server extproc.py
import re
from concurrent import futures

import grpc
from envoy.service.ext_proc.v3 import (
    external_processor_pb2 as pb,
    external_processor_pb2_grpc as svc,
)
from envoy.config.core.v3.base_pb2 import HeaderValue

# Header names that look like credentials. "authorization" is listed explicitly
# because the name itself contains none of the other keywords.
KEY_RE = re.compile(r"(api[-_]?key|token|bearer|secret|authorization)", re.I)


class Proc(svc.ExternalProcessorServicer):
    def Process(self, request_iterator, ctx):
        # One iteration per phase message Envoy sends on this request's stream.
        # Assumes only the response-headers phase is enabled (see the GatewayExtension).
        for req in request_iterator:
            resp = pb.ProcessingResponse()
            if req.HasField("response_headers"):
                # Mark the oneof as set even when nothing is removed, so Envoy
                # always receives a response_headers reply.
                resp.response_headers.SetInParent()
                rm = resp.response_headers.response.header_mutation
                removed = 0
                for h in req.response_headers.headers.headers:
                    if KEY_RE.search(h.key):
                        rm.remove_headers.append(h.key)
                        removed += 1
                if removed:
                    rm.set_headers.add(
                        header=HeaderValue(key="x-redacted-count",
                                           raw_value=str(removed).encode())
                    )
            yield resp


if __name__ == "__main__":
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=8))
    svc.add_ExternalProcessorServicer_to_server(Proc(), server)
    server.add_insecure_port("0.0.0.0:18080")
    server.start()
    server.wait_for_termination()
Image Dockerfile
Single-stage Alpine build, ~80 MB final.
FROM python:3.12-alpine
RUN pip install --no-cache-dir grpcio envoy-extproc-sdk
COPY extproc.py /app/extproc.py
EXPOSE 18080
CMD ["python", "/app/extproc.py"]
Server extproc.go
Same behaviour, Go flavour. Single static binary, faster cold start than the Python build.
package main

import (
	"io"
	"log"
	"net"
	"regexp"
	"strconv"

	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	extproc "github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3"
	"google.golang.org/grpc"
)

// Header names that look like credentials. "authorization" is listed explicitly
// because the name itself contains none of the other keywords.
var keyRE = regexp.MustCompile(`(?i)(api[-_]?key|token|bearer|secret|authorization)`)

type server struct {
	extproc.UnimplementedExternalProcessorServer
}

// Process handles one stream per HTTP request. It assumes only the
// response-headers phase is enabled (see the GatewayExtension).
func (server) Process(stream extproc.ExternalProcessor_ProcessServer) error {
	for {
		req, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		resp := &extproc.ProcessingResponse{}
		if rh := req.GetResponseHeaders(); rh != nil {
			mut := &corev3.HeaderMutation{}
			removed := 0
			for _, h := range rh.GetHeaders().GetHeaders() {
				if keyRE.MatchString(h.GetKey()) {
					mut.RemoveHeaders = append(mut.RemoveHeaders, h.GetKey())
					removed++
				}
			}
			if removed > 0 {
				mut.SetHeaders = append(mut.SetHeaders, &corev3.HeaderValueOption{
					Header: &corev3.HeaderValue{
						Key:      "x-redacted-count",
						RawValue: []byte(strconv.Itoa(removed)),
					},
				})
			}
			resp.Response = &extproc.ProcessingResponse_ResponseHeaders{
				ResponseHeaders: &extproc.HeadersResponse{
					Response: &extproc.CommonResponse{HeaderMutation: mut},
				},
			}
		}
		if err := stream.Send(resp); err != nil {
			return err
		}
	}
}

func main() {
	lis, err := net.Listen("tcp", "0.0.0.0:18080")
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	extproc.RegisterExternalProcessorServer(s, server{})
	log.Println("extproc listening on :18080")
	log.Fatal(s.Serve(lis))
}
Image Dockerfile
Multi-stage build on distroless, ~12 MB final.
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/extproc .
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/extproc /extproc
EXPOSE 18080
ENTRYPOINT ["/extproc"]
Build, push to any registry your cluster can reach, then point the
Deployment that backs the §3 Service at the new image. The Service stays the same.
Why envoy-extproc-sdk (Python) and go-control-plane (Go)?
They're the maintained protobuf bindings for envoy.service.ext_proc.v3, so you avoid wiring
up the full Envoy protoc chain yourself. Plain grpcio / google.golang.org/grpc
serves the stream; the SDK gives you the message types.
5 · Verifying
Apply the manifests from §3, then curl through the gateway. With the redact filter attached to /v1,
any response header whose name matches the credential pattern should disappear and an
x-redacted-count header should appear instead.
$ curl -si http://gateway.local/v1/echo \
-H 'Host: api.local'
HTTP/1.1 200 OK
content-type: application/json
x-redacted-count: 2
...
$ kubectl -n kgateway-system logs deploy/redact-extproc
INFO: Process stream opened
INFO: response_headers: removed api-key, bearer-token (2)
INFO: Process stream closed
Compare against the same call on a route without the ExtensionRef filter: the
credential headers come back unchanged. That's the cleanest before/after demo: same upstream, same path
prefix, and only the filter chain differs.
6 · The processingMode settings
This is where ExtProc deployments live or die. Wrong defaults on
processingMode waste latency, break streaming, or starve the server. Read this once before
shipping anything.
| Setting | Values | When to change the default |
|---|---|---|
| requestHeaderMode | DEFAULT, SEND, SKIP | SKIP when the server only inspects responses. Saves one round-trip per request. |
| requestBodyMode | NONE, STREAMED, BUFFERED, BUFFERED_PARTIAL | STREAMED for large uploads. BUFFERED only if the server needs the whole body before deciding, and the body is small. BUFFERED_PARTIAL when you want to inspect the first N bytes (typical for content-type sniffing). |
| responseHeaderMode | DEFAULT, SEND, SKIP | SEND if you mutate response headers, SKIP otherwise. |
| responseBodyMode | NONE, STREAMED, BUFFERED, BUFFERED_PARTIAL | LLM endpoints: STREAMED only. BUFFERED defeats SSE and the client waits for the whole response. This is the most common ExtProc misconfiguration on streaming endpoints. |
| failOpen | true, false | true for observability filters (don't block traffic on the side-car being down), false for security filters (PII redaction, prompt-guard, signing). |
| messageTimeout | duration | Per-message deadline. Tight (50ms-200ms) for header-only servers, looser (1s+) for body-buffering servers that call out to slow upstreams. |
| maxMessageTimeout | duration, default 0s (off) | Upper bound on per-message timeout overrides the server can set via override_message_timeout. Enable if you trust the server to extend its own deadline. |
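The rules of thumb in this table can be turned into a pre-deploy check. The following is a sketch only: `lint_ext_proc` is a hypothetical helper that takes the extProc settings as a plain dict, and whether a route is streaming or a filter is security-relevant is something the caller has to state.

```python
def lint_ext_proc(settings, streaming_route=False, security_filter=False):
    """Return warnings for combinations the settings table flags as risky."""
    warnings = []
    mode = settings.get("processingMode", {})
    body_mode = mode.get("responseBodyMode", "NONE")

    # Buffering a streamed response holds every chunk back from the client.
    if streaming_route and body_mode in ("BUFFERED", "BUFFERED_PARTIAL"):
        warnings.append(
            "responseBodyMode=%s buffers a streaming response; use STREAMED" % body_mode
        )
    # A security filter that fails open is bypassed whenever it is down.
    if security_filter and settings.get("failOpen") is True:
        warnings.append("failOpen=true lets traffic bypass a security filter when it is down")
    # An observability filter that fails closed becomes a hard dependency.
    if not security_filter and settings.get("failOpen") is False:
        warnings.append("failOpen=false makes an observability filter a hard dependency")
    return warnings


# Settings mirroring the GatewayExtension from §3, treated as a security filter.
redact = {
    "processingMode": {
        "requestHeaderMode": "SKIP",
        "responseHeaderMode": "SEND",
        "responseBodyMode": "NONE",
    },
    "failOpen": True,
    "messageTimeout": "200ms",
}
for warning in lint_ext_proc(redact, streaming_route=True, security_filter=True):
    print(warning)
```

A check like this fits in CI next to the manifests, where a flagged combination is cheaper to fix than a broken stream in production.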
Watch out Response-body buffering on streaming LLMs
OpenAI-shape /v1/chat/completions with "stream": true returns SSE chunks. If
responseBodyMode: BUFFERED is set, Envoy collects every chunk before sending anything to the
client. The user sees a long pause, then the entire response at once. Cursor, Continue, and most LLM
clients will time out.
Fix: set responseBodyMode: STREAMED and write the server to process each chunk independently.
If you genuinely need the whole response before deciding (e.g. a moderation pass that requires final
output), apply the policy only to the non-streaming route or non-streaming model variant.
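One wrinkle with per-chunk processing is that chunk boundaries need not line up with SSE event boundaries, so a pattern can be split across two chunks. A minimal sketch of one way to handle that, assuming events are separated by a blank line ("\n\n"): hold back the unfinished tail and transform only complete events. `SSEChunkProcessor` and `mask_secret` are hypothetical names; in a real server the returned bytes would go into the body mutation for that chunk.

```python
class SSEChunkProcessor:
    """Re-frames streamed body chunks into whole SSE events before transforming."""

    def __init__(self, transform):
        self._transform = transform  # called once per complete event (bytes -> bytes)
        self._tail = b""             # bytes of an event that has not ended yet

    def feed(self, chunk, end_of_stream=False):
        data = self._tail + chunk
        # Everything before the last separator is complete; the rest is held back.
        *events, self._tail = data.split(b"\n\n")
        out = b"".join(self._transform(e) + b"\n\n" for e in events)
        if end_of_stream and self._tail:
            # Flush whatever is left when the upstream closes the stream.
            out += self._transform(self._tail)
            self._tail = b""
        return out


def mask_secret(event):
    return event.replace(b"sk-live-123", b"[redacted]")


proc = SSEChunkProcessor(mask_secret)
# The secret is split across the first two chunks.
chunks = [b"data: hello sk-li", b"ve-123\n\ndata: wor", b"ld\n\n"]
for chunk in chunks:
    print(proc.feed(chunk))
```

The client still receives each event as soon as it is complete, so the added delay is bounded by one event rather than the whole response.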
7 · When to reach for ExtProc vs ExtAuth
The two filters look superficially similar (a gRPC sidecar called by Envoy on every request), but they do different jobs at different points in the filter chain. Pick wrong and you either pay for capability you don't need, or you write code for something a stock config already handles.
Is it allow/deny (with optional header injection) based on request attributes only? → ExtAuth, and you can almost certainly do it with stock AuthConfig plugins, no code.
Both apply? → run them both. ExtAuth first, ExtProc second.
How to read it. One request, two filters. ExtAuth runs first — a single
Check() RPC over request attributes. On deny, the gateway short-circuits and the
upstream never sees the call (amber dashed). On allow, ExtAuth optionally injects headers and
hands off to ExtProc, which opens a bidirectional gRPC stream and can read/mutate any enabled
phase on the way out and on the way back. The response only ever goes through ExtProc, never
ExtAuth — that's why response-body redaction and LLM-streaming filters have to be ExtProc.
The capability matrix
| Dimension | ExtAuth | ExtProc |
|---|---|---|
| Position in chain | Before route selection, before the body is read | Through the request/response lifecycle, after route selection |
| Round-trips per request | Exactly 1 (single Check RPC) | Up to 6 (one per enabled processingMode phase) |
| Sees the request body | No (attributes only: method, path, headers, peer) | Yes, buffered or streamed |
| Sees the response | No | Yes: headers, body, trailers, streamed if you want |
| Can mutate headers | Yes, on the request only (via OkResponse) | Yes, on every enabled phase, both directions |
| Can mutate the body | No | Yes |
| Can short-circuit the request | Yes (DeniedResponse, which is its whole job) | Yes (ImmediateResponse at any phase) |
| Ships with out-of-the-box logic | Yes. Solo's ext-auth-service covers OIDC, OAuth2, OPA, API key, basic auth, LDAP, JWT, passthrough, all driven by AuthConfig YAML, no code | No. Always a BYO gRPC server (your Python / Go / Rust / whatever). |
| Typical added latency | 1 sidecar hop, usually <5 ms | 1 hop per enabled phase. Cheap if you only enable response-headers, expensive if you stream the body |
| Solo CRD wrapper | AuthConfig + RouteOption / EnterpriseKgatewayTrafficPolicy.spec.extAuth | GatewayExtension (kgateway) or direct backendRef (agentgateway) + the same traffic-policy CRDs |
Use-case picker
| Need | Use | Why |
|---|---|---|
| Validate a JWT, check an OAuth scope, decide allow/deny | ExtAuth | One call, no body access needed. The AuthConfig CRD already covers this without writing code. |
| Inject verified-claim headers (e.g. x-tenant-id) | ExtAuth | Header injection is the response shape ExtAuth's OkResponse is designed for. See JWT claims to headers. |
| Redact secrets from a response body | ExtProc | Needs body access. ExtAuth can't see the body, only request attributes. |
| Inspect LLM output for PII or prompt-injection markers | ExtProc | Streamed response body. The Solo agentgateway prompt-guard filter is internally an ExtProc-shaped service. |
| Semantic cache lookup keyed off the request body | ExtProc | Need to read the request body, possibly short-circuit with an immediate response from cache. |
| Sign a response, or strip a trailer based on body content | ExtProc | Trailer mutation and body inspection both need the streaming protocol. |
| Route to cluster A vs B based on a request-body field | ExtProc | Routing decisions on request-body content aren't possible in plain HTTPRoute matchers. |
| Aggregate two upstream responses into one | ExtProc | The Solo KB API Aggregation use case. |
| OPA-policy decision over the full request context | ExtAuth+OPA plugin | Solo's ext-auth-service ships an OPA plugin out of the box, no ExtProc needed unless you also want body mutation. See Solo ext-auth-service. |
Combined pattern: ExtAuth writes a verified claim such as x-user-id into a header; ExtProc reads that header in its request-body phase and
decides whether to allow the prompt through, redact it, or reject with an immediate response.
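The picker above reduces to two questions, which the sketch below encodes. `pick_filters` is a hypothetical helper for illustration; the flags are simplifications of the rows in the table, not product configuration.

```python
def pick_filters(allow_deny=False, needs_request_body=False,
                 needs_response=False, mutates_body=False):
    """Return the filters to attach, in the order they run."""
    filters = []
    # Allow/deny over request attributes is ExtAuth's whole job.
    if allow_deny:
        filters.append("ExtAuth")
    # Anything touching a body or the response side needs ExtProc.
    if needs_request_body or needs_response or mutates_body:
        filters.append("ExtProc")
    return filters


print(pick_filters(allow_deny=True))                            # JWT validation
print(pick_filters(needs_response=True, mutates_body=True))     # response redaction
print(pick_filters(allow_deny=True, needs_request_body=True))   # auth, then prompt screening
```

When both apply, the list keeps ExtAuth first so that ExtProc only ever sees requests that were already allowed.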
Where to go next
- Solo external auth service, the ExtAuth side of the same story.
- JWT claims to HTTP headers, why ExtAuth fronting ExtProc is the pattern, and how the verified-claim handoff works.
- kgateway ExtProc docs, official reference including the aggregation and dynamic-routing use cases.
- Envoy ExtProc proto, the message-by-message spec.