MastertheMesh

Deploying kagent on EKS

Tom O'Rourke
EMEA Field CTO · Solo.io

Solo Enterprise for kagent installs and runs on EKS the same way it does anywhere else, with one setting that EKS makes necessary. The controller validates the service-account token an agent presents on its callbacks, and on EKS those tokens are signed by the cluster's external OIDC provider. That breaks the controller's default local verification, so the fix is to switch it to the Kubernetes TokenReview API. This page is the why and the one-line how.


The short version. On EKS, set K8S_TOKEN_REVIEW: "true" in the kagent-enterprise-config ConfigMap and restart the controller. Everything below is why that setting exists and how to confirm it took.

The symptom

You install Solo Enterprise for kagent on EKS with OIDC and on-behalf-of delegation turned on. The front of the flow looks healthy. A user dispatches a task, the controller validates the user's OIDC token, maps the caller to a role, and mints an on-behalf-of token for the agent. The agent starts running. Then the moment the agent calls back to the controller (to read or update its own task state, for example) the request returns 401 Unauthorized, and there is nothing useful in the logs explaining why.

The agent's callback carries its Kubernetes service-account token, not the OIDC token. So the failure is narrow: the controller accepts OIDC tokens but rejects the agent's service-account token. The token is structurally valid. Its issuer, signing key, and audience are all correct. It still gets rejected. That combination is the EKS signature of this problem.
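To see for yourself that the rejected token is well formed, you can decode its claims and read the issuer and audience. The helper below is a minimal sketch, not part of kagent: `jwt_claims` is a hypothetical name, and it only decodes the payload without verifying the signature. It assumes a `base64` that accepts `-d`.

```shell
# Print the decoded claims (payload) of the JWT passed as $1.
# This decodes only; it does not verify the signature.
jwt_claims() {
  # The payload is the second dot-separated segment, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Usage, assuming the agent mounts its token at the default path
# (<agent-pod> is a placeholder; a projected token may live elsewhere):
#   TOKEN=$(kubectl -n kagent exec <agent-pod> -- \
#     cat /var/run/secrets/kubernetes.io/serviceaccount/token)
#   jwt_claims "$TOKEN"
```

On EKS the `iss` claim in the output should point at the AWS-hosted OIDC provider rather than at an in-cluster address.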

How the controller validates a service-account token

The kagent controller can authenticate a service-account token one of two ways, and the right choice depends on the cluster.

Verify it locally (JWKS)

The controller fetches the cluster's public signing keys, then checks the token's signature and issuer itself. This is the default, and it is the right mode on a cluster whose API server both signs the projected tokens and serves the matching public keys, which covers most self-managed and local clusters.

Wrong fit on EKS

Ask the API server (TokenReview)

Instead of fetching keys, the controller hands the token to the Kubernetes API server through the TokenReview API and asks whether it is valid for the kagent audience. The API server is the authority that issued the token, so it validates natively, no key fetch involved.

Right fit on EKS
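You can run the same check by hand to see what the API server says about a token. The sketch below renders a TokenReview request; `token_review_manifest` is a hypothetical helper, and the audience string `kagent` in the usage line is an assumption, so substitute whatever audience your install actually uses.

```shell
# Render a TokenReview request for the token in $1 and the audience in $2.
token_review_manifest() {
  cat <<EOF
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: "$1"
  audiences:
    - "$2"
EOF
}

# Usage against a live cluster (your user needs permission to create
# tokenreviews; the audience value here is assumed):
#   token_review_manifest "$TOKEN" "kagent" | kubectl create -f - -o yaml
# The API server's verdict comes back in .status.authenticated.
```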

Why local verification is the wrong fit on EKS

EKS does service-account token signing differently from a typical cluster. Each EKS cluster has its own OIDC provider, hosted by AWS at a public URL, and the projected service-account tokens your pods receive are signed by a key published at that provider's JWKS endpoint. The token's issuer points at that AWS-hosted provider, and the public key that verifies it lives there too, on the public internet behind an AWS TLS certificate.

That setup leaves the local-verification path with no good source of keys.

So local verification is left choosing between a key set that is unreachable and one that has the wrong key. Neither validates the token, which is why it fails even though the token itself is perfectly good. No JWKS URL, issuer string, or audience override changes that, because the problem is the validation mode, not its parameters.
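To check whether a cluster signs its tokens this way, look at the issuer it advertises. The function below is a rough heuristic, assuming the usual EKS issuer shape of `https://oidc.eks.<region>.amazonaws.com/id/<id>`; `is_eks_oidc_issuer` is a hypothetical name, and other AWS partitions may use a different domain.

```shell
# Succeed when the issuer URL in $1 looks like an EKS-hosted OIDC provider.
is_eks_oidc_issuer() {
  case "$1" in
    https://oidc.eks.*.amazonaws.com/id/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: read the issuer from the cluster's discovery document, or from AWS.
#   kubectl get --raw /.well-known/openid-configuration
#   aws eks describe-cluster --name <cluster> \
#     --query "cluster.identity.oidc.issuer" --output text
#   is_eks_oidc_issuer "<issuer-url>" && echo "use TokenReview"
```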

The fix: TokenReview validation

Switch the controller to TokenReview validation. It then hands each service-account token to the EKS API server and trusts the API server's answer. The API server issued the token, so it validates it natively and sidesteps the whole external-key question. The setting is a single key on the controller's config:

# kagent-enterprise-config ConfigMap, in the controller's namespace
data:
  K8S_TOKEN_REVIEW: "true"

Apply it to the kagent-enterprise-config ConfigMap that the controller reads its validation settings from, then restart the controller so it picks up the change:

kubectl -n kagent patch configmap kagent-enterprise-config \
  --type merge -p '{"data":{"K8S_TOKEN_REVIEW":"true"}}'

kubectl -n kagent rollout restart deploy/kagent-controller

Keep this one in your upgrade runbook: a helm upgrade of the controller re-renders its config, so re-apply the patch and restart after an upgrade to keep TokenReview validation in place.
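For the runbook, the patch and restart can be wrapped into one step. This is a sketch built from the two commands above plus `kubectl rollout status` to wait for the new pod; `reapply_token_review` is a hypothetical name and the two-minute timeout is an arbitrary choice.

```shell
# Re-apply TokenReview validation, restart the controller, and wait for
# the rollout to finish. $1 is the namespace (defaults to "kagent").
reapply_token_review() {
  ns=${1:-kagent}
  kubectl -n "$ns" patch configmap kagent-enterprise-config \
    --type merge -p '{"data":{"K8S_TOKEN_REVIEW":"true"}}' &&
  kubectl -n "$ns" rollout restart deploy/kagent-controller &&
  kubectl -n "$ns" rollout status deploy/kagent-controller --timeout=120s
}

# Usage, right after a helm upgrade:
#   reapply_token_review kagent
```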

Two things make this work out of the box, with nothing else to wire up:

This is EKS-specific. On a kind, k3s, or typical self-managed cluster the API server both signs the projected tokens and serves the matching keys, so local verification works and you can leave this setting off. EKS (and any cluster that signs service-account tokens with an external OIDC provider) is where TokenReview earns its place. The trade-off is one TokenReview call to the API server per validation. That call is cheap, and TokenReview is the documented mode for exactly this situation.

Confirming the callback path is healthy

After the controller restarts, run the same flow that failed. Dispatch a task, let the agent start, and watch the agent's callback to the controller. The request that was returning 401 should now return 200, and the task should progress to completion instead of stalling at the first callback.

If it still returns 401 or 403 after the flip, check that the controller pod actually restarted and re-read the ConfigMap, and that the ConfigMap value is the string "true" and not a boolean. Those are the two things that keep the new mode from taking effect.
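Both checks can be scripted. The sketch below reads the stored value back and compares it against the exact string; `token_review_enabled` is a hypothetical helper, and the ConfigMap name and namespace are the ones used earlier on this page.

```shell
# Succeed only when the ConfigMap holds the exact string "true".
# $1 is the namespace (defaults to "kagent").
token_review_enabled() {
  ns=${1:-kagent}
  value=$(kubectl -n "$ns" get configmap kagent-enterprise-config \
    -o jsonpath='{.data.K8S_TOKEN_REVIEW}')
  [ "$value" = "true" ]
}

# Usage:
#   token_review_enabled kagent || echo "K8S_TOKEN_REVIEW is not \"true\""
# To confirm the restart, compare pod start times with when you patched:
#   kubectl -n kagent get pods \
#     -o custom-columns=NAME:.metadata.name,STARTED:.status.startTime
```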

Checklist

kagent on EKS, the short version