Cloud Connectivity Lab — failover, waypoint, egress

Tom O'Rourke
EMEA Field CTO · Solo.io

Three labs running on the multicluster Ambient setup you stood up in the standup lab: cross-cluster mesh failover via the global *.mesh.internal hostname, in-cluster L7 routing through an enterprise-agentgateway-waypoint, and egress traffic control via a dedicated waypoint in istio-egress.

Each lab below is self-contained — you can run them in order, or jump straight to whichever scenario you want to demo. They share two assumptions:

Labs

LAB 0

Prerequisite — deploy Bookinfo + the agentgateway ingress

The standup lab installs the platform only — no test workloads. This lab's demos all use Bookinfo as the test app, so deploy it on both clusters, label its namespace ambient, mark productpage as a global multicluster service, and apply the bookinfo-gateway Gateway + HTTPRoute on east.

Deploy Bookinfo to both clusters

About — what this does & why

What: Creates a bookinfo namespace on both clusters, enrols it into Ambient (istio.io/dataplane-mode=ambient), labels it with each cluster's network (topology.istio.io/network=east-ag / west-ag), and installs the productpage / details / reviews / ratings sample workloads from the upstream Istio repo.

Why: The standup lab deploys platform only — these are the test workloads everything below operates on. The topology.istio.io/network label is the signal istiod uses to classify pods by cluster; without it, cross-cluster endpoint rewriting (LAB 1) never fires.

for CTX in $CLUSTER1 $CLUSTER2; do
  kubectl --context $CTX create namespace bookinfo 2>/dev/null || true
  kubectl --context $CTX label namespace bookinfo \
    istio.io/dataplane-mode=ambient \
    topology.istio.io/network=${CTX#kind-} --overwrite

  BOOKINFO=https://raw.githubusercontent.com/istio/istio/release-1.24/samples/bookinfo/platform/kube
  kubectl --context $CTX apply -n bookinfo -f $BOOKINFO/bookinfo.yaml
  kubectl --context $CTX apply -n bookinfo -f $BOOKINFO/bookinfo-versions.yaml

  kubectl --context $CTX -n bookinfo wait \
    --for=condition=Ready pod -l app=productpage --timeout=180s
done
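
The network label in the loop above comes from the shell expansion ${CTX#kind-}, which strips a leading kind- from the context name. The sketch below isolates that derivation so you can check what it yields before labelling anything. It assumes your contexts are named kind-east-ag and kind-west-ag (an assumption; use whatever $CLUSTER1 and $CLUSTER2 hold), and network_for_context is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: derive the topology.istio.io/network value from a
# kubectl context name by stripping a leading "kind-" prefix.
network_for_context() {
  ctx="$1"
  printf '%s\n' "${ctx#kind-}"
}

# Assumed context names; substitute the values of $CLUSTER1 / $CLUSTER2.
for ctx in kind-east-ag kind-west-ag; do
  echo "$ctx -> topology.istio.io/network=$(network_for_context "$ctx")"
done
```

If a context name does not start with kind-, the expansion returns it unchanged, so confirm the printed values match the network names the standup used.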

Label productpage as a global multicluster service

About — what this does & why

What: Adds the solo.io/service-scope=global label to the productpage Service in both clusters.

Why: This one label tells Solo Istio's peering controller to create a synthetic global hostname productpage.bookinfo.mesh.internal with a synthetic VIP 240.240.0.2 that fronts the Service across all peered clusters. Without it, cross-cluster failover in LAB 1 won't work — the global hostname won't exist. Multicluster also has to be unlocked on istiod via an lt: ent licence in SOLO_LICENSE_KEY; trial JWTs leave the feature gate closed.

for CTX in $CLUSTER1 $CLUSTER2; do
  kubectl --context $CTX label svc productpage -n bookinfo \
    solo.io/service-scope=global --overwrite
done

# Confirm: istioctl multicluster check should now show "1 globally shared service"
istioctl --context $CLUSTER1 multicluster check | grep "Shared Services"

Apply the agentgateway ingress (Gateway + HTTPRoute) on east

About — what this does & why

What: Creates a north-south Gateway using gatewayClassName: enterprise-agentgateway plus an HTTPRoute whose backend is the synthetic global hostname productpage.bookinfo.mesh.internal (via the Solo kind: Hostname, group: networking.istio.io Gateway-API extension). The Solo controller provisions a backing agentgateway Deployment + Service automatically.

Why: Routing to the *.mesh.internal hostname rather than the local Service is the prerequisite for cross-cluster failover in LAB 1: the global hostname's endpoint set includes the remote east-west gateway, so a backend still exists when local productpage scales to zero.

kubectl --context $CLUSTER1 apply -f - <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: bookinfo-gateway
  namespace: bookinfo
spec:
  gatewayClassName: enterprise-agentgateway
  listeners:
  - name: http
    port: 8080
    protocol: HTTP
    allowedRoutes: { namespaces: { from: Same } }
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: productpage
  namespace: bookinfo
spec:
  parentRefs:
  - name: bookinfo-gateway
  rules:
  - backendRefs:
    - kind: Hostname               # Solo Gateway-API extension
      group: networking.istio.io
      name: productpage.bookinfo.mesh.internal
      port: 9080
EOF
Why kind: Hostname instead of kind: Service. Pointing the backendRef at the regular productpage Service routes the gateway to productpage.bookinfo.svc.cluster.local, which only has local endpoints, so cross-cluster failover (LAB 1) cannot work. The Solo Gateway-API extension kind: Hostname, group: networking.istio.io targets the synthetic *.mesh.internal hostname instead. That hostname's endpoint set includes the remote cluster's east-west gateway, which is what failover needs once local replicas scale to zero (LAB 1 explains how far the agentgateway ingress currently gets with this).

Wait for the controller to create + ready the backing pod

About — what this does & why

What: First waits for the bookinfo-gateway Gateway to be Programmed (controller has reconciled), then waits for the agentgateway pod backing it to be Ready.

Why: The agentgateway controller has a short reconcile delay between Gateway creation and pod creation. If you kubectl wait --for=condition=Ready pod ... too soon, you get error: no matching resources found because the pod doesn't exist yet. Waiting on the Gateway's Programmed condition first ensures the controller has finished reconciling and the pod is on its way.

# 1. Wait for the Gateway itself to be Programmed (controller has reconciled)
kubectl --context $CLUSTER1 -n bookinfo wait \
  --for=condition=Programmed gateway/bookinfo-gateway --timeout=60s

# 2. Now the backing pod exists — wait for it to be Ready
kubectl --context $CLUSTER1 -n bookinfo wait \
  --for=condition=Ready pod -l gateway.networking.k8s.io/gateway-name=bookinfo-gateway \
  --timeout=120s

Verify — port-forward and curl through the ingress

About — what this does & why

What: Port-forwards the ingress Service to localhost:8080, curls /productpage, expects HTTP 200, then kills the port-forward.

Why: Confirms the end-to-end ingress path works: client → port-forward → agentgateway pod → HTTPRoute → productpage. If this returns anything other than 200, the rest of the labs won't work — productpage isn't reachable from outside the mesh.

kubectl --context $CLUSTER1 -n bookinfo port-forward svc/bookinfo-gateway 8080:8080 &
sleep 2   # give the port-forward a moment to bind before curling
curl -s -o /dev/null -w "ingress: HTTP %{http_code}\n" http://localhost:8080/productpage
kill %1
# → ingress: HTTP 200
LAB 1

Failover — mesh-layer cross-cluster failover

This step demonstrates Solo Istio's global service failover at the mesh layer. When productpage-v1 is scaled to zero on east, calls to the global hostname productpage.bookinfo.mesh.internal transparently route to west via the east-west HBONE gateway.

Why this isn't tested through the agentgateway ingress. Even with the HTTPRoute correctly pointing at the *.mesh.internal hostname, the agentgateway dataplane (Rust proxy) still NACKs istiod's synthetic cross-cluster WorkloadEntry with "unknown address type", so the route ends up with zero healthy backends and the ingress returns 503. The native istio Gateway (Envoy) on the istio-gw-multi-cluster-kind standup with the same HTTPRoute pattern does fail over correctly — this is an agentgateway-specific dataplane gap, not a Solo Istio control-plane issue. Ingress-initiated failover on agentgateway is on the roadmap (agentgateway PR #1566 was the proxy-side step). The mesh fabric itself always handles this — ztunnel resolves the synthetic VIP (e.g. 240.240.0.x) and tunnels via HBONE to the peer cluster's east-west GW, which is what the in-mesh curl below proves.

Scale productpage down on east

About — what this does & why

What: Sets productpage-v1 to 0 replicas on east-ag and waits for the pod to terminate.

Why: Simulates a cluster-local failure so we can prove the global hostname seamlessly fails over to west. Equivalent to a region outage or a bad deployment.

kubectl --context $CLUSTER1 scale deploy productpage-v1 -n bookinfo --replicas=0
kubectl --context $CLUSTER1 -n bookinfo wait --for=delete pod -l app=productpage --timeout=30s

Confirm there are no local endpoints, but the global VIP is healthy

About — what this does & why

What: Asks ztunnel on east what it knows about productpage. Three rows come back — the synthetic global Service has a healthy remote endpoint (1/1), while the regular local Service has zero endpoints (0/0).

Why: Confirms istiod has rewritten the global Service to point at west's east-west gateway, while the regular Service stays cluster-scoped. This is the proof that the failover plumbing is in place before we actually call it.

istioctl --context $CLUSTER1 ztunnel-config service | grep productpage
# bookinfo  autogen.bookinfo.productpage  240.240.0.2,2001:2::2  None  1/1   ← cross-cluster
# bookinfo  productpage                   10.96.x.x              None  0/0   ← local svc, no eps
# bookinfo  productpage-v1                10.96.x.x              None  0/0
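
To make the 1/1 versus 0/0 check scriptable, the last column can be split on the slash. The sketch below runs against the three sample rows shown above rather than a live cluster. It assumes the column layout in that sample (service name in column 2, endpoint counts in the last column) and that the first number is the healthy count; sample_output and endpoint_summary are hypothetical helper names.

```shell
#!/bin/sh
# Sample rows copied from the expected output above (annotations removed).
sample_output() {
  cat <<'EOF'
bookinfo  autogen.bookinfo.productpage  240.240.0.2,2001:2::2  None  1/1
bookinfo  productpage                   10.96.x.x              None  0/0
bookinfo  productpage-v1                10.96.x.x              None  0/0
EOF
}

# Print "<service> <healthy|no-endpoints>" based on the last column,
# which is assumed to be "<healthy>/<total>".
endpoint_summary() {
  awk '{
    split($NF, ep, "/")
    state = (ep[1] + 0 > 0) ? "healthy" : "no-endpoints"
    print $2, state
  }'
}

sample_output | endpoint_summary
```

Against a live cluster, pipe the real command in instead: istioctl --context $CLUSTER1 ztunnel-config service | grep productpage | endpoint_summary.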

Curl through the mesh using the global hostname

Run a curl pod inside the ambient mesh so its outbound call is intercepted by ztunnel. The global hostname resolves to the synthetic VIP 240.240.0.2, which istiod has programmed with the remote (west) endpoint.

About — what this does & why

What: Runs a one-shot curl pod inside the bookinfo namespace and calls productpage.bookinfo.mesh.internal. Because the pod is enrolled in Ambient, ztunnel intercepts the outbound call.

Why: This is the actual end-to-end failover test. The synthetic VIP 240.240.0.2 routes through HBONE to west's east-west gateway, then west's ztunnel delivers to a live productpage pod. The result is HTTP 200 from a service that has zero local endpoints — transparent cross-cluster failover with no client-side logic.

kubectl --context $CLUSTER1 -n bookinfo run mesh-curl \
  --rm -i --restart=Never --image=curlimages/curl:8.5.0 -- \
  curl -sS -o /dev/null -w "HTTP %{http_code} via %{remote_ip}\n" \
    http://productpage.bookinfo.mesh.internal:9080/productpage
# → HTTP 200 via 240.240.0.2

Compare — calling the regular short hostname does NOT fail over

This is by design: the regular Service VIP only includes local endpoints. Locality-aware failover only kicks in for the *.mesh.internal global hostname.

About — what this does & why

What: Same curl pod and setup — but calls the regular Kubernetes short hostname productpage:9080.

Why: Shows the contrast. Short-form names resolve to the local Service VIP, which has 0 endpoints, so the call fails with HTTP 000 / connection reset. Failover is opt-in by hostname — clients that want it must use *.mesh.internal, which makes the boundary explicit in code review.

kubectl --context $CLUSTER1 -n bookinfo run mesh-curl \
  --rm -i --restart=Never --image=curlimages/curl:8.5.0 -- \
  curl -sS -o /dev/null -w "HTTP %{http_code} via %{remote_ip}\n" \
    http://productpage:9080/productpage
# → HTTP 000 (connection reset — local VIP 10.96.x.x has 0 endpoints)
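
curl prints 000 for %{http_code} when it never received an HTTP response, which is how the two curls above can be told apart in a script. A minimal sketch; classify_http_code is a hypothetical helper, and the three buckets are just one reasonable split.

```shell
#!/bin/sh
# Map the value of curl's %{http_code} to a coarse outcome.
classify_http_code() {
  case "$1" in
    000) echo "no-response" ;;  # connection failed or was reset before any HTTP reply
    2??) echo "ok" ;;
    *)   echo "http-error" ;;
  esac
}

# The two results from this lab:
echo "global hostname: $(classify_http_code 200)"
echo "short hostname:  $(classify_http_code 000)"
```

In a real check you would capture the code with curl -s -o /dev/null -w '%{http_code}' and pass it to the function.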

Scale productpage back up on east

About — what this does & why

What: Restores productpage-v1 to 1 replica on east-ag and waits for the pod to be Ready.

Why: Returns the cluster to a known-good baseline before LAB 2 starts — subsequent labs assume productpage is running locally.

kubectl --context $CLUSTER1 scale deploy productpage-v1 -n bookinfo --replicas=1
kubectl --context $CLUSTER1 -n bookinfo wait --for=condition=Ready pod -l app=productpage --timeout=60s
LAB 2

L7 policy — enterprise-agentgateway-waypoint

Deploy an agentgateway waypoint proxy for the bookinfo namespace, then use an HTTPRoute to pin reviews traffic to a specific version. ztunnel transparently routes all bookinfo traffic through the waypoint for L7 processing.

Two keys for enterprise-agentgateway-waypoint to work as an Istio ambient waypoint:
1. Listener must use protocol: HBONE, port: 15008
2. EnterpriseAgentgatewayParameters must set istioClusterId matching istiod's --clusterID flag (sets the CLUSTER_ID env var that istiod uses to validate the pod's JWT).

Create EnterpriseAgentgatewayParameters and waypoint Gateway

About — what this does & why

What: Creates an EnterpriseAgentgatewayParameters CR (provides CLUSTER_ID + trust domain) and a Gateway with gatewayClassName: enterprise-agentgateway-waypoint. The Solo controller deploys an agentgateway pod that acts as an Ambient L7 waypoint.

Why: Ambient's L4 dataplane (ztunnel) handles identity and mTLS but cannot make L7 decisions. To match on HTTP headers, JWT claims, or do header-based routing, traffic has to traverse a waypoint. This is the same agentgateway binary used for the ingress and MCP routing — just in a different role.

kubectl --context $CLUSTER1 apply -f - <<'EOF'
apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayParameters
metadata:
  name: waypoint-params
  namespace: bookinfo
spec:
  istioClusterId: east-ag        # must match istiod --clusterID
  ca:
    trustDomain: cluster.local   # must match mesh trustDomain
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agw-waypoint
  namespace: bookinfo
spec:
  gatewayClassName: enterprise-agentgateway-waypoint
  listeners:
    - name: proxy
      port: 15008
      protocol: HBONE
  infrastructure:
    parametersRef:
      group: enterpriseagentgateway.solo.io
      kind: EnterpriseAgentgatewayParameters
      name: waypoint-params
EOF
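
Because the waypoint depends on the listener using protocol: HBONE on port 15008, a quick text-level check of a manifest before applying it can catch a typo. This is only a sketch: it greps for the two lines and does not parse YAML, and check_waypoint_listener is a hypothetical helper name.

```shell
#!/bin/sh
# Reads a Gateway manifest on stdin and does a text-level check for the
# HBONE listener settings. This is a grep, not a YAML parser.
check_waypoint_listener() {
  manifest="$(cat)"
  if ! printf '%s\n' "$manifest" | grep -Eq '^[[:space:]]*protocol:[[:space:]]*HBONE[[:space:]]*$'; then
    echo "missing protocol: HBONE"
    return 1
  fi
  if ! printf '%s\n' "$manifest" | grep -Eq '^[[:space:]]*port:[[:space:]]*15008[[:space:]]*$'; then
    echo "missing port: 15008"
    return 1
  fi
  echo "listener ok"
}

# Listener stanza copied from the agw-waypoint Gateway above.
check_waypoint_listener <<'EOF'
  listeners:
    - name: proxy
      port: 15008
      protocol: HBONE
EOF
```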

Label the namespace and verify ztunnel sees the waypoint

About — what this does & why

What: Labels the bookinfo namespace with istio.io/use-waypoint=agw-waypoint, then queries ztunnel-config to confirm.

Why: This is the switch that redirects all traffic into bookinfo through the waypoint. Without the label, the waypoint sits idle — ztunnel keeps doing direct L4 delivery. After labelling, every bookinfo Service shows agw-waypoint in ztunnel's view.

kubectl --context $CLUSTER1 label namespace bookinfo \
  istio.io/use-waypoint=agw-waypoint --overwrite

# Wait ~5s, then verify all bookinfo services show "agw-waypoint"
istioctl --context $CLUSTER1 ztunnel-config service | grep bookinfo
# → bookinfo   reviews   10.x.x.x   agw-waypoint   3/3

Apply header-based reviews routing

Sends user jason to reviews-v2 (black stars); everyone else gets reviews-v1 (no stars). The Service parentRef is the Istio Ambient waypoint pattern.

About — what this does & why

What: Applies an HTTPRoute that targets the reviews Service. Requests with header end-user: jason route to reviews-v2 (black stars); everyone else routes to reviews-v1 (no stars).

Why: L7 policy that only the waypoint can enforce — ztunnel alone has no L7 visibility. The parentRefs.kind: Service pattern is the Ambient way to attach an HTTPRoute to a waypoint (vs. the ingress pattern which uses parentRefs.kind: Gateway).

kubectl --context $CLUSTER1 apply -f - <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: reviews
  namespace: bookinfo
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: reviews
    port: 9080
  rules:
  - matches:
    - headers:
      - name: end-user
        value: jason
    backendRefs:
    - name: reviews-v2
      port: 9080
  - backendRefs:
    - name: reviews-v1
      port: 9080
EOF
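
The decision this HTTPRoute encodes is small enough to model in plain shell, which can help when explaining the demo. The sketch below mirrors the two rules: an exact, case-sensitive match on the end-user header value wins, and everything else falls through to the default backend. It is a model of the outcome only, not how the waypoint evaluates routes; select_reviews_backend is a hypothetical name.

```shell
#!/bin/sh
# Model of the route above: returns the backend chosen for a given
# end-user header value (empty string means the header is absent).
select_reviews_backend() {
  end_user="$1"
  if [ "$end_user" = "jason" ]; then
    echo "reviews-v2"   # rule 1: header end-user equals "jason" exactly
  else
    echo "reviews-v1"   # rule 2: catch-all
  fi
}

for user in jason "" alice; do
  echo "end-user='$user' -> $(select_reviews_backend "$user")"
done
```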

Test — default user (no stars, reviews-v1)

About — what this does & why

What: Port-forwards the ingress and curls productpage as an anonymous user (no end-user header). Greps the response for the star glyph that reviews-v2 renders.

Why: Verifies the default route — anonymous traffic falls through to reviews-v1 (no stars). Empty grep output means the header-match rule is working as intended.

kubectl --context $CLUSTER1 -n bookinfo \
  port-forward svc/bookinfo-gateway 8080:8080 &
sleep 2   # give the port-forward a moment to bind before curling

curl -s http://localhost:8080/productpage | grep "glyphicon-star"
# → no output — locked to reviews-v1 (no ratings)

kill %1

Test — user jason via browser (black stars, reviews-v2)

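Restart the port-forward from the previous step (it was killed after the curl), open http://localhost:8080/productpage in a browser, and log in as jason / jason. productpage stores the user in its session and adds an end-user: jason header to its internal call to reviews, which is what the waypoint matches on.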

Confirm traffic went through the agentgateway waypoint

About — what this does & why

What: Tails the waypoint pod's logs for the last 5 requests.

Why: Proves traffic actually traversed the waypoint (not the prior direct ztunnel path). Each log line shows the listener, the matched route, and the upstream HTTP status — the evidence that L7 policy enforcement happened on the wire.

kubectl --context $CLUSTER1 -n bookinfo \
  logs -l gateway.networking.k8s.io/gateway-name=agw-waypoint \
  --tail=5
# → gateway=bookinfo/agw-waypoint listener=waypoint route=bookinfo/reviews http.status=200

Clean up routing

About — what this does & why

What: Deletes the reviews HTTPRoute, the waypoint Gateway, and removes the istio.io/use-waypoint label from the namespace.

Why: Resets bookinfo to its post-LAB-1 state so LAB 3 starts clean. The in-cluster waypoint from LAB 2 would intercept egress traffic and complicate the egress-waypoint demo if left in place.

kubectl --context $CLUSTER1 delete httproute reviews -n bookinfo
kubectl --context $CLUSTER1 delete gateway agw-waypoint -n bookinfo
kubectl --context $CLUSTER1 label namespace bookinfo \
  istio.io/use-waypoint- --overwrite
LAB 3

Egress gateway — enterprise-agentgateway-waypoint

Controls which workloads can reach external services using an enterprise-agentgateway-waypoint in a dedicated egress namespace.

Create namespace and deploy egress gateway

About — what this does & why

What: Creates an istio-egress namespace and deploys a second agentgateway-waypoint there. Labels the namespace with istio.io/use-waypoint=egress-gateway so ztunnel knows to forward through it.

Why: Egress traffic control wants a dedicated waypoint outside the workload namespace so policy lives separate from the application — the same pattern as a network DMZ. This is the Ambient egress design: one chokepoint, identity-aware, per cluster.

kubectl --context $CLUSTER1 create namespace istio-egress

kubectl --context $CLUSTER1 apply -f - <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: egress-gateway
  namespace: istio-egress
  labels:
    istio.io/waypoint-for: service
spec:
  gatewayClassName: enterprise-agentgateway-waypoint
  listeners:
  - name: mesh
    port: 15088
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
EOF

kubectl --context $CLUSTER1 label ns istio-egress \
  istio.io/use-waypoint=egress-gateway

Define httpbin.org as an external service and apply AuthorizationPolicy

About — what this does & why

What: Declares httpbin.org as a known external ServiceEntry, routes it through the egress waypoint, originates TLS at the waypoint (DestinationRule with tls.mode: SIMPLE, with port 80 mapped to targetPort 443), then applies an AuthorizationPolicy that allows only the bookinfo-ratings SPIFFE identity to reach it.

Why: This is identity-aware egress. The authz decision is made on the workload's SPIFFE ID (cryptographically verified by ztunnel), not IP addresses or pod labels — so the rule survives pod rescheduling, IP changes, and impersonation attempts. The same pattern customers use to lock down third-party API access in regulated environments.

kubectl --context $CLUSTER1 apply -f - <<'EOF'
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: httpbin.org
  namespace: bookinfo
  labels:
    istio.io/use-waypoint: egress-gateway
    istio.io/use-waypoint-namespace: istio-egress
spec:
  hosts:
  - httpbin.org
  ports:
  - number: 80
    name: http
    protocol: HTTP
    targetPort: 443
  resolution: DNS
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: httpbin.org-tls
  namespace: bookinfo
spec:
  host: httpbin.org
  trafficPolicy:
    tls:
      mode: SIMPLE
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ratings-to-httpbin
  namespace: bookinfo
spec:
  targetRefs:
  - kind: ServiceEntry
    group: networking.istio.io
    name: httpbin.org
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/bookinfo/sa/bookinfo-ratings"
EOF
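
The principal string in the policy follows Istio's trust-domain/ns/namespace/sa/service-account form. The sketch below builds that string and compares it with the single allowed principal, which is a simplified model of the decision the policy expresses. It is not how the waypoint evaluates policy; principal_for and is_allowed are hypothetical helper names.

```shell
#!/bin/sh
# Build an Istio principal: <trust-domain>/ns/<namespace>/sa/<service-account>
principal_for() {
  trust_domain="$1"; ns="$2"; sa="$3"
  printf '%s/ns/%s/sa/%s\n' "$trust_domain" "$ns" "$sa"
}

# The only principal listed in the ratings-to-httpbin policy above.
ALLOWED="cluster.local/ns/bookinfo/sa/bookinfo-ratings"

# Succeeds only when the caller's principal equals the allowed one.
is_allowed() {
  [ "$(principal_for "$@")" = "$ALLOWED" ]
}

for sa in bookinfo-ratings bookinfo-reviews; do
  if is_allowed cluster.local bookinfo "$sa"; then
    echo "$sa: allow"
  else
    echo "$sa: deny"
  fi
done
```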

Test — ratings should succeed

About — what this does & why

What: Execs a curl from inside the ratings pod against httpbin.org/get.

Why: Verifies the allow path. The ratings pod runs as the bookinfo-ratings ServiceAccount, ztunnel attaches its SPIFFE identity to the request, the egress waypoint matches the AuthorizationPolicy, and the call is proxied to the real httpbin.org.

kubectl --context $CLUSTER1 exec -it \
  $(kubectl --context $CLUSTER1 get pod -l app=ratings -n bookinfo \
    -o jsonpath='{.items[0].metadata.name}') \
  -n bookinfo -- curl -s httpbin.org/get | head -5

Test — reviews should be blocked

About — what this does & why

What: Same curl from inside the reviews pod. The response should contain RBAC: access denied.

Why: Verifies the deny path. Reviews runs as bookinfo-reviews, a different SPIFFE identity that doesn't match the policy. The egress waypoint rejects the request before it leaves the cluster, so httpbin.org never sees a packet. This is identity-based egress, not IP allow-listing.

kubectl --context $CLUSTER1 exec -it \
  $(kubectl --context $CLUSTER1 get pod -l app=reviews -n bookinfo \
    -o jsonpath='{.items[0].metadata.name}') \
  -n bookinfo -- curl -sv httpbin.org/get 2>&1 | grep -E "RBAC|403|denied"

Cleanup

None of these labs leave a state that breaks the standup. To wipe everything added by these labs without affecting the standup:

About — what this does & why

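What: Deletes everything LAB 2 and LAB 3 added: the in-cluster waypoint Gateway, Parameters CR, HTTPRoute, and waypoint label, plus the egress namespace, external ServiceEntry, DestinationRule, and AuthorizationPolicy. The final loop then deletes the bookinfo namespace on both clusters, which removes Bookinfo and the ingress from LAB 0 as well.

Why: Stopping before the final loop returns the cluster to the post-LAB-0 baseline so you can re-run any of the labs from a clean state, or hand the cluster off to the agentic / MCP lab. Running the final loop as well returns it to the bare standup.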

# LAB 2 — remove the agw-waypoint and the reviews HTTPRoute
kubectl --context $CLUSTER1 delete httproute reviews -n bookinfo --ignore-not-found
kubectl --context $CLUSTER1 delete gateway agw-waypoint -n bookinfo --ignore-not-found
kubectl --context $CLUSTER1 delete enterpriseagentgatewayparameters waypoint-params -n bookinfo --ignore-not-found
kubectl --context $CLUSTER1 label namespace bookinfo istio.io/use-waypoint- --overwrite

# LAB 3 — remove the egress namespace + ServiceEntry + AuthorizationPolicy
kubectl --context $CLUSTER1 delete namespace istio-egress --ignore-not-found
kubectl --context $CLUSTER1 delete serviceentry httpbin.org -n bookinfo --ignore-not-found
kubectl --context $CLUSTER1 delete destinationrule httpbin.org-tls -n bookinfo --ignore-not-found
kubectl --context $CLUSTER1 delete authorizationpolicy ratings-to-httpbin -n bookinfo --ignore-not-found

# LAB 0 (optional): remove the bookinfo namespace + agentgateway ingress on both clusters; skip this loop to keep the post-LAB-0 baseline
for CTX in $CLUSTER1 $CLUSTER2; do
  kubectl --context $CTX delete namespace bookinfo --ignore-not-found
done

Where to next