This page is the platform standup, infrastructure only: two ambient kind clusters peered over HBONE, Solo Enterprise agentgateway installed and registered as a Gateway API class, and Gloo UI on top. Inspired by Ram Vennam's ambient-multicluster-workshop, but with every Istio gateway type except the east-west HBONE gateway replaced by Solo Enterprise agentgateway.
No test workloads are deployed here. Each dedicated lab at the bottom of the page (cross-cluster connectivity, or agentic / MCP) installs its own workloads (bookinfo, MCP servers, etc.). Stand the platform up once, then run any lab on it.
All steps use helm and kubectl inline. Want it automated end-to-end?
Run ./scripts/quick.sh — it does steps 00–13 idempotently. ./scripts/quick.sh teardown cleans up.
What you'll build
Architecture
           External traffic                                      External traffic
                   │                                                     │
                   ▼                                                     ▼
┌─────────────────────────────────────┐               ┌─────────────────────────────────────┐
│  kind-east-ag                       │               │  kind-west-ag                       │
│  pods  10.10.0.0/16                 │               │  pods  10.20.0.0/16                 │
│  svcs  10.96.0.0/16                 │               │  svcs  10.97.0.0/16                 │
│  lb    .100 – .110                  │               │  lb    .120 – .130                  │
│                                     │               │                                     │
│  ┌───────────────────────────────┐  │               │  ┌───────────────────────────────┐  │
│  │ enterprise-agentgateway       │  │               │  │ enterprise-agentgateway       │  │
│  │ north-south ingress :8080     │  │               │  │ north-south ingress :8080     │  │
│  │ class: enterprise-agentgateway│  │               │  │ class: enterprise-agentgateway│  │
│  └─────────────┬─────────────────┘  │               │  └─────────────┬─────────────────┘  │
│                │ HTTPRoute          │               │                │ HTTPRoute          │
│                ▼                    │               │                ▼                    │
│  ┌───────────────────────────────┐  │               │  ┌───────────────────────────────┐  │
│  │ enterprise-agentgateway       │  │               │  │ enterprise-agentgateway       │  │
│  │ L7 waypoint (MCP / egress)    │  │               │  │ L7 waypoint (MCP / egress)    │  │
│  │ class: ent-agentgateway-      │  │               │  │ class: ent-agentgateway-      │  │
│  │        waypoint               │  │               │  │        waypoint               │  │
│  └─────────────┬─────────────────┘  │               │  └─────────────┬─────────────────┘  │
│                │                    │               │                │                    │
│  ztunnel (DaemonSet, every node)    │               │  ztunnel (DaemonSet, every node)    │
│  istiod-gloo (control plane)        │               │  istiod-gloo (control plane)        │
│                                     │               │                                     │
│  ┌───────────────────────────────┐  │     HBONE     │  ┌───────────────────────────────┐  │
│  │ istio east-west gateway       │  │    :15008     │  │ istio east-west gateway       │  │
│  │ class: istio-eastwest         │◄─┼───────────────┼─►│ class: istio-eastwest         │  │
│  │ ztunnel-backed HBONE fabric   │  │    cross-     │  │ ztunnel-backed HBONE fabric   │  │
│  └───────────────────────────────┘  │    cluster    │  └───────────────────────────────┘  │
└─────────────────────────────────────┘               └─────────────────────────────────────┘
                                  └──── Docker bridge ────┘
Cross-cluster traffic can flow via either the istio-eastwest gateway or the agentgateway. Pick whichever fits the workload. The diagram above shows the istio east-west variant; swap in agentgateway as the east-west path for AI / agent traffic where its L7 awareness adds value.
Gateway substitution map
| Workshop section | Original gatewayClassName | → | This guide |
|---|---|---|---|
| North-south ingress (any HTTP / MCP workload) | istio | → | enterprise-agentgateway |
| L7 waypoint (reviews routing) | istio-waypoint | → | enterprise-agentgateway-waypoint |
| Egress gateway | istio-waypoint | → | enterprise-agentgateway-waypoint |
| HBONE east-west (mesh fabric) | istio-eastwest (via peering chart) | → | unchanged: Istio still owns the HBONE fabric |
IP layout
| Cluster | Pod CIDR | Service CIDR | MetalLB pool |
|---|---|---|---|
| east-ag ($CLUSTER1) | 10.10.0.0/16 | 10.96.0.0/16 | <base>.255.100 – 110 |
| west-ag ($CLUSTER2) | 10.20.0.0/16 | 10.97.0.0/16 | <base>.255.120 – 130 |
Prerequisites
Run ./scripts/install-prereqs.sh to audit what's missing, or ./scripts/install-prereqs.sh --install to grab everything via Homebrew (macOS only; it installs Homebrew itself if needed). Solo istioctl is fetched into ~/.istioctl/bin by the same --install pass.
Install Solo istioctl
Upstream istioctl (homebrew, istio.io) is missing the multicluster and bootstrap subcommands the labs use. The shared installer downloads the Solo build into ~/.istioctl/bin — it has a public REPO_KEY baked in as the default so you don't need to set anything.
About — what this does & why
What: Auto-detects your host OS + arch, downloads the Solo istioctl tarball, validates it's a real gzip archive (so a wrong key fails fast with a clear error instead of a confusing tar: Unrecognized archive format), and extracts the binary into ~/.istioctl/bin.
Why: Multi-cluster lab steps call istioctl multicluster check, multicluster expose, and bootstrap — none of which exist in the upstream istioctl. Run the script once at the repo root and you're set for every Solo lab on this machine.
# From the solo-demos repo root — REPO_KEY defaults to the public e6283d67ad60.
# install-prereqs.sh --install installs the Solo istioctl into ~/.istioctl/bin
# alongside the rest of the brew-managed prereqs.
./scripts/install-prereqs.sh --install
# Or pin a specific Istio version (default 1.29.2):
ISTIO_VERSION=1.29.3 ./scripts/install-prereqs.sh --install
# Then add the binary to your PATH (zsh: ~/.zshrc, bash: ~/.bashrc):
export PATH="$HOME/.istioctl/bin:$PATH"
# Verify
istioctl version --short
istioctl multicluster check --help
Steps
Clone this repo and set environment variables
Every command below runs from the repo root — clone first, then export the env vars in every terminal tab you use:
About — what this does & why
What: Clones the tjorourke/solo repo and changes into the agentgw-multi-cluster-kind subdirectory.
Why: Every ./scripts/*.sh path in this guide is relative to that subdirectory, and the kind configs in kind/ are resolved relative to $PWD. Running the steps from anywhere else makes the helper scripts fail with "no such file".
git clone git@github.com:tjorourke/solo.git
cd solo/agentgw-multi-cluster-kind
About — what this does & why
What: Exports the cluster names, Solo licence JWTs, and version pins consumed by every helm install and kubectl command below.
Why: Hardcoding these in each step would couple the guide to one user's environment. The -ag suffix on cluster names lets this demo coexist with the istio-gw demo on the same Docker host. SOLO_ISTIO_LICENSE_KEY must carry "lt": "ent" for the multicluster feature gate to unlock — trial JWTs (lt: trial) won't.
# Cluster names — suffixed -ag so this demo can run alongside the istio-gw demo
export CLUSTER1=kind-east-ag
export CLUSTER2=kind-west-ag
# Solo enterprise licences (required) — request a trial from
# https://www.solo.io/free-trial/ if you don't have keys.
export SOLO_ISTIO_LICENSE_KEY="eyJ..."
export AGENTGATEWAY_LICENSE_KEY="eyJ..."
# Optional — only needed if you run STEP 08 (Gloo UI management plane).
export GLOO_MESH_LICENSE_KEY="eyJ..."
# Or point at a sourceable file (any shell script that exports both keys)
# SECRETS_FILE=/path/to/secrets.sh # used by ./scripts/quick.sh
# Version pins
export ISTIO_VERSION=1.29.2-solo
export ISTIO_VERSION_OPERATOR="${ISTIO_VERSION%-solo}" # strips "-solo" for the operator
export GLOO_OPERATOR_VERSION=0.5.2
export GATEWAY_API_VERSION=v1.4.0
export AGW_VERSION=v2.3.3 # chart tag — note the 'v' prefix (chart at v2.2+ uses it; appVersion/image tag is plain 2.3.3)
Using ./scripts/quick.sh instead of walking the steps by hand? It exits early with a clear error if the two required licence env vars (or SECRETS_FILE) aren't set, so you won't tear down half a cluster before noticing.
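If you are walking the steps by hand, a similar early exit can be approximated before the first step. This is a minimal sketch, not the check quick.sh actually performs; require_env is a hypothetical helper name, and it assumes bash because it relies on indirect expansion.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo scripts): report every unset or
# empty variable named in the arguments and return non-zero if any is missing.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name
    if [ -z "${!name:-}" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the variable names exported above:
# require_env CLUSTER1 CLUSTER2 SOLO_ISTIO_LICENSE_KEY AGENTGATEWAY_LICENSE_KEY \
#   || echo "export the variables listed above before continuing"
```

Listing every missing variable in one pass, rather than stopping at the first, saves a round trip when several are unset in a fresh terminal tab.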
Create east-ag + west-ag kind clusters
The script reads $CLUSTER1 and $CLUSTER2, strips the kind- prefix to get the cluster name, and looks up the matching config in kind/. The -ag suffix means this demo can run alongside the istio-gw demo without name collisions.
About — what this does & why
What: Creates two kind clusters (east-ag and west-ag) using the per-cluster kind configs under kind/ — each pins its own pod CIDR, service CIDR, and disables the default CNI so Istio Ambient can install cleanly.
Why: Two clusters are the minimum for a meaningful multicluster demo, and disjoint pod / service CIDRs are required for HBONE peering — overlapping CIDRs would cause istiod to drop remote endpoints when it rewrites them.
./scripts/01-clusters.sh
Expected output:
About — what this does & why
What: Shows the success output you should see — both clusters Ready and both contexts (kind-east-ag, kind-west-ag) registered in your kubeconfig.
Why: If you don't see both Ready lines, stop and fix it now. Every subsequent step assumes both $CLUSTER1 and $CLUSTER2 contexts are usable.
✓ [east-ag] ready
✓ [west-ag] ready
Contexts available:
* kind-east-ag ← $CLUSTER1
kind-west-ag ← $CLUSTER2
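The disjoint-CIDR requirement can be sanity-checked before any cluster exists. The sketch below uses plain bash arithmetic with hypothetical helper names (ip_to_int, cidrs_overlap); it is not taken from the repo scripts. It relies on the fact that two IPv4 CIDRs overlap exactly when they agree on the shorter of the two prefixes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, bash only (relies on word splitting and 64-bit arithmetic).

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  set -- $1                      # split the address on dots into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if the two CIDRs share any address, 1 if they are disjoint.
cidrs_overlap() {
  local a_ip=${1%/*} a_len=${1#*/} b_ip=${2%/*} b_len=${2#*/}
  local len=$(( a_len < b_len ? a_len : b_len ))          # shorter prefix
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$a_ip") & mask )) -eq $(( $(ip_to_int "$b_ip") & mask )) ]
}

# Usage with the pod CIDRs from the IP layout table:
# cidrs_overlap 10.10.0.0/16 10.20.0.0/16 && echo "overlap: fix kind configs" || echo "disjoint"
```

Running the same check on the two service CIDRs (10.96.0.0/16 and 10.97.0.0/16) covers the other pair that must not collide.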
Install MetalLB — LoadBalancer IPs on kind
Replaces the cloud LB. Detects the Docker kind network IPv4 CIDR at runtime and assigns non-overlapping pools. Pools are .100–.110 (east-ag) and .120–.130 (west-ag) — no overlap with the istio-gw demo's .200–.230 range.
About — what this does & why
What: Installs MetalLB on both clusters and configures per-cluster IPAddressPools drawn from the Docker kind bridge network.
Why: Without a LoadBalancer implementation, type: LoadBalancer Services on kind sit in Pending forever — the east-west HBONE gateway later in the standup is exposed that way, so peers couldn't reach each other. cloud-provider-kind is the documented alternative but has a macOS bug where it never writes the assigned IP back to the Service status; MetalLB works on both macOS and Linux.
./scripts/02-metallb.sh
Expected output:
About — what this does & why
What: The base Docker network (172.22.0.0/16 here, varies by host), both MetalLB controllers reporting Ready, and the per-cluster pool boundaries.
Why: Verifies the runtime network detection picked sane values. If the base CIDR clashes with another local demo, edit the pool ranges before continuing.
docker kind network: 172.22.0.0/16 (base: 172.22)
✓ [east-ag] MetalLB controller ready
✓ [west-ag] MetalLB controller ready
✓ MetalLB pools configured (east-ag .100-.110 / west-ag .120-.130)
cloud-provider-kind has a macOS bug where it fails to write the assigned IP back to status.loadBalancer.ingress. MetalLB works reliably on both macOS and Linux.
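To make the pool arithmetic concrete, here is a rough sketch of how a pool range could be derived from the detected network. It assumes the kind network is a /16, as in the expected output above; metallb_pool is a hypothetical name, and the docker command in the comment is an assumption about how the CIDR might be read, not a copy of what 02-metallb.sh does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<base>.255.<start>-<base>.255.<end>" from a /16 CIDR,
# where <base> is the first two octets (172.22 for 172.22.0.0/16).
metallb_pool() {
  local cidr=$1 start=$2 end=$3 base
  base=$(printf '%s' "${cidr%/*}" | cut -d. -f1-2)
  printf '%s.255.%s-%s.255.%s\n' "$base" "$start" "$base" "$end"
}

# Assumed way to read the IPv4 subnet of the kind network (the script may differ):
# KIND_CIDR=$(docker network inspect kind \
#   -f '{{range .IPAM.Config}}{{println .Subnet}}{{end}}' | grep -v : | head -n1)
# metallb_pool "$KIND_CIDR" 100 110   # east-ag
# metallb_pool "$KIND_CIDR" 120 130   # west-ag
```

The start-end string is the range form an IPAddressPool accepts in spec.addresses, so the output can be dropped into a manifest as-is.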
Configure trust — shared root CA
Both clusters share a root CA. Per-cluster intermediates are signed from it. The trust domain is locked to cluster.local on both clusters — the enterprise-agentgateway-waypoint binary hardcodes this and it must match.
Generate root CA
About — what this does & why
What: Generates a 4096-bit RSA root key and a 10-year self-signed root certificate, stored under certs/.
Why: Both clusters must chain to the same root for cross-cluster mTLS to validate. The HBONE handshake on port 15008 between ztunnels is authenticated mTLS — if the certificate chains don't share a root, every cross-cluster call fails with a TLS error.
mkdir -p certs
openssl genrsa -out certs/root-ca.key 4096
openssl req -new -x509 -days 3650 \
-key certs/root-ca.key \
-subj "/O=Solo Demo/CN=Shared Root CA" \
-out certs/root-ca.crt
Generate per-cluster intermediates
The SAN spiffe://cluster.local/ns/istio-system/sa/citadel is required — Solo Istio's cross-cluster cert-chain validation checks it.
About — what this does & why
What: For each cluster, generates a per-cluster intermediate CA key, CSR, certificate (signed by the root from the previous step with the required SPIFFE SAN), and concatenates the intermediate + root into a cert-chain.pem.
Why: Per-cluster intermediates give each cluster its own signing identity (the keys differ) while sharing a root of trust. The SAN must be spiffe://cluster.local/... because the enterprise-agentgateway-waypoint binary hardcodes TRUST_DOMAIN=cluster.local and Solo Istio's peering cross-cluster validator checks the intermediate's SAN against the runtime trust domain — a mismatch silently breaks cross-cluster identity.
for CTX in $CLUSTER1 $CLUSTER2; do
N="${CTX#kind-}"
openssl genrsa -out certs/${N}-ca.key 4096
openssl req -new \
-key certs/${N}-ca.key \
-subj "/O=Solo Demo/CN=${N} Intermediate CA" \
-out certs/${N}-ca.csr
openssl x509 -req -days 3650 \
-in certs/${N}-ca.csr \
-CA certs/root-ca.crt \
-CAkey certs/root-ca.key \
-CAcreateserial \
-extfile <(printf "subjectAltName=URI:spiffe://cluster.local/ns/istio-system/sa/citadel\nbasicConstraints=CA:TRUE\nkeyUsage=keyCertSign,cRLSign") \
-out certs/${N}-ca.crt
# cert-chain = intermediate + root
cat certs/${N}-ca.crt certs/root-ca.crt \
> certs/${N}-cert-chain.pem
done
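Before uploading anything, it can be worth confirming that each intermediate really chains to the shared root and carries the SPIFFE SAN. A minimal sketch using standard openssl subcommands; the two function names are hypothetical and the check is an addition to the guide, not something the repo scripts are known to run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on `openssl verify` and `openssl x509`.

# Return 0 if certificate $2 validates against root $1.
chains_to_root() {
  openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# Return 0 if certificate $1 carries the SPIFFE URI SAN required above.
has_citadel_san() {
  openssl x509 -in "$1" -noout -text 2>/dev/null \
    | grep 'URI:spiffe://cluster.local/ns/istio-system/sa/citadel' >/dev/null
}

# Usage, with the file names generated above:
# for CTX in $CLUSTER1 $CLUSTER2; do
#   N="${CTX#kind-}"
#   chains_to_root certs/root-ca.crt "certs/${N}-ca.crt" \
#     && has_citadel_san "certs/${N}-ca.crt" \
#     && echo "$N intermediate OK" || echo "$N intermediate FAILED"
# done
```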
Apply cacerts secrets
About — what this does & why
What: Creates istio-system on both clusters and uploads the per-cluster intermediate as a Secret named cacerts with the four keys istiod looks for: ca-cert.pem, ca-key.pem, root-cert.pem, cert-chain.pem.
Why: istiod looks for this exact Secret name and key set at startup and uses it instead of generating its own self-signed CA. The Secret must exist before istiod boots — otherwise istiod creates self-signed certs that won't validate against the other cluster.
for CTX in $CLUSTER1 $CLUSTER2; do
N="${CTX#kind-}"
kubectl --context $CTX create namespace istio-system --dry-run=client -o yaml \
| kubectl --context $CTX apply -f -
kubectl --context $CTX -n istio-system create secret generic cacerts \
--from-file=ca-cert.pem=certs/${N}-ca.crt \
--from-file=ca-key.pem=certs/${N}-ca.key \
--from-file=root-cert.pem=certs/root-ca.crt \
--from-file=cert-chain.pem=certs/${N}-cert-chain.pem \
--dry-run=client -o yaml | kubectl --context $CTX apply -f -
done
Install Gateway API CRDs
About — what this does & why
What: Applies the upstream Kubernetes Gateway API standard channel CRDs (Gateway, HTTPRoute, GatewayClass, etc.) on both clusters.
Why: Solo Enterprise agentgateway implements the Gateway API rather than the older Istio Gateway CRD, so these CRDs must be present before the agentgateway controller starts — otherwise its watches fail and it never registers its GatewayClasses.
for CTX in $CLUSTER1 $CLUSTER2; do
kubectl --context $CTX apply -f \
https://github.com/kubernetes-sigs/gateway-api/releases/download/${GATEWAY_API_VERSION}/standard-install.yaml
done
Install Solo Istio Ambient — Gloo Operator + ServiceMeshController
The Gloo Operator manages istiod + ztunnel + CNI as a single lifecycle. You declare what you want via a ServiceMeshController CR; the operator reconciles it.
Install Gloo Operator on both clusters
About — what this does & why
What: Installs the Gloo Operator chart into gloo-system on both clusters.
Why: The operator is the controller that watches ServiceMeshController CRs and reconciles istiod + ztunnel + CNI as a single lifecycle. Solo's documented path for installing Solo Istio is to declare intent in a ServiceMeshController rather than running raw helm installs for each Istio component — the operator handles upgrade and drift correction.
for CTX in $CLUSTER1 $CLUSTER2; do
helm upgrade --install gloo-operator \
oci://us-docker.pkg.dev/solo-public/gloo-operator-helm/gloo-operator \
--kube-context $CTX \
--namespace gloo-system --create-namespace \
--version $GLOO_OPERATOR_VERSION \
--wait
done
Create the Solo Istio licence secret
Must live in istio-system — istiod-gloo reads it via secretKeyRef, which only resolves from the pod's own namespace. The licence JWT itself must have "lt": "ent" for MultiCluster to unlock; trial JWTs (lt: trial, product: gloo-trial) won't satisfy the feature gate even when the secret is wired correctly.
About — what this does & why
What: Stores the Solo Istio enterprise licence JWT as a Secret called solo-istio-license in istio-system on both clusters.
Why: A later step adds a SOLO_LICENSE_KEY env var on the istiod-gloo Deployment that reads from this Secret. pilot-discovery reads only that exact env name — not LICENSE_KEY / GLOO_LICENSE_KEY, and not the volume mount path that older docs reference (confirmed by running strings against the binary).
for CTX in $CLUSTER1 $CLUSTER2; do
kubectl --context $CTX -n istio-system create secret generic solo-istio-license \
--from-literal=license="${SOLO_ISTIO_LICENSE_KEY}" \
--dry-run=client -o yaml | kubectl --context $CTX apply -f -
done
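Because a trial licence fails silently, it may help to inspect the JWT before storing it. The sketch below decodes the payload segment and extracts a string claim. It assumes "lt" is a top-level string claim in the JWT payload and that base64 -d is available (GNU coreutils or recent macOS); jwt_claim is a hypothetical helper, and it does not verify the signature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a top-level string claim ($2) from
# the payload of a JWT ($1). Decode only, no signature verification.
jwt_claim() {
  local payload
  # Second dot-separated segment, converted from base64url to base64
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url strips
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d 2>/dev/null \
    | sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Usage:
# [ "$(jwt_claim "$SOLO_ISTIO_LICENSE_KEY" lt)" = "ent" ] \
#   || echo "licence type is not 'ent': multicluster will stay locked" >&2
```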
Apply ServiceMeshController — east
About — what this does & why
What: Creates a ServiceMeshController CR on east declaring Ambient dataplane, the Demo scaling profile, the cluster's identity (cluster, network), and the locked-in trustDomain: cluster.local.
Why: This is the single declarative knob that drives the operator to install istiod-gloo, ztunnel, and the Istio CNI for east. cluster and network are baked into ztunnel's identity and become the routing keys istiod uses for cross-cluster endpoint rewriting. trustDomain stays at cluster.local on both clusters because the waypoint binary doesn't take a trust-domain knob.
kubectl --context $CLUSTER1 apply -f - <<EOF
apiVersion: operator.gloo.solo.io/v1
kind: ServiceMeshController
metadata:
name: managed-istio
namespace: gloo-system
spec:
cluster: ${CLUSTER1#kind-}
network: ${CLUSTER1#kind-}
trustDomain: cluster.local
version: "${ISTIO_VERSION_OPERATOR}"
dataplaneMode: Ambient
distribution: Standard
scalingProfile: Demo
EOF
Apply ServiceMeshController — west
About — what this does & why
What: Same CR shape on west, with cluster and network set to west-ag.
Why: Per-cluster cluster / network values are what let istiod distinguish "this cluster" from "the remote peer" when it rewrites endpoints — both clusters can share the trust domain, but their clusterID must differ.
kubectl --context $CLUSTER2 apply -f - <<EOF
apiVersion: operator.gloo.solo.io/v1
kind: ServiceMeshController
metadata:
name: managed-istio
namespace: gloo-system
spec:
cluster: ${CLUSTER2#kind-}
network: ${CLUSTER2#kind-}
trustDomain: cluster.local
version: "${ISTIO_VERSION_OPERATOR}"
dataplaneMode: Ambient
distribution: Standard
scalingProfile: Demo
EOF
Wait for istiod to be ready
About — what this does & why
What: Blocks until the operator has reconciled the SMC and the istiod-gloo Deployment is Available on both clusters.
Why: The next step patches the Deployment — patching it before it exists fails. Five minutes is generous; first-time installs on a cold kind node usually finish under two.
for CTX in $CLUSTER1 $CLUSTER2; do
kubectl --context $CTX -n istio-system wait \
--for=condition=Available deployment/istiod-gloo \
--timeout=300s
done
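One caveat: kubectl wait typically returns a NotFound error straight away if the operator has not created the Deployment yet, rather than waiting for it to appear. A small polling guard in front of the wait avoids that race. This is a sketch with a hypothetical helper name and arbitrary default retry values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until `kubectl get` succeeds for a resource.
# Args: context, namespace, resource (kind/name), tries (default 60), interval seconds (default 5)
wait_until_exists() {
  local ctx=$1 ns=$2 resource=$3 tries=${4:-60} interval=${5:-5} i
  for (( i = 0; i < tries; i++ )); do
    if kubectl --context "$ctx" -n "$ns" get "$resource" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out waiting for $resource in $ns ($ctx)" >&2
  return 1
}

# Usage: make sure the Deployment exists, then wait for it to become Available.
# for CTX in $CLUSTER1 $CLUSTER2; do
#   wait_until_exists "$CTX" istio-system deployment/istiod-gloo || break
# done
```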
Patch mesh components — required env vars for Ambient peering
The Solo Istio Ambient multicluster manual install guide sets these via platforms.peering.enabled: true in the istiod and ztunnel Helm values, which wires them automatically. The Gloo Operator 0.5.2 SMC schema doesn't expose that flag yet, so we apply the three required vars directly:
- PILOT_ENABLE_K8S_SELECT_WORKLOAD_ENTRIES=false on istiod: enables Ambient endpoint rewriting (cross-cluster endpoints resolve via east-west gateway, not K8s WorkloadEntries)
- SOLO_LICENSE_KEY on istiod: sourced from the solo-istio-license Secret. pilot-discovery only reads this exact env var (not a mount, not LICENSE_KEY / GLOO_LICENSE_KEY); without it the licence stays unread and the multicluster feature gate stays closed
- L7_ENABLED=true on ztunnel: enables L7-aware HBONE tunnelling through waypoints
About — what this does & why
What: JSON-patches the istiod-gloo Deployment to add two env vars (PILOT_ENABLE_K8S_SELECT_WORKLOAD_ENTRIES=false and SOLO_LICENSE_KEY sourced from the licence Secret), patches the ztunnel DaemonSet to add L7_ENABLED=true, then waits for both rollouts.
Why: The Gloo Operator 0.5.2 ServiceMeshController schema doesn't yet expose platforms.peering.enabled — the upstream Solo Istio chart sets all three of these via that one flag. Patching them on directly is the documented workaround until the operator catches up. Without them, ztunnel can't carry waypoint L7 traffic across the HBONE fabric, istiod won't rewrite remote endpoints, and the multicluster licence stays locked.
# istiod — disable K8s WorkloadEntry selection AND wire the SOLO_LICENSE_KEY env
for CTX in $CLUSTER1 $CLUSTER2; do
kubectl --context $CTX -n istio-system patch deployment istiod-gloo \
--type=json -p='[
{"op":"add","path":"/spec/template/spec/containers/0/env/-",
"value":{"name":"PILOT_ENABLE_K8S_SELECT_WORKLOAD_ENTRIES","value":"false"}},
{"op":"add","path":"/spec/template/spec/containers/0/env/-",
"value":{"name":"SOLO_LICENSE_KEY",
"valueFrom":{"secretKeyRef":{"name":"solo-istio-license","key":"license"}}}}
]'
done
# ztunnel — enable L7-aware HBONE (required for waypoint traffic across clusters)
for CTX in $CLUSTER1 $CLUSTER2; do
kubectl --context $CTX -n istio-system patch daemonset ztunnel \
--type=json -p='[
{"op":"add","path":"/spec/template/spec/containers/0/env/-",
"value":{"name":"L7_ENABLED","value":"true"}}
]'
done
# Wait for rollouts
for CTX in $CLUSTER1 $CLUSTER2; do
kubectl --context $CTX -n istio-system rollout status deployment/istiod-gloo --timeout=120s
kubectl --context $CTX -n istio-system rollout status daemonset/ztunnel --timeout=120s
done
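Note that a JSON-patch add to env/- appends a new entry every time it runs, so re-running the loops above adds duplicate variables. If you expect to repeat this step, a guard along these lines can skip the patch when the variable is already present. has_env is a hypothetical helper, and the sketch only inspects the first container, matching the containers/0 path used in the patches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the first container of a workload already
# defines an env var with the given name.
# Args: context, namespace, resource (kind/name), env var name
has_env() {
  local ctx=$1 ns=$2 resource=$3 name=$4
  kubectl --context "$ctx" -n "$ns" get "$resource" \
    -o jsonpath='{.spec.template.spec.containers[0].env[*].name}' \
    | tr ' ' '\n' | grep -x "$name" >/dev/null
}

# Usage: only patch ztunnel when L7_ENABLED is not there yet.
# has_env "$CTX" istio-system daemonset/ztunnel L7_ENABLED \
#   || echo "L7_ENABLED missing on $CTX: apply the ztunnel patch"
```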
Verify the licence has unlocked multicluster — istiod should now register the global hostname for any Service labelled solo.io/service-scope=global:
About — what this does & why
What: Tails istiod's logs for any licence / feature-gate complaints, then (after a global Service exists later) confirms the "Shared Services" line in istioctl multicluster check.
Why: A wrong licence (trial JWT, wrong env var name, missing Secret) doesn't crash istiod — it just silently leaves the multicluster feature gate closed. The only signal is in the logs and in the absence of globally-shared-service registrations.
kubectl --context $CLUSTER1 -n istio-system logs deploy/istiod-gloo --tail=200 \
| grep -iE "licens|enterpris|multi.?cluster"
# expect: no "invalid license" / "feature locked" errors
# Once a global Service exists (productpage in the cross-cluster connectivity lab):
istioctl --context $CLUSTER1 multicluster check | grep "Shared Services"
# expect: ✅ Shared Services Check: ... globally shared services found
Create istiod alias Service
The enterprise-agentgateway-waypoint binary hardcodes CA_ADDRESS=istiod.istio-system.svc:15012, but the Gloo Operator names both the Deployment and its Service istiod-gloo, so that hostname doesn't resolve out of the box.
About — what this does & why
What: Creates a second Service called istiod in istio-system with the same selector and ports as istiod-gloo, on both clusters.
Why: The agentgateway-waypoint binary hardcodes the istiod hostname it uses for the XDS / CA channel. Gloo Operator names its istiod Service istiod-gloo, so the waypoint can't find it without this alias. This is codified in the install script with a rationale comment — when the chart fixes this hardcode the alias can come out.
for CTX in $CLUSTER1 $CLUSTER2; do
kubectl --context $CTX apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
name: istiod
namespace: istio-system
spec:
selector:
app: istiod
ports:
- name: grpc-xds
port: 15010
- name: https-dns
port: 15012
- name: https-webhook
port: 443
targetPort: 15017
EOF
done
Peer clusters — east-west HBONE gateways
This installs the Istio east-west gateway on each cluster (the HBONE mesh fabric). The enterprise agentgateway waypoint sits on top of this at L7 — it does not replace it.
type: LoadBalancer — MetalLB assigns an external IP from the kind Docker network range. This mirrors a real-world cloud deployment (cloud LB in front of the east-west GW). Remote peers reference the LB IP on HBONE port 15008.
Label istio-system with the cluster network
Without this label istiod can't classify remote pod networks and endpoint rewriting never fires.
About — what this does & why
What: Labels istio-system on each cluster with topology.istio.io/network=<cluster-network>.
Why: istiod uses this label as the signal that "any pod in this namespace belongs to network X". Every workload namespace also needs the same label (handled later for agentgateway-system and for bookinfo in the connectivity lab). Without it, istiod can't classify the network of a remote pod when its peer pushes endpoints — cross-cluster endpoint rewriting never fires and global services resolve to a VIP with zero endpoints.
kubectl --context $CLUSTER1 label ns istio-system topology.istio.io/network=${CLUSTER1#kind-} --overwrite
kubectl --context $CLUSTER2 label ns istio-system topology.istio.io/network=${CLUSTER2#kind-} --overwrite
Install east-west gateway — both clusters
About — what this does & why
What: Creates an istio-eastwest namespace and installs the Solo Istio peering chart (eastwest.create=true, remote.create=false) on each cluster, which provisions a Gateway of class istio-eastwest backed by a ztunnel-based east-west GW Service of type: LoadBalancer.
Why: The east-west gateway is the HBONE fabric that carries pod-to-pod traffic between clusters. It listens on port 15008 (HBONE) and 15012 (XDS) and is the address remote peers will point at. Splitting eastwest.create and remote.create across calls is intentional — each cluster needs its own EW GW first so it has a stable LB IP before any peer can reference it.
for CTX in $CLUSTER1 $CLUSTER2; do
N="${CTX#kind-}"
kubectl --context $CTX create namespace istio-eastwest --dry-run=client -o yaml \
| kubectl --context $CTX apply -f -
helm upgrade --install peering-eastwest \
oci://us-docker.pkg.dev/soloio-img/istio-helm/peering \
--kube-context $CTX \
--namespace istio-eastwest \
--version $ISTIO_VERSION \
--set eastwest.create=true \
--set eastwest.cluster=$N \
--set eastwest.network=$N \
--set remote.create=false \
--wait
done
Wait for MetalLB to assign LB IPs
About — what this does & why
What: Reads the assigned external IP off each cluster's istio-eastwest Service into shell variables.
Why: The next two helm installs need the remote peer's LB IP as a literal value — Solo Istio peering doesn't auto-discover it. If either variable is empty, MetalLB hasn't allocated yet (or the Service was created as ClusterIP) and the subsequent installs would register the wrong address.
EAST_EW_IP=$(kubectl --context $CLUSTER1 -n istio-eastwest \
get svc istio-eastwest -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
WEST_EW_IP=$(kubectl --context $CLUSTER2 -n istio-eastwest \
get svc istio-eastwest -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "east-ag east-west GW: $EAST_EW_IP west-ag east-west GW: $WEST_EW_IP"
# → east-ag east-west GW: 172.22.x.x west-ag east-west GW: 172.22.x.x
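The commands above read the IP once and do not actually wait. If MetalLB is slow to allocate, a polling wrapper such as the following sketch can fill the gap. wait_for_lb_ip is a hypothetical helper, and the retry count and interval are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a Service until it has a LoadBalancer IP, then print it.
# Args: context, namespace, service, tries (default 30), interval seconds (default 2)
wait_for_lb_ip() {
  local ctx=$1 ns=$2 svc=$3 tries=${4:-30} interval=${5:-2} ip i
  for (( i = 0; i < tries; i++ )); do
    ip=$(kubectl --context "$ctx" -n "$ns" get svc "$svc" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null) || ip=""
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    sleep "$interval"
  done
  echo "no LoadBalancer IP for $ns/$svc on $ctx" >&2
  return 1
}

# Usage:
# EAST_EW_IP=$(wait_for_lb_ip "$CLUSTER1" istio-eastwest istio-eastwest) || exit 1
# WEST_EW_IP=$(wait_for_lb_ip "$CLUSTER2" istio-eastwest istio-eastwest) || exit 1
```

Failing loudly here is deliberate: an empty address passed to the remote-peer installs would register a peer that can never be reached.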
Add remote peer reference — east knows about west
About — what this does & why
What: Re-runs the peering chart on east with remote.create=true, pointing the remote.items[0] entry at west's east-west GW LB IP, trust domain cluster.local, HBONE port 15008, and XDS port 15012.
Why: This is the data-plane half of cross-cluster wiring — it creates a remote Gateway CR on east that tells ztunnel "to reach network=west-ag, HBONE-tunnel through this address". The control-plane half (remote secrets a couple of steps below) is separate, and both halves are required.
helm upgrade --install remote-peers \
oci://us-docker.pkg.dev/soloio-img/istio-helm/peering \
--kube-context $CLUSTER1 \
--namespace istio-eastwest \
--version $ISTIO_VERSION \
--set eastwest.create=false \
--set remote.create=true \
--set "remote.items[0].cluster=${CLUSTER2#kind-}" \
--set "remote.items[0].network=${CLUSTER2#kind-}" \
--set 'remote.items[0].trustDomain=cluster.local' \
--set "remote.items[0].address=${WEST_EW_IP}" \
--set 'remote.items[0].hbonePort=15008' \
--set 'remote.items[0].xdsPort=15012'
Add remote peer reference — west knows about east
About — what this does & why
What: Mirror of the previous install, run on west, pointing at east's EW GW IP.
Why: Peering is symmetric — each cluster needs to know the other one's east-west address, otherwise traffic only flows one way.
helm upgrade --install remote-peers \
oci://us-docker.pkg.dev/soloio-img/istio-helm/peering \
--kube-context $CLUSTER2 \
--namespace istio-eastwest \
--version $ISTIO_VERSION \
--set eastwest.create=false \
--set remote.create=true \
--set "remote.items[0].cluster=${CLUSTER1#kind-}" \
--set "remote.items[0].network=${CLUSTER1#kind-}" \
--set 'remote.items[0].trustDomain=cluster.local' \
--set "remote.items[0].address=${EAST_EW_IP}" \
--set 'remote.items[0].hbonePort=15008' \
--set 'remote.items[0].xdsPort=15012'
Cross-apply remote secrets (control-plane discovery)
Without these, istiod's "Number of remote clusters" stays at 0 and cross-cluster services never get endpoints.
About — what this does & why
What: Uses istioctl create-remote-secret to mint a kubeconfig Secret bound to istio-reader-service-account on each cluster and applies it to the peer (east's onto west and vice-versa).
Why: The remote Gateway CRs from the previous two steps wire the data plane; this wires the control plane. Each istiod reads the peer's kubeconfig from istio-remote-secret-<cluster> and watches the peer's Services / EndpointSlices / Pods over the Kubernetes API, then rewrites its own endpoints. Without these Secrets, istiod's "Number of remote clusters" stays at 0 — services are visible locally but never get cross-cluster endpoints.
# East's credentials onto west
istioctl create-remote-secret \
--context $CLUSTER1 \
--name ${CLUSTER1#kind-} | \
kubectl --context $CLUSTER2 apply -f -
# West's credentials onto east
istioctl create-remote-secret \
--context $CLUSTER2 \
--name ${CLUSTER2#kind-} | \
kubectl --context $CLUSTER1 apply -f -
Verify peering
About — what this does & why
What: Runs Solo Istio's purpose-built multicluster check diagnostic which inspects istiod, ztunnel, the east-west gateway, the peering registrations, and the certificate roots.
Why: This is the single command that confirms every piece of peering is healthy. Each green tick maps to one prerequisite you set up earlier — if a check fails, the message tells you exactly which step to revisit.
istioctl --context $CLUSTER1 multicluster check
Expected output:
About — what this does & why
What: The full happy-path output you should see — all green ticks except "Shared Services" (informational, becomes a tick after a global Service is created in a downstream lab) and "Intermediate Certs Compatibility" (informational).
Why: The Peers Check ✅ is the load-bearing one — it confirms the HBONE handshake works and both istiods see each other. Failures here are the difference between "platform usable" and "subsequent labs will hang".
✅ Incompatible Environment Variable Check: all relevant environment variables are valid
✅ License Check: license is valid for multicluster
✅ Pod Check (istiod): all pods healthy
✅ Pod Check (ztunnel): all pods healthy
✅ Pod Check (eastwest gateway): all pods healthy
✅ Gateway Check: all eastwest gateways programmed
✅ istio-eastwest/istio-eastwest available at <MetalLB IP>
✅ Peers Check: all clusters connected
✅ Connected to <peer cluster> via <peer LB IP>
ℹ️ Shared Services Check: no globally shared services found (1 after Step 09)
ℹ️ Intermediate Certs Compatibility Check: root certificate SHA256 sum: <hash>
✅ Network Configuration Check: all network configurations are valid
If you see "⚠️ License Check: found invalid license for multicluster" instead of the green License Check line, your SOLO_ISTIO_LICENSE_KEY covers Solo Istio but not the full multicluster entitlement (GlobalService/Segment CRDs). Basic HBONE peering and locality-aware failover still work; the Peers Check ✅ is what matters.
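For scripted runs, the check output can be reduced to a pass/fail signal on the Peers Check line. The sketch below assumes the output format matches the expected output shown above (a line containing "✅ Peers Check"); check_summary is a hypothetical helper, and it is a plain text match rather than a structured parse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `istioctl multicluster check` output on stdin,
# echo any line that is neither a tick nor informational to stderr, and
# return 0 only if the Peers Check line carries a tick.
check_summary() {
  local out
  out=$(cat)
  # Surface warnings and failures; ignore blank lines
  printf '%s\n' "$out" | grep -v -e '✅' -e 'ℹ' -e '^[[:space:]]*$' >&2 || true
  printf '%s\n' "$out" | grep '✅ Peers Check' >/dev/null
}

# Usage:
# istioctl --context $CLUSTER1 multicluster check | check_summary \
#   && echo "peering healthy" || echo "peering NOT healthy"
```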
Install Solo Enterprise agentgateway
Two helm installs per cluster: CRDs first, then the agentgateway (controller + Rust agentgateway proxy). The chart registers both enterprise-agentgateway (ingress) and enterprise-agentgateway-waypoint (L7 / egress) GatewayClasses.
Install CRDs
About — what this does & why
What: Installs the agentgateway CRDs chart into agentgateway-system on both clusters.
Why: The agentgateway controller installed in the next step watches these CRDs (AgentgatewayPolicy, agent / MCP routing types, etc.). Installing CRDs and the controller as separate helm releases avoids "race-on-CRD-creation" startup failures.
for CTX in $CLUSTER1 $CLUSTER2; do
helm upgrade --install agentgateway-crds \
oci://us-docker.pkg.dev/solo-public/enterprise-agentgateway/charts/enterprise-agentgateway-crds \
--kube-context $CTX \
--namespace agentgateway-system --create-namespace \
--version $AGW_VERSION \
--wait
done
Install Enterprise agentgateway
About — what this does & why
What: Installs the Solo Enterprise agentgateway chart on both clusters, passing the enterprise licence key as a Helm value.
Why: This installs the agentgateway controller and registers two GatewayClasses: enterprise-agentgateway (north-south ingress proxy) and enterprise-agentgateway-waypoint (Istio Ambient L7 waypoint and egress). Both classes share a single controller binary — the chart wires up both modes.
helm upgrade --install enterprise-agentgateway \
oci://us-docker.pkg.dev/solo-public/enterprise-agentgateway/charts/enterprise-agentgateway \
--kube-context $CLUSTER1 \
--namespace agentgateway-system \
--version $AGW_VERSION \
--set licensing.licenseKey="${AGENTGATEWAY_LICENSE_KEY}" \
--wait
helm upgrade --install enterprise-agentgateway \
oci://us-docker.pkg.dev/solo-public/enterprise-agentgateway/charts/enterprise-agentgateway \
--kube-context $CLUSTER2 \
--namespace agentgateway-system \
--version $AGW_VERSION \
--set licensing.licenseKey="${AGENTGATEWAY_LICENSE_KEY}" \
--wait
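The helm commands above interpolate several shell variables, and an unset one tends to surface as a confusing chart or licence error rather than a clear message. A minimal preflight sketch, meant to be run before the installs, lists any variable that is unset or empty. The helper name missing_vars is hypothetical; the variable names in the usage comment are the ones this page already uses.

```shell
# Print the name of every variable in the argument list that is unset or empty.
# The body runs in a subshell so the loop variables do not leak.
missing_vars() (
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
)

# Hypothetical usage before the helm installs:
#   missing=$(missing_vars CLUSTER1 CLUSTER2 AGW_VERSION AGENTGATEWAY_LICENSE_KEY)
#   [ -z "$missing" ] || { echo "unset: $missing" >&2; exit 1; }
```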
Label agentgateway-system for Ambient + network topology
Every workload namespace — including agentgateway-system — needs these two labels:
- istio.io/dataplane-mode=ambient: enrolls all pods in this namespace into the Ambient mesh so ztunnel intercepts their traffic. Without it, agentgateway pods sit outside the mesh and can't use HBONE to reach pods in the other cluster.
- topology.istio.io/network=<name>: tells istiod which network these pods belong to. istiod uses this to decide routing: same network → go direct; different network → route via the east-west gateway. Without it, cross-cluster endpoint rewriting never fires.
About — what this does & why
What: Labels the agentgateway-system namespace on both clusters with istio.io/dataplane-mode=ambient and the cluster's topology.istio.io/network.
Why: Without Ambient enrolment, agentgateway pods sit outside the mesh and ztunnel doesn't intercept their traffic — they can't HBONE to remote workloads. Without the network label, istiod can't classify the agentgateway pod's network, so its endpoint rewriting logic skips them. Both labels are non-negotiable for cross-cluster reachability from a Gateway hosted in this namespace.
kubectl --context $CLUSTER1 label namespace agentgateway-system \
istio.io/dataplane-mode=ambient \
topology.istio.io/network=${CLUSTER1#kind-} \
--overwrite
kubectl --context $CLUSTER2 label namespace agentgateway-system \
istio.io/dataplane-mode=ambient \
topology.istio.io/network=${CLUSTER2#kind-} \
--overwrite
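The network label value in the commands above comes from the ${CLUSTER1#kind-} parameter expansion, which strips a leading kind- from the kube context name. The sketch below isolates that expansion in a function so the result can be checked before labelling. The function name network_for_context is hypothetical, and the context names are assumed to be kind-east-ag and kind-west-ag as in the architecture diagram.

```shell
# Strip a leading "kind-" from a kube context name to get the network name.
# A name without that prefix is returned unchanged.
network_for_context() {
  printf '%s\n' "${1#kind-}"
}

network_for_context kind-east-ag   # prints east-ag
network_for_context kind-west-ag   # prints west-ag

# To confirm the labels landed on a live cluster:
#   kubectl --context "$CLUSTER1" get namespace agentgateway-system --show-labels
```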
Verify GatewayClasses are registered
About — what this does & why
What: Lists cluster-scoped GatewayClass resources and confirms both enterprise-agentgateway and enterprise-agentgateway-waypoint are present.
Why: The lab pages create Gateways referencing these classes by name. If they're missing, the agentgateway controller didn't start cleanly, usually because the CRDs chart wasn't installed before the controller.
kubectl --context $CLUSTER1 get gatewayclasses
# Should list enterprise-agentgateway and enterprise-agentgateway-waypoint
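The command above relies on reading the list by eye, and only on the first cluster. As a sketch of a scriptable version, the filter below reads the output of kubectl get gatewayclasses -o name on stdin and reports any expected class that is absent. The function name missing_gatewayclasses is hypothetical; it assumes the usual resource.group/name form that -o name prints.

```shell
# Reads `kubectl get gatewayclasses -o name` output on stdin, prints each
# expected class that is absent, and exits non-zero if any is missing.
missing_gatewayclasses() (
  present=$(cat)
  status=0
  for class in enterprise-agentgateway enterprise-agentgateway-waypoint; do
    if ! printf '%s\n' "$present" \
        | grep -qxF "gatewayclass.gateway.networking.k8s.io/$class"; then
      echo "missing GatewayClass: $class"
      status=1
    fi
  done
  exit "$status"
)

# Hypothetical usage on both clusters:
#   for CTX in $CLUSTER1 $CLUSTER2; do
#     kubectl --context "$CTX" get gatewayclasses -o name | missing_gatewayclasses
#   done
```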
Solo Istio Management Plane — Gloo UI (optional)
Deploys the Gloo Platform management plane on $CLUSTER1 with
$CLUSTER2 registered as a workload cluster. Gives you a UI
with a service graph that visualises the multicluster mesh, plus
centralised policy/insights. Mirrors the
upstream Gloo Management Plane section,
adapted for our east-ag/west-ag cluster naming.
Requires GLOO_MESH_LICENSE_KEY alongside the other licence vars from
STEP 00. Request a trial at solo.io/free-trial if you don't have one.
Install meshctl
About — what this does & why
What: Downloads meshctl v2.12.0 to ~/.gloo-mesh/bin and puts it on PATH.
Why: Gloo Platform's CLI handles cluster registration with relay tokens and surfaces convenience commands like meshctl dashboard. It's a separate binary from istioctl.
curl -sL https://run.solo.io/meshctl/install | GLOO_MESH_VERSION=v2.12.0 sh -
export PATH=$HOME/.gloo-mesh/bin:$PATH
meshctl version
Write the management-plane values file
Same shape as the upstream mgmt-values.yaml — just retargeted
at east-ag instead of cluster1.
About — what this does & why
What: Writes a Helm values file at /tmp/mgmt-values.yaml enabling the management-plane components (glooMgmtServer, glooUi, glooAnalyzer, glooInsightsEngine), the workload-cluster components (glooAgent), Prometheus/Redis, and the telemetry gateway with Istio + Jaeger pipelines.
Why: Splitting the values file out keeps the helm install line short and lets you tweak feature toggles (e.g. disable Insights, or flip the agent off if you only want the UI). installEnterpriseCrds: false avoids overwriting the agentgateway CRDs already installed.
cat > /tmp/mgmt-values.yaml <<'EOF'
common:
  cluster: east-ag
glooAgent:
  enabled: true
  runAsSidecar: true
  relay:
    serverAddress: gloo-mesh-mgmt-server.gloo-mesh:9900
glooAnalyzer:
  enabled: true
glooMgmtServer:
  enabled: true
  registerCluster: true
  policyApis:
    enabled: true
glooInsightsEngine:
  enabled: true
glooUi:
  enabled: true
prometheus:
  enabled: true
redis:
  deployment:
    enabled: true
telemetryCollector:
  enabled: true
telemetryGateway:
  enabled: true
telemetryGatewayCustomization:
  pipelines:
    traces/jaeger:
      enabled: true
telemetryCollectorCustomization:
  pipelines:
    traces/istio:
      enabled: true
installEnterpriseCrds: false
featureGates:
  ConfigDistribution: false
EOF
Install Gloo Platform on east-ag (management + workload)
About — what this does & why
What: Installs Gloo Platform CRDs then the main chart into gloo-mesh on east-ag, passing the values file from the previous step and the Gloo Mesh licence key.
Why: Co-locates the management plane and one workload-cluster agent in a single helm release on east — the same Gloo agent that registers workload-cluster east is also what the UI consumes for service graph data. West joins as a separate workload cluster in the next step.
helm repo add gloo-platform https://storage.googleapis.com/gloo-platform/helm-charts
helm repo update
helm upgrade -i gloo-platform-crds gloo-platform/gloo-platform-crds \
-n gloo-mesh --create-namespace \
--version=2.12.0 \
--set installEnterpriseCrds=false \
--kube-context=$CLUSTER1
helm upgrade -i gloo-platform gloo-platform/gloo-platform \
-n gloo-mesh \
--version=2.12.0 \
--kube-context=$CLUSTER1 \
--values /tmp/mgmt-values.yaml \
--set licensing.glooMeshLicenseKey=$GLOO_MESH_LICENSE_KEY
Wait for the telemetry gateway LoadBalancer IP
About — what this does & why
What: Waits for gloo-telemetry-gateway to be Available on east, then reads its MetalLB-assigned LB IP into TELEMETRY_GATEWAY_ADDRESS with the OTLP gRPC port (:4317) appended.
Why: The workload-cluster agent on west needs an address to ship traces, metrics, and topology telemetry to. meshctl cluster register in the next step takes this address as a flag and bakes it into west's agent config.
kubectl --context $CLUSTER1 -n gloo-mesh wait \
--for=condition=Available deployment/gloo-telemetry-gateway --timeout=300s
export TELEMETRY_GATEWAY_ADDRESS=$(kubectl --context $CLUSTER1 \
-n gloo-mesh get svc gloo-telemetry-gateway \
-o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}"):4317
echo "telemetry gateway: $TELEMETRY_GATEWAY_ADDRESS"
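One pitfall with the command above: a Deployment can be Available before MetalLB has assigned the Service an address. In that case the jsonpath query most likely prints nothing and the variable ends up as just :4317. The sketch below is a small guard for that case. The function name telemetry_address_ok is hypothetical, and the sample IP in the comment is illustrative only.

```shell
# Succeeds only if the argument looks like <host>:4317 with a non-empty host.
telemetry_address_ok() {
  case "$1" in
    ""|:4317) return 1 ;;   # empty, or port with no host in front of it
    *:4317)   return 0 ;;   # e.g. 172.18.0.101:4317 (illustrative address)
    *)        return 1 ;;   # anything that does not end in the OTLP gRPC port
  esac
}

# Hypothetical usage before registering west-ag:
#   telemetry_address_ok "$TELEMETRY_GATEWAY_ADDRESS" \
#     || echo "no LoadBalancer address yet; re-run the export" >&2
```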
Register west-ag as a workload cluster
About — what this does & why
What: Uses meshctl cluster register with the gloo-mesh-agent profile to install the Gloo agent on west-ag and wire it to the relay server on east, with the telemetry gateway address from the previous step.
Why: A multicluster Gloo Platform install has one management plane and N workload clusters — west needs the agent so the UI can see its workloads. meshctl handles the kubeconfig context juggling and the secret token exchange so you don't have to apply Secrets by hand.
meshctl cluster register west-ag \
--kubecontext $CLUSTER1 \
--profiles gloo-mesh-agent \
--remote-context $CLUSTER2 \
--telemetry-server-address $TELEMETRY_GATEWAY_ADDRESS
Launch the UI
About — what this does & why
What: Opens the Gloo UI on http://localhost:8090 via meshctl dashboard (port-forwards under the hood).
Why: The standup itself doesn't deploy any traffic, so the service graph is empty until you run a lab (cloud-connectivity or agentic-mcp). Once workloads exist, the graph visualises cross-cluster calls, waypoint hops, and policy verdicts in real time.
meshctl dashboard
# Opens http://localhost:8090 in your browser — Gloo UI with a service graph
# of both clusters. Workloads light up once you run a lab that deploys traffic.
Next: Labs
The platform is now up: two ambient kind clusters peered over HBONE, Solo Enterprise agentgateway as the north-south ingress, and the Gloo UI showing it all. The fun stuff is in the dedicated lab pages (cloud-connectivity and agentic-mcp), which assume this standup is complete.
Teardown
About — what this does & why
What: Deletes both kind clusters and removes the generated certs/ directory.
Why: One command nukes the whole demo footprint — no leftover Docker containers, no half-installed CRDs. Re-running STEP 01 from scratch is faster and more reliable than trying to uninstall helm releases in dependency order.
kind delete cluster --name east-ag
kind delete cluster --name west-ag
rm -rf certs/
Appendix · Swap the Enterprise agentgateway build
This block is for swapping the agentgateway build on a cluster that's already up, without rebuilding the whole platform.
The standup pins AGW_VERSION=v2.3.3 from the public Solo registry. To test a different build (a nightly, a pre-release patch from a dev registry, an air-gapped mirror, etc.) replace the controller on both clusters with the snippet below. Edit REGISTRY and VERSION at the top — the rest is mechanical.
About — what this does & why
What: Deletes every Gateway that uses the enterprise-agentgateway / enterprise-agentgateway-waypoint classes on both clusters, helm-uninstalls the current chart + CRDs, then installs the chart at the new REGISTRY/VERSION you set at the top. $AGENTGATEWAY_LICENSE_KEY from your secrets file is wired into the install.
Why: Lets you A/B test agentgateway releases against the same Solo Istio multicluster mesh — the bookinfo workload, peering, certificates, and ztunnel all stay intact. Useful for verifying upstream fixes (e.g. cross-cluster WorkloadEntry handling) without re-running the whole standup. The Gateway delete is required first because the controller is what owns the backing Deployments/Services — uninstalling the chart while Gateway resources still reference it leaves orphan pods.
# --- Edit these two lines to point at the build you want -----------------
REGISTRY="oci://us-central1-docker.pkg.dev/developers-369321/enterprise-agentgateway-dev/charts"
VERSION="v2026.5.0-beta.4-nightly-2026-05-15"
# -------------------------------------------------------------------------
NAMESPACE="agentgateway-system"
# Re-auth docker for the dev registry if you've never pulled from it before.
# gcloud auth configure-docker us-central1-docker.pkg.dev --quiet
# 1) Delete any Gateway resources that reference the agentgateway
#    GatewayClasses so the controller can shut down cleanly.
#    This loop removes Gateways only; attached HTTPRoutes are left in place.
for CTX in $CLUSTER1 $CLUSTER2; do
  for CLASS in enterprise-agentgateway enterprise-agentgateway-waypoint; do
    kubectl --context $CTX get gateway -A \
      -o jsonpath='{range .items[?(@.spec.gatewayClassName=="'$CLASS'")]}{.metadata.namespace} {.metadata.name}{"\n"}{end}' \
    | while read -r ns name; do
        [[ -n "$ns" ]] || continue
        kubectl --context $CTX -n $ns delete gateway $name --ignore-not-found
      done
  done
done
# 2) Uninstall the current chart on both clusters.
for CTX in $CLUSTER1 $CLUSTER2; do
helm --kube-context $CTX -n $NAMESPACE uninstall enterprise-agentgateway --ignore-not-found
helm --kube-context $CTX -n $NAMESPACE uninstall agentgateway-crds --ignore-not-found
done
# 3) Install the new build on both clusters.
for CTX in $CLUSTER1 $CLUSTER2; do
helm --kube-context $CTX upgrade --install agentgateway-crds \
"$REGISTRY/enterprise-agentgateway-crds" \
--namespace $NAMESPACE --create-namespace \
--version $VERSION --wait
helm --kube-context $CTX upgrade --install enterprise-agentgateway \
"$REGISTRY/enterprise-agentgateway" \
--namespace $NAMESPACE --version $VERSION \
--set licensing.licenseKey="$AGENTGATEWAY_LICENSE_KEY" \
--wait
done
# 4) Confirm the new controller has rolled out + both GatewayClasses are Accepted.
for CTX in $CLUSTER1 $CLUSTER2; do
  kubectl --context $CTX -n $NAMESPACE rollout status deploy/enterprise-agentgateway --timeout=120s
  for CLASS in enterprise-agentgateway enterprise-agentgateway-waypoint; do
    kubectl --context $CTX get gatewayclass $CLASS -o jsonpath='{.status.conditions[?(@.type=="Accepted")].status}{"\n"}'
  done
done
Re-apply the Gateway + HTTPRoute resources your lab needs (e.g.
cloud-connectivity LAB 0) after the new controller is up.
Resources
- rvennam/ambient-multicluster-workshop: upstream workshop this guide follows
- Istio Ingress Gateway companion guide: same setup with gatewayClassName: istio
- ambientmesh.io: Istio Ambient documentation
- Source on GitHub