If you've operated sidecar Istio, you already know xDS as the way
istiod pushes Envoy config — clusters, listeners, routes,
endpoints. Ambient keeps the same transport (ADS — one
bidirectional gRPC stream multiplexing every resource type) and the
same ports (15012 mTLS, 15010 plaintext for
bootstrap / dev), but the resources flowing over it changed.
Waypoints are still Envoys, so they get the Envoy xDS set you already
know. ztunnel is not an Envoy — it's a Rust dataplane
that needs to know about workloads (pods, their identity,
which services they back, which waypoint they should send through),
not about Envoy clusters and listeners. So Istio defined two new
resource types for it: WDS and WADS.
Everything ztunnel learns about the mesh — every IP it can route to, every SPIFFE identity it'll see on the wire, every authorization rule it'll enforce at L4 — arrives over that one ADS stream, as delta updates. The page below walks the wire format, shows the in-memory state ztunnel builds from it, and ends with the bit that's actually Solo-specific: how discovery information federates between clusters on Solo Enterprise vs upstream.
xDS in one sentence, in case you want it: a set of gRPC
discovery APIs (eXtensible
Discovery Service) where each
resource type — Listener, Cluster, Route, Endpoint, Secret, Workload,
Authorization — has its own well-known type_url, and a
client subscribes by listing the type URLs it wants. ADS just bundles
all of those onto one stream so ordering is consistent. For the
long-form, the references at the bottom of the page are good.
One stream, two consumers
Read it like this: a single ADS gRPC stream multiplexes
every resource type, but the type URLs a client subscribes to
determine what flows. ztunnel asks for two Istio-specific types
(istio.workload.Address, istio.security.Authorization);
it never asks for an Envoy Listener or Cluster because it doesn't have any.
A waypoint asks for the regular Envoy set. Both ends initiate a
delta stream (subscribe / unsubscribe by resource name),
so istiod only sends what changed after the first sync.
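The subscription split described above can be sketched as follows. This is a minimal illustration, not istiod or ztunnel code: the `Client` enum and the `subscriptions` function are hypothetical names, only the resource type names come from this page, and the `type.googleapis.com/` prefix is the standard protobuf `Any` prefix that xDS type URLs carry.

```rust
// Hypothetical client kinds; only the resource type names come from the text.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Client {
    Ztunnel,
    Waypoint,
}

// Standard protobuf Any prefix used by xDS type URLs.
const PREFIX: &str = "type.googleapis.com/";

/// Type URLs a client would list when it opens its delta stream.
pub fn subscriptions(client: Client) -> Vec<String> {
    let types: &[&str] = match client {
        // ztunnel: the two Istio-specific types, no Envoy resources.
        Client::Ztunnel => &[
            "istio.workload.Address",
            "istio.security.Authorization",
        ],
        // waypoint: the regular Envoy set.
        Client::Waypoint => &[
            "envoy.config.listener.v3.Listener",
            "envoy.config.cluster.v3.Cluster",
            "envoy.config.route.v3.RouteConfiguration",
            "envoy.config.endpoint.v3.ClusterLoadAssignment",
            "envoy.extensions.transport_sockets.tls.v3.Secret",
        ],
    };
    types.iter().map(|t| format!("{}{}", PREFIX, t)).collect()
}

fn main() {
    for client in [Client::Ztunnel, Client::Waypoint] {
        println!("{:?}:", client);
        for url in subscriptions(client) {
            println!("  {}", url);
        }
    }
}
```

The point of the sketch is that the stream is identical for both consumers; only the list of type URLs differs.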
What ztunnel actually subscribes to
🌐 WDS · workload & service discovery · istio.workload.Address
Wraps two underlying messages — Workload (one pod / VM,
keyed by a globally-unique uid) and Service
(a namespaced hostname plus its VIPs). ztunnel subscribes with
wildcard ["*"] on first connect, then takes deltas
forever. Every endpoint it knows about — local pods, remote pods in
federated clusters, ServiceEntry workloads — arrives this way.
Workload proto · the fields ztunnel actually uses · trimmed from workload.proto
message Workload {
string uid = 20; // primary key: "cluster/group/kind/ns/name"
string name = 1;
string namespace = 2;
repeated bytes addresses = 3; // IPv4 / IPv6, no port
string hostname = 21;
string network = 4;
TunnelProtocol tunnel_protocol = 5; // NONE | HBONE
string trust_domain = 6;
string service_account = 7; // SPIFFE: spiffe://<trust>/ns/<ns>/sa/<sa>
GatewayAddress waypoint = 8; // which waypoint this workload sends through
GatewayAddress network_gateway = 19; // east-west gateway for cross-network
string node = 9;
string canonical_name = 10; // app + version for telemetry
string canonical_revision = 11;
WorkloadType workload_type = 12; // DEPLOYMENT | POD | CRONJOB | JOB
map<string, PortList> services = 22; // "ns/hostname" → ports
repeated string authorization_policies = 16; // names of WADS rules
WorkloadStatus status = 17; // HEALTHY | UNHEALTHY
string cluster_id = 18;
Locality locality = 24; // region/zone/subzone
}
Sample DeltaDiscoveryResponse · pod join · on the wire · httpbin-5d8d... gets scheduled
# What istiod pushes to ztunnel when a single pod is added.
# DeltaDiscoveryResponse, type_url omitted on resources because it
# matches the response's type_url.
DeltaDiscoveryResponse {
type_url: "type.googleapis.com/istio.workload.Address"
system_version_info: "push-1747500000"
nonce: "9c3d-…"
resources: [
Resource {
name: "default/httpbin-5d8d5f7c6b-abc12" # = workload.uid (shortened here; real uids follow the cluster/group/kind/ns/name form)
version: "1747500000"
resource: Address {
type: workload {
uid: "default/httpbin-5d8d5f7c6b-abc12"
name: "httpbin-5d8d5f7c6b-abc12"
namespace: "default"
addresses: [ 0x0a000142 ] # 10.0.1.66 packed
network: "network1"
tunnel_protocol: HBONE
trust_domain: "cluster.local"
service_account: "httpbin"
node: "node-1"
canonical_name: "httpbin"
workload_type: DEPLOYMENT
status: HEALTHY
cluster_id: "cluster-east"
services: {
"default/httpbin.default.svc.cluster.local": {
ports: [ { service_port: 8000, target_port: 8080 } ]
}
}
waypoint: {
address: { network: "network1", address: 0x0a000264 } # 10.0.2.100 packed
hbone_mtls_port: 15008
}
}
}
}
]
removed_resources: []
}
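Two details in the sample above are easy to misread: `addresses` carries raw bytes rather than dotted strings, and the SPIFFE identity is not sent as a URI but derived from `trust_domain`, `namespace`, and `service_account`. A minimal sketch of both conversions, using only the standard library; the function names `decode_addr` and `spiffe_id` are made up for illustration and are not ztunnel's API:

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Decode one entry of `Workload.addresses`: 4 bytes for IPv4,
/// 16 bytes for IPv6, network byte order, no port.
pub fn decode_addr(raw: &[u8]) -> Option<IpAddr> {
    match raw.len() {
        4 => Some(IpAddr::V4(Ipv4Addr::new(raw[0], raw[1], raw[2], raw[3]))),
        16 => {
            let mut octets = [0u8; 16];
            octets.copy_from_slice(raw); // lengths match, cannot panic
            Some(IpAddr::V6(Ipv6Addr::from(octets)))
        }
        _ => None, // anything else is malformed
    }
}

/// Build the SPIFFE URI in the form shown in the proto comment:
/// spiffe://<trust>/ns/<ns>/sa/<sa>
pub fn spiffe_id(trust_domain: &str, namespace: &str, service_account: &str) -> String {
    format!("spiffe://{}/ns/{}/sa/{}", trust_domain, namespace, service_account)
}

fn main() {
    // 0x0a000142 from the sample response.
    println!("{:?}", decode_addr(&[0x0a, 0x00, 0x01, 0x42]));
    // 0x0a000264, the waypoint address.
    println!("{:?}", decode_addr(&[0x0a, 0x00, 0x02, 0x64]));
    println!("{}", spiffe_id("cluster.local", "default", "httpbin"));
}
```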
🛡️ WADS · authorization for L4 enforcement · istio.security.Authorization
The L4 slice of every AuthorizationPolicy that targets
something ztunnel enforces (namespace-wide, or
targetRefs-attached to a Service). The L7 rules from the
same AuthorizationPolicy only show up here if the policy's
target is on ztunnel itself — anything that needs L7 (JWT claims,
method, header) is sent to the waypoint as an Envoy
RBAC filter via the regular xDS path, not WADS.
Authorization · proto fields ztunnel enforces · trimmed
message Authorization {
string name = 1;
string namespace = 2;
Scope scope = 3; // GLOBAL | NAMESPACE | WORKLOAD_SELECTOR
Action action = 4; // ALLOW | DENY
repeated Rule rules = 5;
}
message Rule {
repeated Match matches = 1; // OR
}
message Match {
repeated Address source_ips = 1;
repeated Address not_source_ips = 2;
repeated string source_identities = 3; // SPIFFE
repeated string not_source_identities = 4;
repeated NetworkAddress destination_ips = 5;
repeated PortRange destination_ports = 6;
repeated string principals = 7; // L4 identity
}
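To make the ALLOW / DENY actions concrete, here is a hedged sketch of how a set of L4 policies could be evaluated for one connection. It follows the documented Istio AuthorizationPolicy ordering (a matching DENY wins; with no ALLOW policies the connection is allowed; otherwise at least one ALLOW must match). The types are deliberately reduced to two match fields and are assumptions for illustration, not the ztunnel implementation:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Allow,
    Deny,
}

/// Reduced Match: an empty list means "any".
pub struct Match {
    pub source_identities: Vec<String>,
    pub destination_ports: Vec<u16>,
}

pub struct Policy {
    pub action: Action,
    pub matches: Vec<Match>, // OR-ed together
}

/// The L4 facts known about one connection.
pub struct Conn {
    pub source_identity: String,
    pub destination_port: u16,
}

fn match_applies(m: &Match, c: &Conn) -> bool {
    let id_ok = m.source_identities.is_empty()
        || m.source_identities.iter().any(|i| i.as_str() == c.source_identity);
    let port_ok = m.destination_ports.is_empty()
        || m.destination_ports.contains(&c.destination_port);
    id_ok && port_ok // fields inside one Match are AND-ed
}

pub fn allowed(policies: &[Policy], c: &Conn) -> bool {
    let hits = |p: &Policy| p.matches.iter().any(|m| match_applies(m, c));
    // 1. Any matching DENY policy rejects the connection.
    if policies.iter().any(|p| p.action == Action::Deny && hits(p)) {
        return false;
    }
    // 2. With no ALLOW policies at all, the connection is allowed.
    let mut allows = policies.iter().filter(|p| p.action == Action::Allow).peekable();
    if allows.peek().is_none() {
        return true;
    }
    // 3. Otherwise at least one ALLOW policy has to match.
    allows.any(|p| hits(p))
}

fn main() {
    let curl = "spiffe://cluster.local/ns/default/sa/curl".to_string();
    let policies = vec![Policy {
        action: Action::Allow,
        matches: vec![Match { source_identities: vec![curl.clone()], destination_ports: vec![8080] }],
    }];
    let conn = Conn { source_identity: curl, destination_port: 8080 };
    println!("allowed: {}", allowed(&policies, &conn));
}
```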
What ztunnel holds in memory
ztunnel doesn't store the proto messages directly — it parses each
delta update into a pair of in-memory stores. WorkloadStore
is keyed by uid (the proto's primary key) with a reverse
index by IP. ServiceStore is keyed by
NamespacedHostname and holds the set of workloads that
back the service. When a packet arrives at ztunnel's TPROXY socket,
the IP lookup is an expected O(1) hash lookup through WorkloadStore's
by-IP reverse index; that's the hot path.
state/workload.rs · the structs ztunnel actually keeps · simplified from the Rust source
// One workload — built from a Workload proto, kept Arc<_> so reads
// don't block writes during pushes.
pub struct Workload {
pub uid: Strng,
pub addresses: Vec<IpNet>,
pub identity: Identity, // SPIFFE URI, derived
pub status: HealthStatus, // Healthy | Unhealthy
pub node: Strng,
pub services: HashMap<NamespacedHostname, Vec<Port>>,
pub waypoint: Option<GatewayAddress>,
pub network_gateway: Option<GatewayAddress>,
pub tunnel_protocol: TunnelProtocol, // None | Hbone
pub cluster_id: Strng,
pub locality: Option<Locality>,
pub authorization_policies: Vec<Strng>,
// …
}
pub struct WorkloadStore {
by_uid: HashMap<Strng, Arc<Workload>>,
by_ip: HashMap<IpAddr, Vec<Arc<Workload>>>,
}
// Services and the endpoints behind them.
pub struct Service {
pub namespaced_hostname: NamespacedHostname, // "default/httpbin.default.svc.cluster.local"
pub vips: Vec<NetworkAddress>, // ClusterIP(s)
pub ports: Vec<Port>,
pub endpoints: HashMap<Strng, Endpoint>, // workload_uid → endpoint
pub waypoint: Option<GatewayAddress>, // service-attached waypoint
}
pub struct Endpoint {
pub workload_uid: Strng,
pub port: Vec<Port>,
pub status: HealthStatus,
}
pub struct ServiceStore {
by_hostname: HashMap<NamespacedHostname, Arc<Service>>,
by_vip: HashMap<NetworkAddress, Arc<Service>>,
}
Snapshot · what's in WorkloadStore after the httpbin pod above is delta'd in · debug-formatted, three workloads
WorkloadStore {
by_uid: {
"default/httpbin-5d8d5f7c6b-abc12": Workload {
uid: "default/httpbin-5d8d5f7c6b-abc12",
addresses: [ 10.0.1.66/32 ],
identity: Identity::Spiffe {
trust_domain: "cluster.local",
namespace: "default",
service_account: "httpbin",
},
status: Healthy,
node: "node-1",
services: {
"default/httpbin.default.svc.cluster.local" => [ Port { svc: 8000, tgt: 8080 } ],
},
waypoint: Some(GatewayAddress {
address: 10.0.2.100/32,
hbone_mtls_port: 15008,
}),
tunnel_protocol: Hbone,
cluster_id: "cluster-east",
…
},
"default/curl-7d9b6c4f8c-pq8r2": Workload { … },
"kube-system/coredns-…": Workload { … },
},
by_ip: {
10.0.1.66 => [ "default/httpbin-5d8d5f7c6b-abc12" ],
10.0.1.81 => [ "default/curl-7d9b6c4f8c-pq8r2" ],
10.0.0.10 => [ "kube-system/coredns-…" ],
},
}
ServiceStore {
by_hostname: {
"default/httpbin.default.svc.cluster.local" => Service {
vips: [ 10.96.0.42/32 ],
ports: [ Port { svc: 8000, tgt: 8080 } ],
endpoints: {
"default/httpbin-5d8d5f7c6b-abc12" => Endpoint { port: [8000→8080], status: Healthy },
},
waypoint: Some(GatewayAddress { 10.0.2.100, port: 15008 }),
},
},
by_vip: {
10.96.0.42 => "default/httpbin.default.svc.cluster.local",
},
}
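The step from a delta response to the two indexes can be sketched like this. It is a simplified model under stated assumptions: `Workload` is cut down to `uid` and `addresses`, and the method names (`upsert`, `remove`, `apply_delta`, `find_by_ip`) are hypothetical rather than taken from the ztunnel source. What it illustrates is the bookkeeping: every upsert or removal has to keep the by-IP reverse index in step with the by-uid map.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::Arc;

#[derive(Debug)]
pub struct Workload {
    pub uid: String,
    pub addresses: Vec<IpAddr>,
}

#[derive(Default)]
pub struct WorkloadStore {
    by_uid: HashMap<String, Arc<Workload>>,
    by_ip: HashMap<IpAddr, Vec<Arc<Workload>>>,
}

impl WorkloadStore {
    /// Insert or replace a workload, refreshing the reverse index.
    pub fn upsert(&mut self, w: Workload) {
        self.remove(&w.uid); // drop stale IP entries from a previous version
        let w = Arc::new(w);
        for ip in &w.addresses {
            self.by_ip.entry(*ip).or_default().push(Arc::clone(&w));
        }
        self.by_uid.insert(w.uid.clone(), w);
    }

    /// Remove a workload and every by_ip entry that points at it.
    pub fn remove(&mut self, uid: &str) {
        if let Some(old) = self.by_uid.remove(uid) {
            for ip in &old.addresses {
                let now_empty = match self.by_ip.get_mut(ip) {
                    Some(list) => {
                        list.retain(|w| w.uid != old.uid);
                        list.is_empty()
                    }
                    None => false,
                };
                if now_empty {
                    self.by_ip.remove(ip);
                }
            }
        }
    }

    /// Apply one delta: `removed` mirrors removed_resources,
    /// `resources` mirrors the added or updated workloads.
    pub fn apply_delta(&mut self, resources: Vec<Workload>, removed: &[String]) {
        for uid in removed {
            self.remove(uid);
        }
        for w in resources {
            self.upsert(w);
        }
    }

    /// Hot-path lookup: one hash probe into the reverse index.
    pub fn find_by_ip(&self, ip: &IpAddr) -> Option<&Arc<Workload>> {
        self.by_ip.get(ip).and_then(|list| list.first())
    }
}

fn main() {
    let ip: IpAddr = "10.0.1.66".parse().unwrap();
    let mut store = WorkloadStore::default();
    let uid = "default/httpbin-5d8d5f7c6b-abc12".to_string();
    store.apply_delta(vec![Workload { uid: uid.clone(), addresses: vec![ip] }], &[]);
    println!("{:?}", store.find_by_ip(&ip).map(|w| w.uid.clone()));
    store.apply_delta(vec![], &[uid]);
    println!("{:?}", store.find_by_ip(&ip).map(|w| w.uid.clone()));
}
```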
What the waypoint subscribes to
📦 Standard Envoy xDS · no surprises
A waypoint is an Envoy that istiod generates config for
the same way it does for a sidecar — the same code paths in
pilot/pkg/networking/core produce the listener / cluster
/ route set. The only Ambient-specific shape is the inbound HBONE
listener on :15008 that terminates the per-node ztunnel's
tunnel and hands the inner connection to the L7 filter chain.
Resource types a waypoint subscribes to · type URLs as seen on the ADS stream (each is prefixed with type.googleapis.com/ on the wire)
envoy.config.listener.v3.Listener # LDS — inbound HBONE :15008,
# plus per-port L7 chains
envoy.config.cluster.v3.Cluster # CDS — upstreams (per service VIP)
envoy.config.route.v3.RouteConfiguration # RDS — HTTPRoute / VirtualService
envoy.config.endpoint.v3.ClusterLoadAssignment # EDS — endpoints per cluster
envoy.extensions.transport_sockets.tls.v3.Secret # SDS — SPIFFE SVID + roots
# rotated continuously
What a waypoint will not see · things ztunnel handles below it
# A waypoint does NOT subscribe to istio.workload.Address or
# istio.security.Authorization — those are ztunnel-only. So when you
# look at a waypoint's xDS config dump:
#
# istioctl proxy-config all -n bookinfo deploy/reviews-waypoint
#
# you'll see Listeners/Clusters/Routes/Endpoints/Secrets, but no
# "Workload" rows and no "WADS" rows. L7 AuthorizationPolicy that
# targets the waypoint is rendered as an Envoy RBAC filter inside an
# HTTP filter chain — it arrives over LDS, not WADS.
Multicluster — peering, not cross-watching
This is the part that confuses people, including me when I first read about it. Multicluster Istio has gone through three eras, and the Solo Enterprise scaling story makes sense only once you know which era you're comparing against.
Same five clusters, three topologies. Era 1 is a control-plane mesh
(every istiod reaches into every other cluster's kube-API).
Era 2 keeps istiods local but you hand-author one Gateway in each
direction — N²-style ops burden, even if the wire is now O(N) data-plane
HBONE. Era 3 collapses the ops side to N: each cluster makes one outbound
connection to mgmt-server; the controller fans peering Gateways
out for you.
1 · OSS Istio sidecar — RemoteSecret (the era that did scale badly)
Each istiod holds a kubeconfig (a "remote secret")
for every other cluster and opens watches into all of
them. istiod-A literally talks to kube-apiserver-B,
kube-apiserver-C, and so on. N clusters means roughly
N² cross-cluster kube-API connections, plus N kubeconfigs to
distribute and rotate. This is the "doesn't scale past a
handful of clusters" you've heard about — and it's what most
Solo Enterprise marketing implicitly contrasts against.
2 · OSS Istio Ambient — manual peering Gateways (already much better)
Ambient drops the RemoteSecret model entirely. Each
istiod stays self-contained — it reads only its own
cluster's kube-API, full stop. Clusters are linked by Gateway
API resources: in cluster A you create an istio-remote
Gateway pointing at cluster B's east-west gateway IP, and
istiod-A programs ztunnel + waypoints to HBONE
that IP for B-bound traffic. Cross-cluster kube-API connections:
zero. The downside: you hand-create those Gateways
in every cluster. N clusters → N×(N−1) peer Gateways to maintain.
istioctl multicluster link generates them in bulk, but
every add / remove / IP-change is still a coordinated apply.
3 · Solo Enterprise — same peering model, automated
Solo Enterprise uses the same ambient peering wire format
as upstream OSS. The added value is in the control loop
that creates and distributes the peering Gateways. With
PEERING_AUTOMATIC_LOCAL_GATEWAY=true, each cluster's
istiod generates its own peering Gateway as a
self-description. The Gloo Mesh Agent in each
cluster ships it up to the central mgmt-server over
a single outbound mTLS gRPC connection. A
peering-controller in the mgmt-server (code:
pkg/server/peering/controller/peering_controller.go)
fans every cluster's peering Gateway out to all the others, tagged
with a distribute-to annotation. The receiving agents
apply them locally. Adding a 50th cluster is one join
operation; the peering Gateways for the new cluster
appear automatically in the other 49, and vice versa.
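The fan-out rule described above ("every cluster's peering Gateway goes to all the others") is small enough to sketch. This is an illustration of the distribution logic only, not the peering-controller code: the `fan_out` function is hypothetical, and the `istio-remote-peer-<cluster>` naming is taken from the examples on this page.

```rust
use std::collections::BTreeMap;

/// For each receiving cluster, list the peer Gateways it should end up
/// with: one per OTHER cluster, never one pointing at itself.
pub fn fan_out(clusters: &[&str]) -> BTreeMap<String, Vec<String>> {
    let mut plan = BTreeMap::new();
    for &receiver in clusters {
        let gateways: Vec<String> = clusters
            .iter()
            .copied()
            .filter(|&origin| origin != receiver)
            .map(|origin| format!("istio-remote-peer-{}", origin))
            .collect();
        plan.insert(receiver.to_string(), gateways);
    }
    plan
}

fn main() {
    let plan = fan_out(&["cluster-a", "cluster-b", "cluster-c"]);
    for (receiver, gateways) in &plan {
        println!("{} receives {:?}", receiver, gateways);
    }
    // Total distributed objects grows as N x (N - 1), but none are hand-written.
    let total: usize = plan.values().map(|g| g.len()).sum();
    println!("total distributed Gateways: {}", total);
}
```

Joining a new cluster only adds one entry to the input list; every other cluster's row is recomputed by the same rule.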
So: when Ram says "Solo Enterprise scales because istiod talks to
istiod, not via the kube-API server" — the picture he's painting is
shorthand for two true facts. (a) No
istiod in a Solo Enterprise deployment ever opens a
connection to another cluster's kube-API — true. (b)
Clusters peer via direct HBONE between east-west gateways, not via
some intermediate proxy or kube-API hop — true. The literal wire
for the control-plane link still goes
istiod-A → kube-A → agent-A → mgmt-server → agent-B → kube-B →
istiod-B, but the result is what he says: N outbound
connections to mgmt-server, not N² kube-API watches. That
is the scale property, and it holds at hundreds of clusters because
every connection is one-out-per-cluster.
Below, the two code blocks compare era 2 (OSS Ambient manual peering) with era 3 (Solo Enterprise automatic peering). Era 1 is shown separately above because that's the comparison most people are actually carrying in their head.
OSS Istio Ambient · manual peering Gateways, one pair per cluster pair · stock upstream
# OSS Ambient multicluster: every cluster is self-contained. No
# RemoteSecret, no cross-cluster kube-API watches. istiod-A only ever
# reads cluster-A's kube-API. Clusters are linked by hand-creating
# Gateway resources in every cluster.
#
# cluster-A's istiod cluster-B's istiod
# │ │
# reads only reads only
# kube-A kube-B
#
# Step 1 — in EVERY cluster, deploy a local east-west gateway:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: istio-eastwest
  namespace: istio-eastwest
spec:
  gatewayClassName: istio-eastwest
  listeners:
  - name: hbone
    port: 15008
    protocol: HBONE
    tls: { mode: Terminate }
---
# Step 2 — in EACH cluster, hand-create an istio-remote Gateway for
# EVERY OTHER cluster you want to peer with. The Gateway is a pointer
# to the remote east-west gateway's address. istiod reads these from
# its own kube-API and programs xDS for ztunnel + waypoint to HBONE
# the remote endpoint.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: istio-remote-peer-cluster-b # in cluster-A
  namespace: istio-eastwest
spec:
  gatewayClassName: istio-remote
  addresses:
  - type: IPAddress
    value: 10.4.5.6 # cluster-B east-west gw IP
  listeners:
  - name: hbone
    port: 15008
    protocol: HBONE
---
# N clusters means N×(N−1) peer Gateways to maintain. `istioctl
# multicluster link` bulk-generates them, but every add/remove/IP-
# change still requires applying to all the right places.
Solo Enterprise · peering Gateways generated and distributed automatically · Gloo Mesh Enterprise
# Solo Enterprise uses the SAME ambient peering model — istio-remote
# Gateways pointing at remote east-west gateways — but a control
# plane creates and distributes them for you. You don't hand-author
# any istio-remote Gateway.
#
# mgmt-cluster: gloo-mesh-mgmt-server
# └─ peering-controller (pkg/server/peering/...)
# ▲ ▲ ▲
# │ │ │ outbound mTLS gRPC (:9900)
# │ │ │ one connection per cluster
# ┌─────────┘ │ └─────────┐
# │ │ │
# agent-A agent-B agent-C (Gloo Mesh Agent)
# │ │ │
# istiod-A istiod-B istiod-C (local only)
# Step 1 — istiod in every cluster auto-generates its OWN local
# istio-remote Gateway (describes itself to peers):
env:
  PEERING_AUTOMATIC_LOCAL_GATEWAY: "true"
  DISABLE_LEGACY_MULTICLUSTER: "true"
platforms:
  peering:
    enabled: true
---
# Step 2 — Gloo Mesh Agent in each cluster relays that local Gateway
# up to the mgmt-server over its single outbound mTLS gRPC channel.
# Step 3 — peering-controller in mgmt-server picks up every cluster's
# istio-remote-peer Gateway and fans it back out to all OTHER clusters,
# tagged with a distribute-to annotation. The agents on the receiving
# end apply it locally.
# Step 4 — istiod-B reads the istio-remote-peer-cluster-a Gateway
# from its OWN kube-API (just like the OSS manual case) and programs
# ztunnel-B / waypoint-B to HBONE cluster-A's east-west gateway address on :15008.
# End state (per cluster):
# $ kubectl get gateways -n istio-eastwest
# NAME CLASS
# istio-eastwest istio-eastwest (local)
# istio-remote-peer-cluster-a istio-remote (auto-managed)
# istio-remote-peer-cluster-b istio-remote (auto-managed)
# istio-remote-peer-cluster-c istio-remote (auto-managed)
#
# Add a new cluster → join it to the mgmt-server → its peer Gateway
# appears in every existing cluster automatically. No Gateway YAML
# applied by a human. No N×(N−1) toil. This is what scales.
#
# Side note: the istiod-agent Solo sidecar (same word, confusingly) is
# NOT the federation agent — it's a Vault / CA helper that refreshes
# the intermediate signing cert before its grace period elapses. The
# federation agent is Gloo Mesh Agent. Two different pieces.
What "scales to large fleets" actually means here · why it works
The scale claim isn't about throughput per cluster — each
istiod serves the same xDS to ztunnel and waypoints
regardless of how many clusters exist. It's about the
shape of the cross-cluster wiring. A 100-cluster
Era 1 deployment needs roughly 9,900 cross-cluster kube-API
connections (100 × 99: every istiod watching every other cluster)
plus a kubeconfig behind each one to provision and
rotate. A 100-cluster Era 2 deployment cuts those kube-API connections to
zero but trades them for ~9,900 istio-remote Gateway
resources to author and reconcile. A 100-cluster Era 3 deployment
has 100 outbound mTLS gRPC connections total (one
per cluster, all aimed at mgmt-server) and zero peering
Gateways for humans to maintain. The blast radius of a control-plane
blip is also localised: it pauses new pushes, but no
istiod's in-cluster view collapses, because no
istiod ever depended on a connection to another cluster.
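The arithmetic behind those figures is worth spelling out, because the count depends on whether you count cluster pairs or directions. 100 clusters form 4,950 unordered pairs; counting each direction separately (A reaches into B, and B reaches into A) gives 100 × 99 = 9,900. A small sketch, with hypothetical function and field names, that computes the wiring for each era:

```rust
/// Directed links in a full mesh: every cluster points at every other one.
pub fn directed_links(n: u64) -> u64 {
    n * n.saturating_sub(1)
}

/// Unordered cluster pairs (each pair counted once).
pub fn cluster_pairs(n: u64) -> u64 {
    directed_links(n) / 2
}

#[derive(Debug, PartialEq)]
pub struct Wiring {
    pub cross_cluster_kube_api_links: u64,
    pub hand_authored_peer_gateways: u64,
    pub mgmt_server_connections: u64,
}

/// Era 1: each istiod reaches into every other cluster's kube-API.
pub fn era1(n: u64) -> Wiring {
    Wiring {
        cross_cluster_kube_api_links: directed_links(n),
        hand_authored_peer_gateways: 0,
        mgmt_server_connections: 0,
    }
}

/// Era 2: no cross-cluster kube-API access, but one hand-made peer
/// Gateway per direction.
pub fn era2(n: u64) -> Wiring {
    Wiring {
        cross_cluster_kube_api_links: 0,
        hand_authored_peer_gateways: directed_links(n),
        mgmt_server_connections: 0,
    }
}

/// Era 3: one outbound connection per cluster, Gateways generated for you.
pub fn era3(n: u64) -> Wiring {
    Wiring {
        cross_cluster_kube_api_links: 0,
        hand_authored_peer_gateways: 0,
        mgmt_server_connections: n,
    }
}

fn main() {
    let n = 100;
    println!("pairs: {}, directed links: {}", cluster_pairs(n), directed_links(n));
    println!("era 1: {:?}", era1(n));
    println!("era 2: {:?}", era2(n));
    println!("era 3: {:?}", era3(n));
}
```

In Era 3 the peer Gateway objects still exist in every cluster; the sketch counts only the ones a human has to write, which is zero.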
CLI — inspecting the xDS state
🔍 istioctl + ztunnel admin · day 2
The waypoint side you debug with regular istioctl proxy-config
against the waypoint Deployment. The ztunnel side needs the ztunnel
admin API — same machine the ztunnel pod runs on, port
15000.
Waypoint side · istioctl proxy-config · listeners / clusters / endpoints
# Anything that looks like sidecar debugging works on a waypoint —
# it's an Envoy.
istioctl proxy-config listeners -n bookinfo deploy/reviews-waypoint
istioctl proxy-config clusters -n bookinfo deploy/reviews-waypoint
istioctl proxy-config routes -n bookinfo deploy/reviews-waypoint
istioctl proxy-config endpoints -n bookinfo deploy/reviews-waypoint
istioctl proxy-config secret -n bookinfo deploy/reviews-waypoint
# Full dump as JSON (handy for grepping a specific resource name)
istioctl proxy-config all -n bookinfo deploy/reviews-waypoint -o json
ztunnel side · ztunnel admin API on :15000 · workloads, services, certs, config
# ztunnel exposes its in-memory state on :15000 from inside the pod.
# Two ways to hit it:
# 1. Port-forward
kubectl -n istio-system port-forward ds/ztunnel 15000:15000
curl -s localhost:15000/config_dump | jq .
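# 2. Exec into the ztunnel pod itself (works only if its image ships curl;
#    otherwise use the port-forward above). No -it: the output is piped to jq.
kubectl -n istio-system exec ds/ztunnel -- curl -s localhost:15000/config_dump | jq .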
# Useful endpoints on the admin port (:15000):
# /config_dump full snapshot: workloads + services + policies
# /logging?level=<level> change the log level at runtime (POST)
# Metrics are served on ztunnel's separate stats port (:15020):
# /metrics Prometheus-format counters: xds_message_total,
# xds_connection_terminations, …
# /stats/prometheus the same counters under the Envoy-style path
#
# To list just the Workload entries:
curl -s localhost:15000/config_dump | jq '.workloads[] | {uid, addresses, identity, services}'
Confirming delta vs SotW · istiod-side log
# istiod logs the per-connection xDS mode. ztunnel always connects
# in delta mode; sidecars / waypoints depend on PILOT_ENABLE_DELTA_XDS.
kubectl -n istio-system logs deploy/istiod | grep -E 'ADS|delta|WDS|WADS' | head -50
# Per-connection nonce / ack debugging:
istioctl ps # who's connected
istioctl ps <pod-name>.<namespace> # one proxy's xDS sync status (-i is --istioNamespace, not a proxy selector)
Full reference — every xDS resource type on Ambient
| Type URL (proto) | Group | Consumed by | What it carries |
|---|---|---|---|
| istio.workload.Address | WDS | ztunnel | Workloads (pods, VMs) and Services with VIPs & endpoints: the entire mesh topology, keyed by uid. |
| istio.security.Authorization | WADS | ztunnel | L4 slice of every AuthorizationPolicy ztunnel enforces: source IPs / identities / dest ports. |
| envoy.config.listener.v3.Listener | LDS | waypoint · ingress envoy | Listeners, including the inbound HBONE listener on :15008 on every waypoint. |
| envoy.config.cluster.v3.Cluster | CDS | waypoint · ingress envoy | Upstream clusters, one per service the waypoint routes to. |
| envoy.config.route.v3.RouteConfiguration | RDS | waypoint · ingress envoy | HTTP route rules, rendered from HTTPRoute or legacy VirtualService. |
| envoy.config.endpoint.v3.ClusterLoadAssignment | EDS | waypoint · ingress envoy | Endpoints per cluster: IPs, ports, locality. |
| envoy.extensions.transport_sockets.tls.v3.Secret | SDS | waypoint · ingress envoy | SPIFFE SVID + trust bundle. Rotated continuously by istiod. |
| gateway.networking.k8s.io/v1.Gateway · gatewayClass: istio-remote | SOLO | peering-controller (in mgmt-server) | Distributes istio-remote-peer-<cluster> Gateways between clusters via the Gloo Mesh Agent relay. The unit of cross-cluster discovery is a Gateway, not WDS, and no istiod ever watches another cluster's kube-API. |
Where to go from here
The deep references for xDS itself — protocol shape, delta semantics, why ADS exists — are Jimmy Song on delta xDS and the OneUptime xDS walkthrough. Both are short and worth reading once.
For the Ambient-specific resource types, the upstream sources are istio/ztunnel (Rust dataplane, where the proto consumers live) and istio/api · workload (the WDS / WADS proto definitions).
For the Solo Enterprise multicluster story, the two posts to read are Introducing Gloo Mesh Ambient multi-cluster and Istio Ambient multicluster support · Gloo Mesh multicluster peering. Pair them with the Gloo Operator reference on this site for the install topology, and the Trust & identity page for how SPIFFE roots get federated alongside the discovery surface.