istiod xDS to ztunnel and waypoint — Istio Ambient control-plane wire format

If you've operated sidecar Istio, you already know xDS as the way istiod pushes Envoy config — clusters, listeners, routes, endpoints. Ambient keeps the same transport (ADS — one bidirectional gRPC stream multiplexing every resource type) and the same port (15012 mTLS, 15010 plaintext for bootstrap / dev), but the resources flowing over it changed. Waypoints are still Envoys, so they get the Envoy xDS set you already know. ztunnel is not an Envoy — it's a Rust dataplane that needs to know about workloads (pods, their identity, which services they back, which waypoint they should send through), not about Envoy clusters and listeners. So Istio defined two new resource types for it: WDS and WADS.

Everything ztunnel learns about the mesh — every IP it can route to, every SPIFFE identity it'll see on the wire, every authorization rule it'll enforce at L4 — arrives over that one ADS stream, as delta updates. The page below walks the wire format, shows the in-memory state ztunnel builds from it, and ends with the bit that's actually Solo-specific: how discovery information federates between clusters on Solo Enterprise vs upstream.

xDS in one sentence, in case you want it: a family of gRPC discovery APIs (the x is a placeholder for the resource type, as in LDS, CDS, EDS) where each resource type (Listener, Cluster, Route, Endpoint, Secret, Workload, Authorization) has its own well-known type_url, and a client subscribes by listing the type URLs it wants. ADS just bundles all of those onto one stream so ordering is consistent. For the long-form, the references at the bottom of the page are good.

One stream, two consumers

istiod · ADS gRPC server
ztunnel · WDS + WADS (Istio-specific)
waypoint · LDS/CDS/RDS/EDS/SDS (standard Envoy xDS)

Read it like this: a single ADS gRPC stream multiplexes every resource type, but the type URLs a client subscribes to determine what flows. ztunnel asks for two Istio-specific types (istio.workload.Address, istio.security.Authorization); it never asks for an Envoy Listener or Cluster because it doesn't have any. A waypoint asks for the regular Envoy set. Both ends initiate a delta stream (subscribe / unsubscribe by resource name), so istiod only sends what changed after the first sync.
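The subscribe-then-delta behaviour can be sketched in a few lines. This is a toy model, not istiod's or ztunnel's code: the `DeltaCache` class and its method names are made up for illustration, and a response is a plain dict that mirrors the `resources` / `removed_resources` fields of a DeltaDiscoveryResponse.

```python
from dataclasses import dataclass, field

WDS = "type.googleapis.com/istio.workload.Address"
WADS = "type.googleapis.com/istio.security.Authorization"


@dataclass
class DeltaCache:
    """Toy client-side view of one delta xDS stream (hypothetical helper)."""
    subscriptions: set = field(default_factory=set)
    # type_url -> {resource name -> resource body}
    resources: dict = field(default_factory=dict)

    def subscribe(self, type_url: str) -> None:
        self.subscriptions.add(type_url)
        self.resources.setdefault(type_url, {})

    def apply(self, response: dict) -> str:
        """Merge one delta response and return the nonce to ACK."""
        type_url = response["type_url"]
        if type_url not in self.subscriptions:
            raise ValueError(f"not subscribed to {type_url}")
        store = self.resources[type_url]
        for res in response.get("resources", []):
            store[res["name"]] = res["resource"]      # add or overwrite
        for name in response.get("removed_resources", []):
            store.pop(name, None)                     # delete if present
        return response["nonce"]


cache = DeltaCache()
cache.subscribe(WDS)
cache.subscribe(WADS)

# First sync: two workloads arrive.
cache.apply({"type_url": WDS, "nonce": "n1", "resources": [
    {"name": "default/httpbin", "resource": {"status": "HEALTHY"}},
    {"name": "default/curl", "resource": {"status": "HEALTHY"}},
]})
# Later delta: one workload changes, one goes away. Nothing else is resent.
ack = cache.apply({"type_url": WDS, "nonce": "n2",
                   "resources": [{"name": "default/httpbin",
                                  "resource": {"status": "UNHEALTHY"}}],
                   "removed_resources": ["default/curl"]})
```

The point of the sketch is the last call: the second response names only the two resources that changed, and the cache ends up correct without a full resend.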

What ztunnel actually subscribes to

🌐 WDS: workload & service discovery (istio.workload.Address)

Wraps two underlying messages — Workload (one pod / VM, keyed by a globally-unique uid) and Service (a namespaced hostname plus its VIPs). ztunnel subscribes with wildcard ["*"] on first connect, then takes deltas forever. Every endpoint it knows about — local pods, remote pods in federated clusters, ServiceEntry workloads — arrives this way.

Workload proto · the fields ztunnel actually uses (trimmed from workload.proto)

message Workload {
  string uid                 = 20;  // primary key: "cluster/group/kind/ns/name"
  string name                = 1;
  string namespace           = 2;
  repeated bytes addresses   = 3;   // IPv4 / IPv6, no port
  string hostname            = 21;
  string network             = 4;
  TunnelProtocol tunnel_protocol = 5;  // NONE | HBONE
  string trust_domain        = 6;
  string service_account     = 7;   // SPIFFE: spiffe://<trust>/ns/<ns>/sa/<sa>
  GatewayAddress waypoint    = 8;   // which waypoint this workload sends through
  GatewayAddress network_gateway = 19;  // east-west gateway for cross-network
  string node                = 9;
  string canonical_name      = 10;  // app + version for telemetry
  string canonical_revision  = 11;
  WorkloadType workload_type = 12;  // DEPLOYMENT | POD | CRONJOB | JOB
  map<string, PortList> services = 22;  // "ns/hostname" → ports
  repeated string authorization_policies = 16;  // names of WADS rules
  WorkloadStatus status      = 17;  // HEALTHY | UNHEALTHY
  string cluster_id          = 18;
  Locality locality          = 24;  // region/zone/subzone
}
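Since the uid is the primary key, it helps to see it taken apart. The sketch below splits the "cluster/group/kind/ns/name" form from the comment on the `uid` field. `parse_uid` and `WorkloadUid` are names invented here, and the example uid (with an empty group segment for a Pod) is an assumption about what a real one looks like, not something copied from a live cluster.

```python
from typing import NamedTuple


class WorkloadUid(NamedTuple):
    cluster: str
    group: str
    kind: str
    namespace: str
    name: str


def parse_uid(uid: str) -> WorkloadUid:
    """Split a uid of the form cluster/group/kind/ns/name into its parts."""
    parts = uid.split("/", 4)          # at most 5 fields; extras stay in `name`
    if len(parts) != 5:
        raise ValueError(f"expected 5 '/'-separated fields, got {uid!r}")
    return WorkloadUid(*parts)


# Assumed example: a Pod lives in the core API group, so `group` is empty.
uid = parse_uid("cluster-east//Pod/default/httpbin-5d8d5f7c6b-abc12")
```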

Sample DeltaDiscoveryResponse · pod join · on the wire (httpbin-5d8d... gets scheduled)

# What istiod pushes to ztunnel when a single pod is added.
# DeltaDiscoveryResponse, type_url omitted on resources because it
# matches the response's type_url.

DeltaDiscoveryResponse {
  type_url:             "type.googleapis.com/istio.workload.Address"
  system_version_info:  "push-1747500000"
  nonce:                "9c3d-…"
  resources: [
    Resource {
      name:    "default/httpbin-5d8d5f7c6b-abc12"   # = workload.uid (shortened; real uids use the "cluster/group/kind/ns/name" form)
      version: "1747500000"
      resource: Address {
        type: workload {
          uid:              "default/httpbin-5d8d5f7c6b-abc12"
          name:             "httpbin-5d8d5f7c6b-abc12"
          namespace:        "default"
          addresses:        [ 0x0a000142 ]          # 10.0.1.66 packed
          network:          "network1"
          tunnel_protocol:  HBONE
          trust_domain:     "cluster.local"
          service_account:  "httpbin"
          node:             "node-1"
          canonical_name:   "httpbin"
          workload_type:    DEPLOYMENT
          status:           HEALTHY
          cluster_id:       "cluster-east"
          services: {
            "default/httpbin.default.svc.cluster.local": {
              ports: [ { service_port: 8000, target_port: 8080 } ]
            }
          }
          waypoint: {
            address: { network: "network1", address: 0x0a000264 }   # 10.0.2.100 packed
            hbone_mtls_port: 15008
          }
        }
      }
    }
  ]
  removed_resources: []
}
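Two fields in that sample are easier to read once decoded: the packed address bytes and the identity that ztunnel derives from trust_domain, namespace and service_account. A minimal sketch using only the standard library; the helper names `decode_address` and `spiffe_id` are hypothetical.

```python
import ipaddress


def decode_address(raw: bytes) -> str:
    """Turn packed address bytes (4 for IPv4, 16 for IPv6) into text."""
    return str(ipaddress.ip_address(raw))


def spiffe_id(trust_domain: str, namespace: str, service_account: str) -> str:
    """Build the SPIFFE URI from the three Workload fields."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


pod_ip = decode_address(bytes.fromhex("0a000142"))        # the pod address
waypoint_ip = decode_address(bytes.fromhex("0a000264"))   # the waypoint address
identity = spiffe_id("cluster.local", "default", "httpbin")
```

With the values from the sample, `pod_ip` is 10.0.1.66, `waypoint_ip` is 10.0.2.100, and `identity` is spiffe://cluster.local/ns/default/sa/httpbin.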

🛡️ WADS: authorization for L4 enforcement (istio.security.Authorization)

The L4 slice of every AuthorizationPolicy that targets something ztunnel enforces (namespace-wide, or targetRefs-attached to a Service). The L7 rules from the same AuthorizationPolicy only show up here if the policy's target is on ztunnel itself — anything that needs L7 (JWT claims, method, header) is sent to the waypoint as an Envoy RBAC filter via the regular xDS path, not WADS.
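One way to picture the L4 / L7 split is a check that asks whether a rule touches anything only HTTP can answer. The sketch below assumes a rule is a plain dict shaped like an AuthorizationPolicy rule (`from` / `to` / `when`). The attribute sets are illustrative rather than complete, and `needs_l7` is a made-up name, not istiod's actual logic.

```python
# Attribute names come from the AuthorizationPolicy API; the sets are
# illustrative, not a complete list.
L7_SOURCE_FIELDS = {"requestPrincipals", "notRequestPrincipals"}
L7_OPERATION_FIELDS = {"hosts", "notHosts", "methods", "notMethods",
                       "paths", "notPaths"}
L7_CONDITION_PREFIXES = ("request.headers", "request.auth.")


def needs_l7(rule: dict) -> bool:
    """Return True if any part of the rule can only be evaluated on HTTP."""
    for src in rule.get("from", []):
        if src.get("source", {}).keys() & L7_SOURCE_FIELDS:
            return True
    for op in rule.get("to", []):
        if op.get("operation", {}).keys() & L7_OPERATION_FIELDS:
            return True
    return any(cond.get("key", "").startswith(L7_CONDITION_PREFIXES)
               for cond in rule.get("when", []))


# Identity + port only: something ztunnel can enforce.
l4_rule = {
    "from": [{"source": {"principals": ["cluster.local/ns/default/sa/curl"]}}],
    "to": [{"operation": {"ports": ["8080"]}}],
}
# Method + path: needs a waypoint.
l7_rule = {"to": [{"operation": {"methods": ["GET"], "paths": ["/status/*"]}}]}
```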

Authorization · proto fields ztunnel enforces (trimmed)

message Authorization {
  string  name        = 1;
  string  namespace   = 2;
  Scope   scope       = 3;     // GLOBAL | NAMESPACE | WORKLOAD_SELECTOR
  Action  action      = 4;     // ALLOW | DENY
  repeated Rule rules = 5;
}

message Rule {
  repeated Match matches = 1;  // OR
}

message Match {
  repeated Address       source_ips           = 1;
  repeated Address       not_source_ips       = 2;
  repeated string        source_identities    = 3;   // SPIFFE
  repeated string        not_source_identities = 4;
  repeated NetworkAddress destination_ips     = 5;
  repeated PortRange     destination_ports    = 6;
  repeated string        principals           = 7;   // L4 identity
}
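How these messages turn into an allow / deny decision can be sketched as follows. The model is deliberately smaller than the proto: a policy is flattened to a list of matches that are ORed, only source identity and destination port are checked, ports are plain integers rather than PortRange, and an empty list is treated as "any". Those simplifications and all class names are assumptions made for the example. The evaluation order (DENY first, then ALLOW, allow-by-default when no ALLOW policy exists) is Istio's documented AuthorizationPolicy behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    source_identities: list = field(default_factory=list)   # empty = any
    destination_ports: list = field(default_factory=list)   # empty = any

    def hits(self, identity: str, port: int) -> bool:
        id_ok = not self.source_identities or identity in self.source_identities
        port_ok = not self.destination_ports or port in self.destination_ports
        return id_ok and port_ok          # fields AND together


@dataclass
class Policy:
    name: str
    action: str                           # "ALLOW" or "DENY"
    matches: list                         # ORed

    def hits(self, identity: str, port: int) -> bool:
        return any(m.hits(identity, port) for m in self.matches)


def allowed(policies: list, identity: str, port: int) -> bool:
    """DENY wins; with no ALLOW policies traffic passes; else one must match."""
    if any(p.action == "DENY" and p.hits(identity, port) for p in policies):
        return False
    allows = [p for p in policies if p.action == "ALLOW"]
    return not allows or any(p.hits(identity, port) for p in allows)


CURL = "spiffe://cluster.local/ns/default/sa/curl"
policies = [
    Policy("allow-curl", "ALLOW", [Match([CURL], [8080])]),
    Policy("deny-admin-port", "DENY", [Match([], [15000])]),
]
```

Here curl reaches port 8080, nobody reaches 15000, and any other identity is rejected on 8080 because an ALLOW policy exists and does not match it.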

What ztunnel holds in memory

ztunnel doesn't store the proto messages directly; it parses each delta update into a pair of in-memory stores. WorkloadStore is keyed by uid (the proto's primary key) with a reverse index by IP. ServiceStore is keyed by NamespacedHostname and holds the set of workloads that back the service. When a packet arrives at ztunnel's TPROXY socket, the destination IP is resolved with one hash lookup through WorkloadStore's reverse index, O(1) on average. That lookup is the hot path.

state/workload.rs · the structs ztunnel actually keeps (simplified from the Rust source)

// One workload — built from a Workload proto, kept Arc<_> so reads
// don't block writes during pushes.

pub struct Workload {
    pub uid:              Strng,
    pub addresses:        Vec<IpNet>,
    pub identity:         Identity,                // SPIFFE URI, derived
    pub status:           HealthStatus,            // Healthy | Unhealthy
    pub node:             Strng,
    pub services:         HashMap<NamespacedHostname, Vec<Port>>,
    pub waypoint:         Option<GatewayAddress>,
    pub network_gateway:  Option<GatewayAddress>,
    pub tunnel_protocol:  TunnelProtocol,          // None | Hbone
    pub cluster_id:       Strng,
    pub locality:         Option<Locality>,
    pub authorization_policies: Vec<Strng>,
    // …
}

pub struct WorkloadStore {
    by_uid:  HashMap<Strng, Arc<Workload>>,
    by_ip:   HashMap<IpAddr, Vec<Arc<Workload>>>,
}

// Services and the endpoints behind them.

pub struct Service {
    pub namespaced_hostname: NamespacedHostname,   // "default/httpbin.default.svc.cluster.local"
    pub vips:        Vec<NetworkAddress>,           // ClusterIP(s)
    pub ports:       Vec<Port>,
    pub endpoints:   HashMap<Strng, Endpoint>,      // workload_uid → endpoint
    pub waypoint:    Option<GatewayAddress>,        // service-attached waypoint
}

pub struct Endpoint {
    pub workload_uid: Strng,
    pub port:         Vec<Port>,
    pub status:       HealthStatus,
}

pub struct ServiceStore {
    by_hostname: HashMap<NamespacedHostname, Arc<Service>>,
    by_vip:      HashMap<NetworkAddress, Arc<Service>>,
}
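The part of this design that is easy to get wrong is keeping the reverse IP index in step with the uid map when a workload is updated or removed. Here is a small Python model of that bookkeeping. It mirrors the by_uid / by_ip shape of the structs but is not a translation of ztunnel's code, and the method names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    uid: str
    addresses: tuple       # textual IPs, e.g. ("10.0.1.66",)
    status: str = "Healthy"


class WorkloadStore:
    def __init__(self) -> None:
        self.by_uid = {}   # uid -> Workload
        self.by_ip = {}    # ip  -> list of Workloads sharing that IP

    def insert(self, wl: Workload) -> None:
        self.remove(wl.uid)                 # an update replaces the old entry
        self.by_uid[wl.uid] = wl
        for ip in wl.addresses:
            self.by_ip.setdefault(ip, []).append(wl)

    def remove(self, uid: str) -> None:
        old = self.by_uid.pop(uid, None)
        if old is None:
            return
        for ip in old.addresses:
            remaining = [w for w in self.by_ip.get(ip, []) if w.uid != uid]
            if remaining:
                self.by_ip[ip] = remaining
            else:
                self.by_ip.pop(ip, None)    # drop empty buckets

    def find_by_ip(self, ip: str):
        """Hash lookup, O(1) on average; returns the first workload or None."""
        bucket = self.by_ip.get(ip)
        return bucket[0] if bucket else None


store = WorkloadStore()
store.insert(Workload("default/httpbin-5d8d5f7c6b-abc12", ("10.0.1.66",)))
store.insert(Workload("default/curl-7d9b6c4f8c-pq8r2", ("10.0.1.81",)))
# The pod comes back with a new IP: the stale reverse-index entry must go.
store.insert(Workload("default/httpbin-5d8d5f7c6b-abc12", ("10.0.1.67",)))
```

Routing every insert through `remove` first is the design choice worth noting: an update can change a workload's addresses, and without the cleanup the old IP would keep resolving to it.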

Snapshot · what's in WorkloadStore after the httpbin pod above is delta'd in (debug-formatted, three workloads)

WorkloadStore {
  by_uid: {
    "default/httpbin-5d8d5f7c6b-abc12": Workload {
      uid: "default/httpbin-5d8d5f7c6b-abc12",
      addresses: [ 10.0.1.66/32 ],
      identity: Identity::Spiffe {
        trust_domain: "cluster.local",
        namespace:    "default",
        service_account: "httpbin",
      },
      status: Healthy,
      node:   "node-1",
      services: {
        "default/httpbin.default.svc.cluster.local" => [ Port { svc: 8000, tgt: 8080 } ],
      },
      waypoint: Some(GatewayAddress {
        address: 10.0.2.100/32,
        hbone_mtls_port: 15008,
      }),
      tunnel_protocol: Hbone,
      cluster_id: "cluster-east",
      …
    },
    "default/curl-7d9b6c4f8c-pq8r2": Workload { … },
    "kube-system/coredns-…":         Workload { … },
  },

  by_ip: {
    10.0.1.66 => [ "default/httpbin-5d8d5f7c6b-abc12" ],
    10.0.1.81 => [ "default/curl-7d9b6c4f8c-pq8r2"   ],
    10.0.0.10 => [ "kube-system/coredns-…"           ],
  },
}

ServiceStore {
  by_hostname: {
    "default/httpbin.default.svc.cluster.local" => Service {
      vips:  [ 10.96.0.42/32 ],
      ports: [ Port { svc: 8000, tgt: 8080 } ],
      endpoints: {
        "default/httpbin-5d8d5f7c6b-abc12" => Endpoint { port: [8000→8080], status: Healthy },
      },
      waypoint: Some(GatewayAddress { address: 10.0.2.100/32, hbone_mtls_port: 15008 }),
    },
  },
  by_vip: {
    10.96.0.42 => "default/httpbin.default.svc.cluster.local",
  },
}
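To make the snapshot concrete, here is roughly how a connection to the service VIP could be resolved against it. This is a simplified sketch: the stores are plain dicts, `resolve` is a hypothetical function, and the rule "send to the service's waypoint if it has one, otherwise pick a healthy endpoint and translate the port" leaves out everything else ztunnel considers (load balancing, locality, traffic that already came from the waypoint).

```python
def resolve(by_vip: dict, vip: str, svc_port: int):
    """Return ("waypoint", addr, port), ("endpoint", uid, target_port) or None."""
    svc = by_vip.get(vip)
    if svc is None:
        return None                         # not a known service VIP
    wp = svc.get("waypoint")
    if wp:                                  # service-attached waypoint first
        return ("waypoint", wp["address"], wp["hbone_mtls_port"])
    for uid, ep in svc["endpoints"].items():
        if ep["status"] == "Healthy" and svc_port in ep["ports"]:
            return ("endpoint", uid, ep["ports"][svc_port])  # svc -> target port
    return None


httpbin = {
    "endpoints": {
        "default/httpbin-5d8d5f7c6b-abc12": {"ports": {8000: 8080},
                                             "status": "Healthy"},
    },
    "waypoint": {"address": "10.0.2.100", "hbone_mtls_port": 15008},
}
by_vip = {"10.96.0.42": httpbin}

via_waypoint = resolve(by_vip, "10.96.0.42", 8000)
# Same service with the waypoint detached, to show the direct path.
direct = resolve({"10.96.0.42": {**httpbin, "waypoint": None}},
                 "10.96.0.42", 8000)
```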

What the waypoint subscribes to

📦 Standard Envoy xDS (no surprises)

A waypoint is an Envoy that istiod generates config for the same way it does for a sidecar — the same code paths in pilot/pkg/networking/core produce the listener / cluster / route set. The only Ambient-specific shape is the inbound HBONE listener on :15008 that terminates the per-node ztunnel's tunnel and hands the inner connection to the L7 filter chain.

Resource types a waypoint subscribes to (type URLs as seen on the ADS stream)

envoy.config.listener.v3.Listener           # LDS — inbound HBONE :15008,
                                            #        plus per-port L7 chains
envoy.config.cluster.v3.Cluster             # CDS — upstreams (per service VIP)
envoy.config.route.v3.RouteConfiguration    # RDS — HTTPRoute / VirtualService
envoy.config.endpoint.v3.ClusterLoadAssignment   # EDS — endpoints per cluster
envoy.extensions.transport_sockets.tls.v3.Secret # SDS — SPIFFE SVID + roots
                                            #        rotated continuously

What a waypoint will not see (things ztunnel handles below it)

# A waypoint does NOT subscribe to istio.workload.Address or
# istio.security.Authorization — those are ztunnel-only. So when you
# look at a waypoint's xDS config dump:
#
#   istioctl proxy-config all -n bookinfo deploy/reviews-waypoint
#
# you'll see Listeners/Clusters/Routes/Endpoints/Secrets, but no
# "Workload" rows and no "WADS" rows. L7 AuthorizationPolicy that
# targets the waypoint is rendered as an Envoy RBAC filter inside an
# HTTP filter chain — it arrives over LDS, not WADS.

Multicluster — peering, not cross-watching

This is the part that confuses people, including me when I first read about it. Multicluster Istio has gone through three eras, and the Solo Enterprise scaling story makes sense only once you know which era you're comparing against.

Same five clusters, three topologies. Era 1 is a control-plane mesh (every istiod reaches into every other cluster's kube-API). Era 2 keeps istiods local but you hand-author one Gateway in each direction — N²-style ops burden, even if the wire is now O(N) data-plane HBONE. Era 3 collapses the ops side to N: each cluster makes one outbound connection to mgmt-server; the controller fans peering Gateways out for you.

1 · OSS Istio sidecar — RemoteSecret (the era that did scale badly)

Each istiod holds a kubeconfig (a "remote secret") for every other cluster and opens watches into all of them. istiod-A literally talks to kube-apiserver-B, kube-apiserver-C, and so on. N clusters means roughly N² cross-cluster kube-API connections, plus N kubeconfigs to distribute and rotate. This is the "doesn't scale past a handful of clusters" you've heard about — and it's what most Solo Enterprise marketing implicitly contrasts against.

2 · OSS Istio Ambient — manual peering Gateways (already much better)

Ambient drops the RemoteSecret model entirely. Each istiod stays self-contained — it reads only its own cluster's kube-API, full stop. Clusters are linked by Gateway API resources: in cluster A you create an istio-remote Gateway pointing at cluster B's east-west gateway IP, and istiod-A programs ztunnel + waypoints to HBONE that IP for B-bound traffic. Cross-cluster kube-API connections: zero. The downside: you hand-create those Gateways in every cluster. N clusters → N×(N−1) peer Gateways to maintain. istioctl multicluster link generates them in bulk, but every add / remove / IP-change is still a coordinated apply.

3 · Solo Enterprise — same peering model, automated

Solo Enterprise uses the same ambient peering wire format as upstream OSS. The added value is in the control loop that creates and distributes the peering Gateways. With PEERING_AUTOMATIC_LOCAL_GATEWAY=true, each cluster's istiod generates its own peering Gateway as a self-description. The Gloo Mesh Agent in each cluster ships it up to the central mgmt-server over a single outbound mTLS gRPC connection. A peering-controller in the mgmt-server (code: pkg/server/peering/controller/peering_controller.go) fans every cluster's peering Gateway out to all the others, tagged with a distribute-to annotation. The receiving agents apply them locally. Adding a 50th cluster is one join operation; the peering Gateways for the new cluster appear automatically in the other 49, and vice versa.

So: when Ram says "Solo Enterprise scales because istiod talks to istiod, not via the kube-API server" — the picture he's painting is shorthand for two true facts. (a) No istiod in a Solo Enterprise deployment ever opens a connection to another cluster's kube-API — true. (b) Clusters peer via direct HBONE between east-west gateways, not via some intermediate proxy or kube-API hop — true. The literal wire for the control-plane link still goes istiod-A → kube-A → agent-A → mgmt-server → agent-B → kube-B → istiod-B, but the result is what he says: N outbound connections to mgmt-server, not N² kube-API watches. That is the scale property, and it holds at hundreds of clusters because every connection is one-out-per-cluster.

Below, the two code blocks compare era 2 (OSS Ambient manual peering) with era 3 (Solo Enterprise automatic peering). Era 1 is shown separately above because that's the comparison most people are actually carrying in their head.

OSS Istio Ambient · manual peering Gateways, one pair per cluster pair (stock upstream)

# OSS Ambient multicluster: every cluster is self-contained. No
# RemoteSecret, no cross-cluster kube-API watches. istiod-A only ever
# reads cluster-A's kube-API. Clusters are linked by hand-creating
# Gateway resources in every cluster.
#
#   cluster-A's istiod        cluster-B's istiod
#         │                          │
#       reads only                reads only
#       kube-A                    kube-B
#
# Step 1 — in EVERY cluster, deploy a local east-west gateway:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: istio-eastwest
  namespace: istio-eastwest
spec:
  gatewayClassName: istio-eastwest
  listeners:
    - name: hbone
      port: 15008
      protocol: HBONE
      tls: { mode: Terminate }
---
# Step 2 — in EACH cluster, hand-create an istio-remote Gateway for
# EVERY OTHER cluster you want to peer with. The Gateway is a pointer
# to the remote east-west gateway's address. istiod reads these from
# its own kube-API and programs xDS for ztunnel + waypoint to HBONE
# the remote endpoint.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: istio-remote-peer-cluster-b      # in cluster-A
  namespace: istio-eastwest
spec:
  gatewayClassName: istio-remote
  addresses:
    - type: IPAddress
      value: 10.4.5.6                    # cluster-B east-west gw IP
  listeners:
    - name: hbone
      port: 15008
      protocol: HBONE
---
# N clusters means N×(N−1) peer Gateways to maintain. `istioctl
# multicluster link` bulk-generates them, but every add/remove/IP-
# change still requires applying to all the right places.

Solo Enterprise · peering Gateways generated and distributed automatically (Gloo Mesh Enterprise)

# Solo Enterprise uses the SAME ambient peering model — istio-remote
# Gateways pointing at remote east-west gateways — but a control
# plane creates and distributes them for you. You don't hand-author
# any istio-remote Gateway.
#
#   mgmt-cluster:   gloo-mesh-mgmt-server
#                   └─ peering-controller (pkg/server/peering/...)
#                            ▲ ▲ ▲
#                            │ │ │  outbound mTLS gRPC (:9900)
#                            │ │ │  one connection per cluster
#                  ┌─────────┘ │ └─────────┐
#                  │           │           │
#              agent-A      agent-B     agent-C    (Gloo Mesh Agent)
#                  │           │           │
#               istiod-A    istiod-B    istiod-C   (local only)

# Step 1 — istiod in every cluster auto-generates its OWN local
# istio-remote Gateway (describes itself to peers):
env:
  PEERING_AUTOMATIC_LOCAL_GATEWAY: "true"
  DISABLE_LEGACY_MULTICLUSTER:     "true"
platforms:
  peering:
    enabled: true
---
# Step 2 — Gloo Mesh Agent in each cluster relays that local Gateway
# up to the mgmt-server over its single outbound mTLS gRPC channel.

# Step 3 — peering-controller in mgmt-server picks up every cluster's
# istio-remote-peer Gateway and fans it back out to all OTHER clusters,
# tagged with a distribute-to annotation. The agents on the receiving
# end apply it locally.

# Step 4 — istiod-B reads the istio-remote-peer-cluster-a Gateway
# from its OWN kube-API (just like the OSS manual case) and programs
# ztunnel-B / waypoint-B to HBONE cluster-A's east-west gateway on :15008.

# End state (per cluster):
#   $ kubectl get gateways -n istio-eastwest
#   NAME                            CLASS
#   istio-eastwest                  istio-eastwest        (local)
#   istio-remote-peer-cluster-a     istio-remote          (auto-managed)
#   istio-remote-peer-cluster-b     istio-remote          (auto-managed)
#   istio-remote-peer-cluster-c     istio-remote          (auto-managed)
#
# Add a new cluster → join it to the mgmt-server → its peer Gateway
# appears in every existing cluster automatically. No Gateway YAML
# applied by a human. No N×(N−1) toil. This is what scales.
#
# Side note: the istiod-agent Solo sidecar (same word, confusingly) is
# NOT the federation agent — it's a Vault / CA helper that refreshes
# the intermediate signing cert before its grace period elapses. The
# federation agent is Gloo Mesh Agent. Two different pieces.
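The fan-out in step 3 is simple enough to model. The sketch below is not the peering-controller's code: `fan_out` is a made-up function, a Gateway is reduced to a dict holding an address, and every address except 10.4.5.6 is invented for the example. It only shows the shape of the result: each cluster receives one istio-remote-peer Gateway per other cluster and never its own.

```python
def fan_out(self_descriptions: dict) -> dict:
    """Map each cluster to the peer Gateways it should receive.

    self_descriptions: cluster name -> that cluster's own peering Gateway
    (reduced here to a dict with the east-west gateway address).
    """
    plan = {}
    for receiver in self_descriptions:
        plan[receiver] = {
            f"istio-remote-peer-{peer}": gw
            for peer, gw in self_descriptions.items()
            if peer != receiver              # a cluster never gets its own
        }
    return plan


gateways = {
    "cluster-a": {"address": "10.1.0.10"},   # hypothetical address
    "cluster-b": {"address": "10.4.5.6"},
    "cluster-c": {"address": "10.7.0.10"},   # hypothetical address
}
plan = fan_out(gateways)
```

For three clusters the plan holds six Gateways in total, which is the same N×(N−1) as the manual model. The difference is only in who writes them.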

What "scales to large fleets" actually means here (why it works)

The scale claim isn't about throughput per cluster: each istiod serves the same xDS to ztunnel and waypoints regardless of how many clusters exist. It's about the shape of the cross-cluster wiring. A 100-cluster Era 1 deployment needs about 9,900 cross-cluster kube-API watches (100 × 99, because every istiod watches every other cluster) plus a kubeconfig for each of them to provision and rotate. A 100-cluster Era 2 deployment cuts kube-API watches to zero but trades them for ~9,900 istio-remote Gateway resources to author and reconcile. A 100-cluster Era 3 deployment has 100 outbound mTLS gRPC connections total (one per cluster, all aimed at mgmt-server) and zero peering Gateways for humans to maintain. The blast radius of a blip in the peering control plane is also localised: new peering updates pause, but no istiod's in-cluster view collapses, because no istiod ever depended on a connection to another cluster.
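The arithmetic behind those figures is short enough to write down. The function below is just a calculator with a made-up name. It counts each direction separately (A watching B and B watching A are two items) and also reports the number of unordered cluster pairs, which is half of that.

```python
def cross_cluster_counts(n: int) -> dict:
    """Count the cross-cluster pieces each topology needs for n clusters."""
    directed = n * (n - 1)          # every cluster towards every other cluster
    return {
        "era1_kube_api_watches": directed,   # istiod -> remote kube-apiserver
        "era2_peer_gateways": directed,      # one istio-remote Gateway per direction
        "era3_mgmt_connections": n,          # one outbound connection per cluster
        "unordered_cluster_pairs": directed // 2,
    }


fleet = cross_cluster_counts(100)
```

For 100 clusters that gives 9,900 directed links (4,950 unordered pairs) for the first two eras and 100 connections for the third. The first two grow quadratically, the third linearly.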

CLI — inspecting the xDS state

🔍 istioctl + ztunnel admin (day 2)

The waypoint side you debug with regular istioctl proxy-config against the waypoint Deployment. The ztunnel side needs the ztunnel admin API — same machine the ztunnel pod runs on, port 15000.

Waypoint side · istioctl proxy-config (listeners / clusters / endpoints)

# Anything that looks like sidecar debugging works on a waypoint —
# it's an Envoy.

istioctl proxy-config listeners  -n bookinfo deploy/reviews-waypoint
istioctl proxy-config clusters   -n bookinfo deploy/reviews-waypoint
istioctl proxy-config routes     -n bookinfo deploy/reviews-waypoint
istioctl proxy-config endpoints  -n bookinfo deploy/reviews-waypoint
istioctl proxy-config secret     -n bookinfo deploy/reviews-waypoint

# Full dump as JSON (handy for grepping a specific resource name)
istioctl proxy-config all -n bookinfo deploy/reviews-waypoint -o json

ztunnel side · ztunnel admin API on :15000 (workloads, services, certs, config)

# ztunnel exposes its in-memory state on :15000 from inside the pod.
# Two ways to hit it:

# 1. Port-forward
kubectl -n istio-system port-forward ds/ztunnel 15000:15000
curl -s localhost:15000/config_dump | jq .

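# 2. Exec into the ztunnel pod itself (only works if the image ships
#    curl; minimal images may not, in which case use the port-forward).
#    No -it here: a TTY would mangle the output piped to jq.
kubectl -n istio-system exec ds/ztunnel -- curl -s localhost:15000/config_dump | jq .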

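# Useful endpoints on the admin port (:15000):
#   /config_dump        full snapshot: workloads + services + policies
#   /logging            current log levels (POST ?level=… to change them)
#
# Metrics are served from ztunnel's separate stats port (:15020 by
# default), not from :15000:
#   /metrics            Prometheus-format counters: xds_message_total,
#                        xds_connection_terminations, …
#   /stats/prometheus   the same counters under the Envoy-style path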
#
# To list just the Workload entries:

curl -s localhost:15000/config_dump | jq '.workloads[] | {uid, addresses, identity, services}'

Confirming delta vs SotW (istiod-side log)

# istiod logs the per-connection xDS mode. ztunnel always connects
# in delta mode; sidecars / waypoints depend on PILOT_ENABLE_DELTA_XDS.

kubectl -n istio-system logs deploy/istiod | grep -E 'ADS|delta|WDS|WADS' | head -50

# Per-connection nonce / ack debugging:
istioctl ps                                 # who's connected
istioctl ps <pod-name>.<namespace>          # one proxy's xDS sync status

Full reference — every xDS resource type on Ambient

Type URL (proto)	Group	Consumed by	What it carries
`istio.workload.Address`	WDS	`ztunnel`	Workloads (pods, VMs) and Services with VIPs & endpoints — the entire mesh topology, keyed by `uid`.
`istio.security.Authorization`	WADS	`ztunnel`	L4 slice of every `AuthorizationPolicy` ztunnel enforces — source IPs / identities / dest ports.
`envoy.config.listener.v3.Listener`	LDS	`waypoint` · ingress envoy	Listeners — including the inbound HBONE listener on `:15008` on every waypoint.
`envoy.config.cluster.v3.Cluster`	CDS	`waypoint` · ingress envoy	Upstream clusters — one per service the waypoint routes to.
`envoy.config.route.v3.RouteConfiguration`	RDS	`waypoint` · ingress envoy	HTTP route rules — rendered from `HTTPRoute` or legacy `VirtualService`.
`envoy.config.endpoint.v3.ClusterLoadAssignment`	EDS	`waypoint` · ingress envoy	Endpoints per cluster — IPs, ports, locality.
`envoy.extensions.transport_sockets.tls.v3.Secret`	SDS	`waypoint` · ingress envoy	SPIFFE SVID + trust bundle. Rotated continuously by `istiod`.
`gateway.networking.k8s.io/v1.Gateway` · `gatewayClass: istio-remote`	SOLO	peering-controller (in mgmt-server)	Distributes `istio-remote-peer-<cluster>` Gateways between clusters via the Gloo Mesh Agent relay. The unit of cross-cluster discovery is a Gateway, not WDS — and no `istiod` ever watches another cluster's kube-API.

Where to go from here

The deep references for xDS itself — protocol shape, delta semantics, why ADS exists — are Jimmy Song on delta xDS and the OneUptime xDS walkthrough. Both are short and worth reading once.

For the Ambient-specific resource types, the upstream sources are istio/ztunnel (Rust dataplane, where the proto consumers live) and istio/api · workload (the WDS / WADS proto definitions).

For the Solo Enterprise multicluster story, the two posts to read are Introducing Gloo Mesh Ambient multi-cluster and Istio Ambient multicluster support · Gloo Mesh multicluster peering. Pair them with the Gloo Operator reference on this site for the install topology, and the Trust & identity page for how SPIFFE roots get federated alongside the discovery surface.