Minimal diagnostic dump (collect-minimal)
When you are chasing an intermittent 503 or a connectivity failure for one
specific service, a full tctl collect of the whole cluster gives you far
more data than you need and takes longer to produce, share, and read through.
tctl collect-minimal is a focused alternative: it captures a hostname-scoped,
correlated, layered diagnostic dump for a single host on a single cluster. It
resolves everything that participates in serving that one hostname — the proxies,
the Istio and XCP configuration, the backing Kubernetes objects, the istiod
replicas, and the XCP edge — and lays the artifacts out in the order an engineer
naturally triages them.
tctl collect-minimal complements, and does not replace, tctl collect. When
Tetrate Support asks for a full cluster dump, continue to use
tctl collect. Reach for collect-minimal
when the problem is already narrowed down to a single hostname.
Prerequisites
- tctl installed and configured against the cluster you want to inspect. See the tctl installation and usage guide.
- Your current kubecontext pointing at the cluster where the affected gateway or workload runs. collect-minimal reads cluster state through that context, exactly like tctl collect.
- The fully qualified hostname of the affected service (for example echo.echo.svc.cluster.local) and the namespace it lives in.
Basic usage
Two flags are required — the hostname to scope the dump to, and that hostname's namespace:
tctl collect-minimal --hostname echo.echo.svc.cluster.local --namespace echo
This resolves the scope, runs every collector, and writes a single timestamped
tar.gz archive into the current directory. Attach that archive to your support
ticket, or unpack it and read it yourself.
To write the files to a directory instead of an archive (handy for local
debugging), add --disable-archive:
tctl collect-minimal --hostname echo.echo.svc.cluster.local --namespace echo --disable-archive
Options
| Flag | Description |
|---|---|
| --hostname | Required. Hostname to scope the dump to, e.g. echo.echo.svc.cluster.local. |
| --namespace | Required. Namespace of the hostname being scoped to. |
| -o, --output-directory | Path to write the collected files under. Defaults to a timestamped directory. |
| --disable-archive | Output a directory of files rather than a tar.gz tarball. |
| --redact-presets | Comma-separated redaction presets. The networking preset replaces every IPv4/IPv6 address with a consistent hash. |
| --redact-regexes | Comma-separated regexes; any match is replaced with a SHA-256 hash of the matched string. |
| --rps-limit | Requests-per-second limit to the Kubernetes API server (default 50). Raise it to collect faster on a quiet cluster. |
See the tctl collect-minimal command reference
for the canonical, auto-generated flag list.
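As a hedged sketch of how the flags above combine, a small wrapper can refuse to run when either required value is missing and pass any optional flags straight through. The function name collect_host and the TCTL override variable are illustrative assumptions, not part of tctl; only flags listed in the table are used.

```shell
# collect_host: hypothetical convenience wrapper, not part of tctl.
# Usage: collect_host <hostname> <namespace> [extra collect-minimal flags...]
collect_host() {
  local hostname="${1:-}" namespace="${2:-}"
  if [ -z "$hostname" ] || [ -z "$namespace" ]; then
    echo "usage: collect_host <hostname> <namespace> [flags...]" >&2
    return 2
  fi
  shift 2
  # TCTL is an assumed override so a different binary path can be used.
  "${TCTL:-tctl}" collect-minimal \
    --hostname "$hostname" \
    --namespace "$namespace" \
    "$@"
}

# Example: write an unarchived dump under ./dumps/echo with a higher API rate.
# collect_host echo.echo.svc.cluster.local echo \
#   --output-directory ./dumps/echo --disable-archive --rps-limit 100
```

Checking the two required values up front avoids starting a collection that tctl would reject anyway; everything after the namespace is forwarded unchanged.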
What gets collected
The command first resolves the scope of the hostname: it works out whether this cluster is acting as tier-1 or tier-2 for the host, then finds the service, its backing pods, the gateway pods, the istiod replicas that program those proxies, and the XCP edge. It then runs a set of collectors and organises their output into six fixed, numbered layers plus two metadata files. The numbering follows the triage order: proxy stats (01) → config (02 Istio, 03 XCP) → Kubernetes state (04) → control plane (05 istiod) → mesh control plane (06 XCP edge).
| Path | Contents |
|---|---|
| 00-summary.md | Human-readable triage guide; start here (see below). |
| manifest.json | Machine-readable record of the resolved scope and per-collector status. |
| 01-envoy/ | Envoy proxy artifacts for the host's gateway, backing, and client pods: stats.txt, clusters.json, and config_dump.json. The authoritative artifact for the classic 503. |
| 02-istio-config/ | The raw Istio CRs that shape routing for this host (VirtualService, DestinationRule, Gateway, ServiceEntry, …) plus the relevant IstioOperator CRs. |
| 03-xcp-config/ | The XCP config CRs (*.xcp.tetrate.io): the Workspace and Group containers plus the XCP gateway/routing intent serving this hostname. |
| 04-k8s/ | The backing Kubernetes objects (Service, Endpoints, pods, …). |
| 05-istiod/ | The istiod replicas that program this host's proxies, filtered to the istio.io/rev revisions actually in use. Each replica is grouped under a self-contained <rev>-<pod>/ directory with its logs, debug endpoints, owning Deployment, fronting Service, and served Envoy config dumps. |
| 06-xcp-edge/ | XCP edge debug endpoints, including debug-gateways.json (does the edge know this host?) and debug-appliedconfigz.json (the edge's live applied config). |
Collectors are best-effort: if one fails, it records the error in the manifest and the run continues. A partial dump is still useful.
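Because a dump can be partial, a quick check of which top-level entries are present may help before you start reading. The following is a minimal sketch that assumes the dump was written with --disable-archive or has already been unpacked. The helper name check_layers is hypothetical, and whether a failed collector leaves its directory absent or merely empty is not specified here, so treat a missing entry as a hint and manifest.json as the authoritative status record.

```shell
# check_layers: hypothetical helper that reports which of the expected
# top-level entries are missing from an unpacked dump directory.
check_layers() {
  local dump_dir="$1" missing=0 entry
  for entry in 00-summary.md manifest.json \
               01-envoy 02-istio-config 03-xcp-config \
               04-k8s 05-istiod 06-xcp-edge; do
    if [ ! -e "$dump_dir/$entry" ]; then
      echo "missing: $entry"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero when at least one entry is absent.
  [ "$missing" -eq 0 ]
}

# Example: check_layers ./dumps/echo
```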
How to read the dump
Open 00-summary.md first. It is written top-to-bottom as a triage guide and
contains:
- The resolved scope — hostname, namespace, service name, and the detected cluster role (tier-1 / tier-2) with the reason it was chosen.
- What was found — counts and names of backing pods, gateway pods, istiod pods, XCP edge pods, and the matched Istio and XCP CRs.
- Revisions — which istiod revisions were captured, and which replicas were skipped because their revision is not in use on this hostname's proxies. This is a safety valve: if the proxy you are debugging looks like it is programmed by a skipped revision, re-run after relabelling the pods.
- Gold-thread hints — concrete pointers into the layers, for example:
  - 01-envoy/<pod>/stats.txt: a non-zero no_cluster_found points at a missing route/cluster (the classic 503); upstream_cx_connect_fail / upstream_cx_none_healthy point at an L4/TCP upstream failure.
  - 01-envoy/<pod>/clusters.json: an absent destination cluster, or endpoints with health_flags set, explains UH/UF.
  - 01-envoy/<pod>/config_dump.json: route config and listener filter chains.
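The stats.txt hint can be applied across every captured pod at once. The sketch below assumes stats.txt uses Envoy's plain-text stats format, one "<name>: <value>" pair per line, and that the dump is unpacked; the helper name scan_envoy_stats is hypothetical. It prints only the three counters named above, and only when they are non-zero.

```shell
# scan_envoy_stats: hypothetical helper; prints the non-zero failure counters
# from every 01-envoy/<pod>/stats.txt in an unpacked dump directory.
scan_envoy_stats() {
  local dump_dir="$1" f
  for f in "$dump_dir"/01-envoy/*/stats.txt; do
    # Skip the unexpanded glob when no pod directories exist.
    [ -f "$f" ] || continue
    # Assumes one "<name>: <value>" pair per line.
    awk -v file="$f" -F': ' '
      $1 ~ /(no_cluster_found|upstream_cx_connect_fail|upstream_cx_none_healthy)$/ && $2 + 0 > 0 {
        print file ": " $1 " = " $2
      }' "$f"
  done
}

# Example: scan_envoy_stats ./dumps/echo
```

An empty result means none of the three counters fired on any captured proxy, which usually moves the investigation on to clusters.json and config_dump.json.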
Tier-1 cross-cluster note
When the resolved cluster role is tier-1, the summary includes a cross-cluster
note. A tier-1 gateway rewrites the HTTP authority, so the hostname you queried is
not necessarily the hostname serving traffic on the downstream cluster. Confirm
the real tier-2 hostname in the tier-1 pod's config_dump.json (look at
route.rewrite.authority and the outbound|... cluster name) before re-running
collect-minimal against the tier-2 cluster. TCP/TLS-passthrough is SNI-routed
and is not rewritten.
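To find the outbound|... cluster names mentioned above without reading the whole file, a plain text search over the tier-1 pod's config_dump.json is often enough. This is a rough sketch: the helper name list_outbound_clusters is hypothetical, and it treats the JSON as text rather than parsing it, so it lists every outbound cluster name the proxy references, not only the ones tied to your hostname.

```shell
# list_outbound_clusters: hypothetical helper; prints the distinct
# "outbound|..." cluster names referenced in one pod's config_dump.json.
list_outbound_clusters() {
  local config_dump="$1"
  # Match from "outbound|" up to the closing JSON quote, then de-duplicate.
  grep -o 'outbound|[^"]*' "$config_dump" | LC_ALL=C sort -u
}

# Example: list_outbound_clusters ./dumps/echo/01-envoy/<pod>/config_dump.json
```

The hostname is the last pipe-separated segment of each name; use the one matching your route as the --hostname value for the tier-2 run.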
Sharing dumps safely
collect-minimal runs through the same redaction pipeline as tctl collect. To
obfuscate sensitive data before attaching a dump to a ticket, use the redaction
flags:
# Hash every IP address consistently
tctl collect-minimal --hostname echo.echo.svc.cluster.local --namespace echo \
--redact-presets networking
# Hash anything matching your own patterns
tctl collect-minimal --hostname echo.echo.svc.cluster.local --namespace echo \
--redact-regexes "<regex-one>,<regex-two>"
The manifest.json and 00-summary.md metadata files carry only hostnames, pod
names, and per-collector status — never secret material.
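Before relying on a custom pattern, it can be worth previewing what it matches in an unredacted local dump. The sketch below uses grep's extended regular expressions as a stand-in; the regex dialect that tctl itself applies is not stated here and may differ, so this is an approximation. The helper name preview_matches is hypothetical.

```shell
# preview_matches: hypothetical helper; lists the distinct strings in a file
# that a candidate pattern matches, using grep's extended regex syntax.
# This approximates, and may not exactly mirror, tctl's own regex handling.
preview_matches() {
  local pattern="$1" file="$2"
  grep -Eo -e "$pattern" "$file" | LC_ALL=C sort -u
}

# Example: preview_matches '[a-z]+@example\.com' ./dumps/echo/00-summary.md
```

Running the same helper against a redacted dump should print nothing for that pattern; any output suggests the pattern needs adjusting before the dump is shared.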
Current limitations and roadmap
collect-minimal is being delivered iteratively. The current release captures a
single point-in-time snapshot for one hostname on the cluster your kubecontext
points at. The following enhancements are planned for future releases:
- Windowed collection (--until <duration>): capture START and END snapshots in a single dump, with mid-window proxy-restart detection, so you can bracket an intermittent failure.
- Streamed log capture (--stream): windowed log streaming with stratified per-second sampling: keep every failure, sample the healthy baseline, and aggregate storms to bound volume without losing data to log rotation.
- Reversible proxy diagnostics (tctl debug proxy-stats): arm the gateways serving a hostname with a connectivity-failure Envoy proxyStatsMatcher profile, fully revertible.
- Access-log toggling (tctl debug access-log): reconfigure access logging for a hostname/namespace, with a revert option.
- Richer control-plane capture: istiod deployment and logs for the exact revision the data plane uses, alongside IstioOperator and EdgeXcp CRs.
- Cross-cluster client discovery: today client-side Envoy logs and data are captured only when the client lives in the same cluster as the first proxy it hits; this will be extended across clusters.
- PCAP capture: north/south-bound packet capture.