Tetrate Service Bridge, version: next

Minimal diagnostic dump (collect-minimal)

When you are chasing an intermittent 503 or a connectivity failure for one specific service, a full tctl collect of the whole cluster gives you far more data than you need and takes longer to produce, share, and read through.

tctl collect-minimal is a focused alternative: it captures a hostname-scoped, correlated, layered diagnostic dump for a single host on a single cluster. It resolves everything that participates in serving that one hostname — the proxies, the Istio and XCP configuration, the backing Kubernetes objects, the istiod replicas, and the XCP edge — and lays the artifacts out in the order an engineer naturally triages them.

Note: tctl collect-minimal complements, and does not replace, tctl collect. When Tetrate Support asks for a full cluster dump, continue to use tctl collect. Reach for collect-minimal when the problem is already narrowed down to a single hostname.

Prerequisites

  • tctl installed and configured against the cluster you want to inspect. See the tctl installation and usage guide.
  • Your current kubecontext pointing at the cluster where the affected gateway or workload runs. collect-minimal reads cluster state through that context, exactly like tctl collect.
  • The fully qualified hostname of the affected service (for example echo.echo.svc.cluster.local) and the namespace it lives in.

Basic usage

Two flags are required — the hostname to scope the dump to, and that hostname's namespace:

tctl collect-minimal --hostname echo.echo.svc.cluster.local --namespace echo

This resolves the scope, runs every collector, and writes a single timestamped tar.gz archive into the current directory. Attach that archive to your support ticket, or unpack it and read it yourself.

To write the files to a directory instead of an archive (handy for local debugging), add --disable-archive:

tctl collect-minimal --hostname echo.echo.svc.cluster.local --namespace echo --disable-archive
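If you kept the default archive output and want to read the dump yourself, unpacking it could look like the sketch below. The helper name unpack_dump and the default destination ./dump are illustrative assumptions, not part of tctl; the archive path is whatever timestamped file the command wrote.

```shell
# unpack_dump ARCHIVE [DEST]
# Extracts a collect-minimal tar.gz into DEST (default: ./dump) and lists
# the top-level entries. Helper name and default DEST are hypothetical.
unpack_dump() {
  archive="$1"
  dest="${2:-./dump}"
  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest" || return 1
  ls -1 "$dest"
}
```

Call it with the path of the archive that collect-minimal produced, and optionally a destination directory as the second argument.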

Options

| Flag | Description |
| --- | --- |
| --hostname | Required. Hostname to scope the dump to, e.g. echo.echo.svc.cluster.local. |
| --namespace | Required. Namespace of the hostname being scoped to. |
| -o, --output-directory | Path to write the collected files under. Defaults to a timestamped directory. |
| --disable-archive | Output a directory of files rather than a tar.gz tarball. |
| --redact-presets | Comma-separated redaction presets. The networking preset replaces every IPv4/IPv6 address with a consistent hash. |
| --redact-regexes | Comma-separated regexes; any match is replaced with a SHA-256 hash of the matched string. |
| --rps-limit | Requests-per-second limit to the Kubernetes API server (default 50). Raise it to collect faster on a quiet cluster. |

See the tctl collect-minimal command reference for the canonical, auto-generated flag list.

What gets collected

The command first resolves the scope of the hostname: it works out whether this cluster is acting as a tier-1 or tier-2 for the host, finds the service, its backing pods, the gateway pods, the istiod replicas that program those proxies, and the XCP edge. It then runs a set of collectors and organises their output into six fixed, numbered layers plus two metadata files. The numbering follows the triage order: proxy stats (01) → Istio and XCP config (02, 03) → Kubernetes state (04) → control plane (05, istiod) → mesh control plane (06, XCP edge).

| Path | Contents |
| --- | --- |
| 00-summary.md | Human-readable triage guide. Start here (see below). |
| manifest.json | Machine-readable record of the resolved scope and per-collector status. |
| 01-envoy/ | Envoy proxy artifacts for the host's gateway, backing, and client pods: stats.txt, clusters.json, and config_dump.json. The authoritative artifact for the classic 503. |
| 02-istio-config/ | The raw Istio CRs that shape routing for this host (VirtualService, DestinationRule, Gateway, ServiceEntry, …) plus the relevant IstioOperator CRs. |
| 03-xcp-config/ | The XCP config CRs (*.xcp.tetrate.io): the Workspace and Group containers plus the XCP gateway/routing intent serving this hostname. |
| 04-k8s/ | The backing Kubernetes objects (Service, Endpoints, pods, …). |
| 05-istiod/ | The istiod replicas that program this host's proxies, filtered to the istio.io/rev revisions actually in use. Each replica is grouped under a self-contained <rev>-<pod>/ directory with its logs, debug endpoints, owning Deployment, fronting Service, and served Envoy config dumps. |
| 06-xcp-edge/ | XCP edge debug endpoints, including debug-gateways.json (does the edge know this host?) and debug-appliedconfigz.json (the edge's live applied config). |

Collectors are best-effort: if one fails, it records the error in the manifest and the run continues. A partial dump is still useful.
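Because a dump can be partial, it may help to check which of the documented entries are actually present before you start reading. The sketch below only tests for the eight paths listed in the table above; the helper name check_layout is hypothetical, and whether a failed collector leaves its layer missing or merely empty is an assumption this check does not settle.

```shell
# check_layout DUMP_DIR
# Reports which of the documented top-level entries are absent from an
# unpacked dump. Returns 0 when all are present, 1 otherwise.
# Helper name is hypothetical; the entry names come from the layout table.
check_layout() {
  dump="$1"
  missing=0
  for entry in 00-summary.md manifest.json 01-envoy 02-istio-config \
               03-xcp-config 04-k8s 05-istiod 06-xcp-edge; do
    if [ ! -e "$dump/$entry" ]; then
      echo "missing: $entry"
      missing=1
    fi
  done
  return "$missing"
}
```

A missing entry is a prompt to look at the per-collector status in manifest.json rather than a diagnosis in itself.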

How to read the dump

Open 00-summary.md first. It is written top-to-bottom as a triage guide and contains:

  • The resolved scope — hostname, namespace, service name, and the detected cluster role (tier-1 / tier-2) with the reason it was chosen.
  • What was found — counts and names of backing pods, gateway pods, istiod pods, XCP edge pods, and the matched Istio and XCP CRs.
  • Revisions — which istiod revisions were captured, and which replicas were skipped because their revision is not in use on this hostname's proxies. This is a safety valve: if the proxy you are debugging looks like it is programmed by a skipped revision, re-run after relabelling the pods.
  • Gold-thread hints — concrete pointers into the layers, for example:
    1. 01-envoy/<pod>/stats.txt: a non-zero no_cluster_found points at a missing route/cluster (the classic 503); upstream_cx_connect_fail / upstream_cx_none_healthy point at an L4/TCP upstream failure.
    2. 01-envoy/<pod>/clusters.json: an absent destination cluster, or endpoints with health_flags set, explains the Envoy response flags UH (no healthy upstream) and UF (upstream connection failure).
    3. 01-envoy/<pod>/config_dump.json: route config and listener filter chains.
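The first hint can be checked mechanically. The following sketch assumes stats.txt uses Envoy's plain-text stats format (one "name: value" pair per line) and prints only the three counters named above when their value is non-zero. The helper name scan_stats is hypothetical.

```shell
# scan_stats FILE...
# Prints every line for the counters called out in the summary hints whose
# value is non-zero. Assumes Envoy's text stats format, "name: value".
# -H prefixes each hit with the file name, so the pod is identifiable.
scan_stats() {
  grep -HE '(no_cluster_found|upstream_cx_connect_fail|upstream_cx_none_healthy): [1-9][0-9]*$' "$@"
}
```

From the root of an unpacked dump, `scan_stats 01-envoy/*/stats.txt` covers every captured pod at once. No output means none of the three counters has fired on any of them.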

Tier-1 cross-cluster note

When the resolved cluster role is tier-1, the summary includes a cross-cluster note. A tier-1 gateway rewrites the HTTP authority, so the hostname you queried is not necessarily the hostname serving traffic on the downstream cluster. Confirm the real tier-2 hostname in the tier-1 pod's config_dump.json (look at route.rewrite.authority and the outbound|... cluster name) before re-running collect-minimal against the tier-2 cluster. TCP/TLS-passthrough is SNI-routed and is not rewritten.
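To find the outbound|... cluster names mentioned above without reading the whole config dump, a text search is usually enough. This sketch assumes the route entries in config_dump.json carry the target as a "cluster": "outbound|..." JSON field, which is how Envoy route actions are normally serialised; the helper name list_outbound_clusters is hypothetical.

```shell
# list_outbound_clusters FILE...
# Prints the distinct outbound|... cluster names referenced by routes in an
# Envoy config dump. Assumes they appear as "cluster": "outbound|..." fields.
list_outbound_clusters() {
  grep -ohE '"cluster": *"outbound[|][^"]*"' "$@" \
    | sed -E 's/^"cluster": *"//; s/"$//' \
    | sort -u
}
```

Pass it the config_dump.json of the tier-1 gateway pod. The hostname embedded in the matching cluster name is a candidate for the --hostname value of the tier-2 run, to be confirmed against the authority rewrite in the same file.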

Sharing dumps safely

collect-minimal runs through the same redaction pipeline as tctl collect. To obfuscate sensitive data before attaching a dump to a ticket, use the redaction flags:

# Hash every IP address consistently
tctl collect-minimal --hostname echo.echo.svc.cluster.local --namespace echo \
--redact-presets networking

# Hash anything matching your own patterns
tctl collect-minimal --hostname echo.echo.svc.cluster.local --namespace echo \
--redact-regexes "<regex-one>,<regex-two>"

The manifest.json and 00-summary.md metadata files carry only hostnames, pod names, and per-collector status — never secret material.
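Before attaching a dump collected with --redact-presets networking, you may want a quick sanity check that no literal IPv4 addresses are left in it. The sketch below is a plain pattern search over an unpacked dump directory, not part of tctl, and the helper name find_ipv4 is hypothetical. It can report false positives, since any dotted group of four numbers matches, and it does not cover IPv6.

```shell
# find_ipv4 DUMP_DIR
# Recursively lists anything that looks like a dotted-quad IPv4 address,
# as file:line:match. Exits non-zero when nothing is found.
# -r recurse, -E extended regex, -o print only the match, -n line numbers.
find_ipv4() {
  grep -rEon '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1"
}
```

An empty result is the expected outcome for a redacted dump; any hit is worth inspecting by hand before the archive leaves your environment.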

Current limitations and roadmap

collect-minimal is being delivered iteratively. The current release captures a single point-in-time snapshot for one hostname on the cluster your kubecontext points at. The following enhancements are planned for future releases:

  • Windowed collection (--until <duration>) — capture START and END snapshots in a single dump, with mid-window proxy-restart detection, so you can bracket an intermittent failure.
  • Streamed log capture (--stream) — windowed log streaming with stratified per-second sampling: keep every failure, sample the healthy baseline, and aggregate storms to bound volume without losing data to log rotation.
  • Reversible proxy diagnostics (tctl debug proxy-stats) — arm the gateways serving a hostname with a connectivity-failure Envoy proxyStatsMatcher profile, fully revertible.
  • Access-log toggling (tctl debug access-log) — reconfigure access logging for a hostname/namespace, with a revert option.
  • Richer control-plane capture — istiod deployment and logs for the exact revision the data plane uses, alongside IstioOperator and EdgeXcp CRs.
  • Cross-cluster client discovery — today client-side envoy logs and data are captured only when the client lives in the same cluster as the first proxy it hits; this will be extended across clusters.
  • PCAP capture — north/south-bound packet capture.