This document assumes the reader has basic knowledge of Distributed Tracing concepts and terminology. If you are not familiar with Distributed Tracing, we advise reading up on it before following this document and adjusting the Distributed Tracing configuration of TSB. For a good introduction to Distributed Tracing concepts, read this excellent blog by Nic Munroe.
Distributed Tracing does not work out of the box, because your deployed services
must propagate trace context. Without context propagation in your services, you
will see broken traces and get far less value from tracing. We suggest
supporting, at a minimum, propagation of the B3 and W3C trace context headers as
well as the x-request-id header for request correlation. See also the trace
context propagation explanation in the Istio documentation. In addition to
context propagation, it is a very good idea to include the x-request-id (and
possibly the trace id from distributed tracing) in all request-bound log lines
in your services. This enables near effortless correlation between request
traces and service logs and speeds up troubleshooting.
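As a minimal sketch of what context propagation means inside a service, assuming a Python service: the incoming trace headers are copied onto every outgoing request, and the x-request-id is added to each log line. The helper names `propagation_headers` and `log_prefix` are hypothetical and not part of TSB or Istio, and the header list is an assumption covering B3, W3C Trace Context, and x-request-id only.

```python
# Headers to copy from an incoming request to every outgoing request.
# The list covers B3 (multi and single header form), W3C Trace Context,
# and the x-request-id; extend it if your tracer uses other headers.
PROPAGATED_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "b3",
    "traceparent",
    "tracestate",
)


def propagation_headers(incoming):
    """Return the subset of incoming headers that must be forwarded."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in PROPAGATED_HEADERS if name in lowered}


def log_prefix(incoming):
    """Build a log prefix with the request id and, if present, the B3 trace id."""
    headers = propagation_headers(incoming)
    request_id = headers.get("x-request-id", "-")
    trace_id = headers.get("x-b3-traceid", "-")
    return f"request_id={request_id} trace_id={trace_id}"


if __name__ == "__main__":
    incoming = {
        "X-Request-Id": "5f2c9a3e-demo",
        "X-B3-TraceId": "463ac35c9f6413ad48485a3953bb6124",
        "X-B3-SpanId": "a2fb4a1d1a96d312",
        "X-B3-Sampled": "1",
        "Accept": "application/json",
    }
    # Pass this dict as the headers of any outgoing HTTP call.
    print(propagation_headers(incoming))
    print(log_prefix(incoming), "handling order lookup")
```

In a real service the returned dictionary would be merged into the headers of each outgoing call made on behalf of the incoming request. Most tracing libraries do this for you once they are configured.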
By default, TSB provides a SkyWalking powered, Zipkin compatible distributed tracing backend. All Envoy ingress gateways and sidecars, under TSB’s control, have their internal Zipkin tracing instrumentation set to send span data straight to TSB’s SkyWalking collectors. A fixed global sampling rate can also be configured through TSB’s ControlPlane resource object.
If you need more flexibility, such as more granular sampling rates, different tracing instrumentation, or sending span data to different backends, this document provides the context needed to make the required changes.
Istio Telemetry API
The Istio Telemetry API provides granular and flexible methods to adjust
observability signals at runtime through the use of scoped Telemetry objects.
Prior to the Istio Telemetry API it was required to adjust the TSB control plane
and data plane operator configuration objects to configure a single distributed
tracer with a fixed sampling rate.
After enabling tracing extension providers for the Istio Telemetry API through the TSB control plane operator configuration object, it is possible to set specific tracers with different sampling rates for different namespaces using the Istio Telemetry objects.
While the Telemetry API has been around since Istio 1.12, it is still marked as alpha. This is mostly because its fine-grained configuration allows for many potential edge cases across tracing, metrics, and logging. We have tested and validated cluster-level tracing configuration for the Zipkin, OpenCensus, and OpenTelemetry tracing providers, but we do not guarantee that use cases beyond this will work. Istio does not and will not provide native OpenTelemetry support without the use of the Telemetry API.
To learn more about the Istio Telemetry API, see the Istio Telemetry API documentation.
W3C Trace Context Propagation
By default TSB, through Envoy’s native Zipkin tracer instrumentation, uses the
B3 trace context propagation method.
B3 is one of the best
supported propagation methods available for a variety of Distributed Tracing
ecosystems, as it has been the de facto standard for many site owners who have
adopted Distributed Tracing early on (e.g. Netflix).
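To make the B3 format concrete, here is a small sketch that folds the B3 multi-header form into the equivalent single `b3` header, following the `{TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}` layout of the B3 propagation specification. The function name `b3_single_header` is hypothetical, and the sketch does not validate the hex encoding of the ids.

```python
def b3_single_header(headers):
    """Fold B3 multi-headers into the single `b3` header value.

    Returns None when the mandatory trace id or span id is missing.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    trace_id = lowered.get("x-b3-traceid")
    span_id = lowered.get("x-b3-spanid")
    if not trace_id or not span_id:
        return None

    # A debug flag of "1" takes precedence and is encoded as "d".
    if lowered.get("x-b3-flags") == "1":
        sampling = "d"
    else:
        sampling = lowered.get("x-b3-sampled")

    parts = [trace_id, span_id]
    if sampling is not None:
        parts.append(sampling)
        # The parent span id is positional, so it can only follow a sampling state.
        parent = lowered.get("x-b3-parentspanid")
        if parent:
            parts.append(parent)
    return "-".join(parts)


if __name__ == "__main__":
    print(
        b3_single_header(
            {
                "X-B3-TraceId": "80f198ee56343ba864fe8b2a57d3eff7",
                "X-B3-SpanId": "e457b5a2e4d86bd1",
                "X-B3-ParentSpanId": "05e3ac9a4f6e3b90",
                "X-B3-Sampled": "1",
            }
        )
    )
```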
The Zipkin ecosystem originated at Twitter, where most services had names of
birds. Zipkin’s backend internal project name was Big Brother Bird. When the
Zipkin ecosystem was open sourced, the B3 trace context headers kept that name.
During a Distributed Tracing workshop in 2019, hosted by the Zipkin open source community, a few engineers from different organizations were invited and came together to figure out a new context propagation method that would allow tracing systems to interoperate even though some of them had different (optional) metadata requirements (most notably projects from Microsoft, AWS, and Dynatrace).
The idea was to have a common base context understood by all systems in one
header (traceparent). Another header (tracestate) can contain multiple chunks of
metadata from different tracing vendors. If a tracer understands a specific
chunk of metadata, it can interact with it. Tracers can add their own metadata
but are required to propagate the metadata of other tracing vendors, up to the
maximum size of the header value. When that limit is reached, metadata chunks
get purged in a FIFO manner.
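The two headers can be illustrated with a short sketch, assuming the version `00` layout of the W3C Trace Context specification: `traceparent` is `version-traceid-parentid-flags` in lowercase hex, and `tracestate` is a comma-separated list of `key=value` members. The function names are hypothetical. The sketch is simplified: it enforces only the 32-member limit and does not validate key or value syntax.

```python
import re

# version (2 hex) - trace id (32 hex) - parent id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)
MAX_TRACESTATE_MEMBERS = 32


def parse_traceparent(value):
    """Parse a traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero ids are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def update_tracestate(header, key, value):
    """Set our own member in tracestate while keeping other vendors' members.

    The updated member moves to the front; when the list grows past the
    limit, the members at the end (the oldest ones) are dropped first.
    """
    members = [m.strip() for m in header.split(",") if m.strip()]
    members = [m for m in members if m.split("=", 1)[0] != key]
    members.insert(0, f"{key}={value}")
    return ",".join(members[:MAX_TRACESTATE_MEMBERS])


if __name__ == "__main__":
    print(parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
    print(update_tracestate("congo=t61rcWkgMzE,rojo=00f067aa0ba902b7", "rojo", "abc"))
```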
To put more weight behind the newly proposed solution, the effort was taken to
the W3C, hence this propagation format being called W3C Trace Context. When
OpenTelemetry came into existence through the CNCF-dictated merger of Google’s
OpenCensus project (back then using B3 for propagation) and the vendor
consortium backed OpenTracing project (which gave no guarantees at all on trace
context propagation, as each vendor used their own), it was decided to switch
the default from B3 to W3C Trace Context. However, most OpenTelemetry
instrumentation supports B3 after a small configuration change.
Switching from B3 context propagation to W3C Trace Context in a TSB
environment can be accomplished by changing the active Envoy tracing
implementation. For a TSB 1.6 cluster, the only choice is OpenCensus. We advise
against using this tracer going forward, as OpenCensus has been deprecated and
is no longer maintained. The tracer will be removed in a future version of
Envoy Proxy and most likely also from the OpenTelemetry collector. When
upgrading to TSB 1.7 and up, we advise switching to the OpenTelemetry tracer.
OpenTelemetry Collector for Tracing
The OpenTelemetry collector is a “Swiss Army knife” of span data management. It can receive span data in different formats from various tracing instrumentations and export this data to multiple backends, potentially using different span data formats. In this document we show how an OpenTelemetry Collector can receive span data from incoming Zipkin, OpenCensus, and OpenTelemetry tracing instrumentation and export it to an OpenTelemetry compatible backend as well as to TSB’s embedded tracing backend.
Enabling the Telemetry API for tracing in TSB
To make any change to TSB’s tracing configuration as described in this document, you first need to enable tracing extension providers for the Istio Telemetry API in TSB. For this you are required to adjust the TSB ControlPlane resource object for each cluster in your environment.
The TSB operator, using its ControlPlane resource object, manages the
configuration and deployment of its Istio dependency. When a TSB ControlPlane
object is applied, the TSB operator creates an IstioOperator resource object.
This resulting object is then used to (re)configure the Istio deployment. To
enable the Telemetry API for tracing, we need to patch this IstioOperator
resource object through an overlay in the TSB ControlPlane resource object.
TSB ControlPlane resource object overlay
To make sure we don’t overwrite important custom configurations found in the
ControlPlane object, we download the current state first. The following steps
need to be repeated for each cluster you want to adjust.
Fetch the ControlPlane resource object by running:
kubectl get -n istio-system controlplane controlplane \
-o yaml > controlplane.yaml
Take note of the clusterName value found in the downloaded object. You will
need this value later when configuring the Istio Telemetry object.
Edit the ControlPlane object by adding a patch for the IstioOperator resource object:
# start of overlay
- apiVersion: install.istio.io/v1alpha1
- path: spec.meshConfig.extensionProviders
# You can list multiple trace configurations here!
# They can be different tracers as well as different configurations
# for the same tracing instrumentation.
- name: <tracing-config-name>
# optional default extension provider patch; not required
# warning: this will inject trace headers even if tracing is disabled
# for a particular namespace. Make sure this is a desired side effect.
- path: spec.meshConfig.defaultProviders.tracing
# Even though this is a list, only one default tracer is supported!
# end of overlay
To install the adjusted ControlPlane resource object:
kubectl apply -f controlplane.yaml
The required part of the patch is one or more tracing configurations under
spec.meshConfig.extensionProviders. Setting a patch for
spec.meshConfig.defaultProviders.tracing has the side effect that all request
traffic will be augmented with the trace headers of the default trace config
instrumentation, even if your Telemetry API configuration does not explicitly
set a tracing config for the incoming request. Our advice is to not set the
default as a patch but rather rely on your Telemetry API resource objects, unless
you use the trace id for request correlation in logs even if distributed tracing
is disabled.
Here is an example of a flexible setup with multiple trace configurations,
allowing the use of both B3 and W3C Trace Context tracers with the ability to
send the data to TSB, an external Jaeger tracing backend, or both.
Note that it is possible to add the same tracer type multiple times but each with different endpoint configurations. This can be very handy to designate one tracing backend as specific for troubleshooting purposes or provide app teams their own setups.
Jaeger’s Zipkin support can be activated through the following command line
flag: --collector.zipkin.host-port=:9411. In the example below this is
required, as it enables the “jaeger-b3” tracing configuration to send data
straight to Jaeger without an OpenTelemetry collector in between.
Jaeger versions v1.35 and up have native support for OpenTelemetry’s OTLP transport; older versions need an OpenTelemetry collector in between. The example below assumes OTLP support, as it enables the “jaeger-w3c” tracing configuration to send data straight to Jaeger without an OpenTelemetry collector in between.
- path: spec.meshConfig.extensionProviders
- name: tsb-b3 # Zipkin tracer to TSB backend
- name: jaeger-b3 # Zipkin tracer to Jaeger backend
- name: jaeger-w3c # OTel tracer to Jaeger backend
- name: both-b3 # Zipkin tracer to OTel collector
- name: both-w3c # OTel tracer to OTel collector
It is easy to make mistakes when dealing with overlays, and edited resource objects are typically not rejected as long as the YAML is syntactically correct. It is therefore a good idea to tail the logs of the TSB operator while applying the resource object. If you see an apply error, you have most likely made a typo in the patch or used incorrect indentation in the patch value.
To tail the TSB operator logs and check whether your overlay was processed successfully, run the following command:
kubectl logs -n istio-system -l name=tsb-operator -f
Setting up an OpenTelemetry collector for tracing
The extension provider configurations in the example above assume that an OpenTelemetry collector receiving span data forwards it to an OTLP compatible Jaeger backend as well as to TSB’s SkyWalking collector, which expects Zipkin data. The default OpenTelemetry collector supports OTLP and Zipkin receivers and exporters.
If using a vendor specific OpenTelemetry collector (e.g. the Splunk OpenTelemetry distribution) it is common to have wide support of receivers but very limited support of exporters (often just OTLP and native vendor exporters). In those cases you will need to create your own OpenTelemetry distribution to support your vendor’s exporters as well as the Zipkin exporter, if it is required to feed the span data back to TSB. If the OTLP export is available, a daisy-chained OpenTelemetry collector solution is possible, although inefficient.
To set up an OpenTelemetry collector for the use case presented here, the Helm chart values object could look like this:
This is not a production ready OpenTelemetry configuration.
fullnameOverride: otel-collector # this sets the otel service name; the default is very verbose
Installation of OpenTelemetry Collector through helm looks like this:
helm install otel-trace open-telemetry/opentelemetry-collector \
Setting up a Jaeger backend
In this example we assume the availability of a Jaeger backend. Here is a demo configuration to deploy Jaeger using the Jaeger operator.
Install the Jaeger operator:
kubectl create namespace observability
kubectl create -n observability -f \
This is not a production ready Jaeger configuration.
After the Jaeger operator has been installed successfully, you can create the required Jaeger deployment configuration. For this example we’ll use the demo all-in-one image with in-memory storage and enable Jaeger’s Zipkin collector.
Apply the Jaeger object to configure and deploy the Jaeger All-in-one solution:
kubectl apply -f jaeger.yaml
After applying the object, you should see the Jaeger instance being deployed in the default namespace:
kubectl get pods -l app.kubernetes.io/instance=jaeger
By default, the Jaeger operator creates an ingress route for you to access the UI. You can retrieve the address information by executing the following command:
kubectl get ingress
The default ingress creation behavior of Jaeger is quite insecure. If you are uncomfortable with this behavior please adjust the Jaeger configuration. For more information on Jaeger configuration, see the Jaeger operator documentation.
Using the Telemetry API
By enabling the extension providers we have muted the traditional Zipkin instrumentation. TSB will not trace requests until you specify the required behavior through the Istio Telemetry API.
The Istio Telemetry API specification dictates that only one globally scoped
Telemetry object can be applied to the root namespace istio-system. TSB v1.7
automatically creates a global Telemetry object named xcp-mesh-default as part
of the installation/upgrade process, where it handles improved global metrics
configuration for the Istio deployment it uses. If you created a custom global
Telemetry object for tracing in TSB v1.6, you will need to remove it before
upgrading TSB to v1.7 and merge its Telemetry configuration into the
xcp-mesh-default object after the upgrade.
To enable a mesh-wide default using the “both-b3” tracing configuration you’ve
created earlier, you can edit the global Telemetry object xcp-mesh-default to
look like the one below.
kubectl edit telemetry xcp-mesh-default -n istio-system
- name: prometheus
# we add our tracing section at the bottom of the spec
- name: "both-b3" # use one of the extension provider tracing configurations here
value: "app-cluster-1" # use the TSB clusterName here!
tracer: # it is smart to add a tracer tag to highlight the used configuration
randomSamplingPercentage: 100.0 # use the desired sampling rate here
By switching the tracing provider configuration name, you can switch between B3 and W3C context propagation, and choose between sending span data straight to TSB, straight to Jaeger, or to the OpenTelemetry collector that feeds both TSB and Jaeger.
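The same mechanism works per namespace. As a hedged sketch, the script below generates a namespace-scoped Telemetry object that selects one of the extension provider names and a sampling rate. The function name `tracing_telemetry`, the object name `namespace-tracing`, and the namespace `bookinfo` are assumptions for illustration. The field names follow the Istio Telemetry API (`telemetry.istio.io/v1alpha1`).

```python
import json


def tracing_telemetry(namespace, provider, sampling_percentage, name="namespace-tracing"):
    """Build an Istio Telemetry object that configures tracing for one namespace."""
    if not 0.0 <= sampling_percentage <= 100.0:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "apiVersion": "telemetry.istio.io/v1alpha1",
        "kind": "Telemetry",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "tracing": [
                {
                    # Must match a name listed under meshConfig.extensionProviders.
                    "providers": [{"name": provider}],
                    "randomSamplingPercentage": float(sampling_percentage),
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace; sample 10% of requests with the W3C/OTel tracer.
    print(json.dumps(tracing_telemetry("bookinfo", "jaeger-w3c", 10.0), indent=2))
```

Because kubectl accepts JSON manifests, the output can be piped into `kubectl apply -f -`. A namespace-scoped object like this overrides the mesh-wide default for workloads in that namespace.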
For more information and examples see the Istio Telemetry API documentation.