
Tetrate Premium Enterprise Support Best Practices

Tetrate Customer Support is on standby 24/7 to help you resolve your Service Mesh issues. However, we need your cooperation to achieve the best outcome.

This guide is a fast 10-minute read and is an essential tutorial on how to make the most of your Tetrate Istio Subscription (TIS) Premium Enterprise Support.

We recommend that you:

  1. Learn the concepts and tools in the "Pre-requisites" section before you deploy TIS in production or take over operational responsibilities for TIS.
  2. Bookmark this page and be prepared to go through the steps in "Self-diagnose procedure", so you are ready to diagnose issues yourself and to give Tetrate's support team the essential troubleshooting information, which reduces your resolution time.

Pre-requisites

Knowledge you need

We recommend that you read through Introducing Tetrate Istio Subscription and install Tetrate Config Analyzer (TCA).

Additionally, we recommend that you take the self-paced Istio course (free) offered by Tetrate so you are familiar with Istio concepts.

Self-diagnose procedure

The fastest way to resolve your issue is to go through this checklist as soon as an error occurs. As you go, gather the essential information that the Tetrate Customer Support team will need for effective troubleshooting, in case you have to involve them.

  1. Restore

    1. The first step must always be to restore the environment to a working state.
    2. If you are deploying a change, roll it back.
    3. If you have high availability, fall back to the working system.
  2. Identify the affected area and troubleshoot it. For example, it could be one of the following:

    1. Config issues:
      • Run TCA to identify common errors.
    2. Sidecar Injection problems:
      • If your pods are failing to start, look into the MutatingAdmissionWebhook istio-sidecar-injector. When a pod is created, the Kubernetes API server calls the sidecar injector service (Istiod). Errors during injection, or a failure to connect to the service, can result in pods not being created. These errors may look something like: failed calling webhook "sidecar-injector.istio.io": Post https://istiod.istio-system.svc:443/inject?timeout=30s: context deadline exceeded
    3. Istiod problems:
      • To capture logs: kubectl logs -n istio-system -l app=istiod --tail=100000000 -c discovery > istiod.log
      • To capture mesh config: kubectl get configmap -n istio-system -o jsonpath='{.data.mesh}' istio > meshconfig.yaml
      • To capture a proxy config dump from Istiod's perspective (assuming Istiod runs in the istio-system namespace, as in the commands above): kubectl exec -n istio-system ISTIOD_POD -- curl 'localhost:8080/debug/config_dump?proxyID=POD_NAME.POD_NAMESPACE'
      • If you are experiencing performance issues with Istiod, such as excessive CPU or memory usage, memory leaks, etc., it is helpful to capture profiles. Please see the Analyzing Istio Performance wiki page (https://github.com/istio/istio/wiki/Analyzing-Istio-Performance) for help.
    4. Common Issues:
      • gRPC config stream closed: 13 or gRPC config stream closed: 0 in proxy logs, every 30 minutes. This error message is expected, as the connection to Pilot is intentionally closed every 30 minutes.
      • gRPC config stream closed: 14 in proxy logs. If this occurs repeatedly, it may indicate problems connecting to Pilot. However, a single occurrence is typical when Envoy is starting or restarting.
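To tell the expected 30-minute reconnects apart from repeated connection failures, one option is to count each status code in a saved proxy log. The sketch below assumes the sidecar log was saved to a local file first, for example with kubectl logs POD_NAME -n POD_NAMESPACE -c istio-proxy > proxy.log (POD_NAME and POD_NAMESPACE are placeholders, and proxy.log is an arbitrary file name). The function name count_stream_closed is hypothetical and is not part of Istio or TIS tooling.

```shell
# count_stream_closed CODE FILE
# Prints how many lines in FILE report "gRPC config stream closed: CODE".
# Hypothetical helper for illustration; it only reads a local log file.
count_stream_closed() {
  # The trailing group keeps code 1 from also matching 13 or 14.
  # Note: grep exits non-zero when the count is 0, but still prints "0".
  grep -c -E "gRPC config stream closed: $1([^0-9]|\$)" "$2"
}

# Example usage against a previously saved proxy log:
#   count_stream_closed 13 proxy.log   # periodic reconnects, expected
#   count_stream_closed 14 proxy.log   # repeated hits may indicate connectivity problems
```

A handful of code 13 or 0 entries spaced about 30 minutes apart matches the expected behavior described above, while a steadily growing count for code 14 is worth mentioning in a support ticket.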

If you can't find the cause of the issue, create a new ticket using the Tetrate Customer Support portal.

How to file a new support ticket

All tickets must be officially initiated via Tetrate's support portal (JIRA), even if you are in touch with a Tetrate employee via other means of communication (e.g., phone, mail or Slack). Officially filing a ticket will ensure visibility to a broader pool of experts and better response times, resulting in an improved experience for you.

Step 1: Write a detailed description with proper context

The description should give clear context about the issue, along with the findings and hypotheses from your self-diagnose procedure. This allows us to get up to speed faster and be more effective.

Please be sure to include the following in your description:

  1. State clearly what problem you're experiencing.
  2. If you have a hypothesis, share it so we can validate it together.
  3. Describe the chain of events leading up to the issue. For example, changes to the configurations or to the faulty app.
  4. Indicate platform specifics that may be related. For example, node upgrades or security group changes.
  5. Tell us if the configs you provide include debug changes that are not part of the intended final state.

Step 2: Include the key information from self-diagnose

Attach logs, TCA output if relevant, and configurations from the self-diagnose steps in a way that makes sense to somebody not acquainted with your particular issue. Informative file naming and comments within the files are a good way to start.
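As one possible way to package these artifacts, the sketch below wraps the capture commands from the self-diagnose steps in a shell function that writes descriptively named files and bundles them into a single archive. The function name collect_istio_diagnostics, the output directory, the file names, and the README.txt note are all assumptions made for illustration, and the webhook name is taken from the sidecar injection step and may differ in your installation.

```shell
# collect_istio_diagnostics [OUT_DIR]
# Hypothetical helper: saves artifacts from the self-diagnose steps under
# descriptive file names and bundles them into OUT_DIR.tar.gz.
# Pass OUT_DIR without a trailing slash.
collect_istio_diagnostics() {
  local out_dir="${1:-istio-diagnostics}"
  mkdir -p "$out_dir" || return 1

  # Istiod logs from the discovery container.
  kubectl logs -n istio-system -l app=istiod --tail=100000000 -c discovery \
    > "$out_dir/istiod.log"

  # Mesh configuration stored in the "istio" ConfigMap.
  kubectl get configmap -n istio-system istio -o jsonpath='{.data.mesh}' \
    > "$out_dir/meshconfig.yaml"

  # Sidecar injector webhook configuration, useful for injection failures.
  kubectl get mutatingwebhookconfiguration istio-sidecar-injector -o yaml \
    > "$out_dir/sidecar-injector-webhook.yaml"

  # Short note describing each file; extend it with your own comments.
  printf '%s\n' \
    'istiod.log: Istiod logs from the discovery container' \
    'meshconfig.yaml: mesh config from the istio ConfigMap' \
    'sidecar-injector-webhook.yaml: sidecar injector webhook configuration' \
    > "$out_dir/README.txt"

  # Bundle everything into a single attachment.
  tar -czf "$out_dir.tar.gz" -C "$(dirname "$out_dir")" "$(basename "$out_dir")"
}

# Example usage (the directory name is arbitrary):
#   collect_istio_diagnostics istio-diagnostics-checkout-outage
```

If a kubectl command fails, the corresponding file is left empty and the function carries on, so review the directory contents before attaching the archive to the ticket.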

If you think it is useful and only when necessary, provide the istioctl bug-report output.

For long code/log snippets, attach files rather than pasting them directly into the portal. For short code/log snippets, use the code formatter so that the original indentation and any other formatting details are preserved.

Step 3: Choose a priority

Choose one of the following:

  • Severity 1: a Production System is severely impacted and completely shut down, or the system operations or mission-critical applications are down, due to a Tetrate software failure.
  • Severity 2: a Production System performance is degraded or restricted, but still operational.
  • Severity 3: a Non-production System is non-operational or completely shut down.
  • Severity 4: a Non-production System performance is degraded or restricted, but still operational.

Step 4: Respond promptly

Please respond to Tetrate's support team promptly. Clearly identify the information and/or procedures requested of you and act accordingly. If there are 3 action items for you, do your best to address all of them.

Once the issue is resolved, let us know and provide technical feedback so we understand how it was solved. This helps us understand your setup better for future requests.