Troubleshooting across multiple services

Since most of the system's functions involve interactions between multiple microservices, it's important to be able to follow an incoming request across all of those microservices and their data stores. One of the best ways to accomplish this is distributed tracing, where you tag each request with a unique identifier and follow it from beginning to end.
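As a rough illustration of tagging a request, the following Python sketch generates an identifier at the edge of the system and forwards it on every downstream call. The header name X-Request-ID and the helper names are assumptions made for this example, not part of any particular tracing library; production systems usually rely on a dedicated tracing framework instead.

```python
import uuid
from typing import Dict, List, Optional

# Hypothetical header name chosen for this sketch; real systems vary.
TRACE_HEADER = "X-Request-ID"


def ensure_trace_id(headers: Dict[str, str]) -> str:
    """Return the trace ID carried by the request, or create one at the edge."""
    trace_id = headers.get(TRACE_HEADER)
    if not trace_id:
        trace_id = uuid.uuid4().hex
    return trace_id


def outgoing_headers(trace_id: str, extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build headers for a downstream call so the trace ID travels with it."""
    headers = dict(extra or {})
    headers[TRACE_HEADER] = trace_id
    return headers


def handle_request(service: str, headers: Dict[str, str], log: List[str]) -> str:
    """Simulate a service that records the trace ID with every log line."""
    trace_id = ensure_trace_id(headers)
    log.append(f"{service} trace={trace_id}")
    return trace_id
```

A first service called with no headers creates the ID; passing outgoing_headers(trace_id) to the next service keeps the same ID attached, so log lines from both services can later be matched up.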

Debugging distributed systems in general, and microservice-based ones in particular, is subtle and takes a lot of expertise. Consider the following aspects along the path of a single request through the system:

  • The microservices processing the request may use different programming languages.
  • The microservices may expose APIs using different transports/protocols.
  • Requests may be part of asynchronous workflows that involve waiting in queues and/or periodic processing.
  • The persistent state of the request may be spread across many independent data stores controlled by different microservices.

When you need to debug a problem that spans the microservices in the system, the autonomous nature of each microservice becomes a hindrance. You must build explicit support for system-level visibility by aggregating internal information from multiple microservices.
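One simple form of such aggregation is to merge the log records of several services and group them by a shared request identifier. The sketch below assumes each record is a dictionary with hypothetical trace_id, ts, service, and msg fields; the service names and sample records are invented for illustration. Ordering by timestamp also assumes reasonably synchronized clocks, which is not guaranteed across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Record = Dict[str, object]


def aggregate_by_trace(*service_logs: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group records from many services by trace ID, ordered by timestamp."""
    traces: Dict[str, List[Record]] = defaultdict(list)
    for records in service_logs:
        for record in records:
            traces[str(record["trace_id"])].append(record)
    for records in traces.values():
        # Clock skew between hosts can distort this ordering.
        records.sort(key=lambda r: r["ts"])
    return dict(traces)


# Invented sample data: two services, two requests.
gateway_log = [
    {"trace_id": "a1", "ts": 1.0, "service": "gateway", "msg": "received"},
    {"trace_id": "b2", "ts": 1.5, "service": "gateway", "msg": "received"},
    {"trace_id": "a1", "ts": 3.0, "service": "gateway", "msg": "responded"},
]
orders_log = [
    {"trace_id": "a1", "ts": 2.0, "service": "orders", "msg": "order stored"},
]

timeline = aggregate_by_trace(gateway_log, orders_log)
for record in timeline["a1"]:
    print(record["ts"], record["service"], record["msg"])
```

The result gives one timeline per request, which is the system-level view that no single microservice can provide on its own.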