Running distributed queries

We agree that each microservice should have its own data store. This means that the overall state of the system will be distributed across multiple data stores, accessible only from their own microservices. Most interesting queries will involve data available in multiple data stores. Each consumer could just access all these microservices and aggregate all the data to satisfy their query. However, that is sub-optimal for several reasons:

  • Consumers are intimately aware of how data is managed by the system.
  • Consumers need to get access to each and every service that stores data relevant to the query.
  • Changing the architecture might require changes to a lot of consumers.

There are two common solutions to address this issue: CQRS and API composition. The cool thing about it is that the services that enable both solutions have the same API, so it is possible to switch from one solution to another, or even mix and match without impacting users. This means that some queries will be serviced by CQRS and others by API composition, all implemented by the same service. Overall, I recommend to start with API composition and transition to CQRS only if the proper conditions exist and benefits are compelling, due to its much higher complexity.