A service mesh is an increasingly popular solution for application networking, in Kubernetes and other environments. If you are not yet familiar with the concept, this article covers what you need to know before taking a deeper dive.
Over the past few years, software design has shifted away from monolithic codebases towards microservices architecture. The business logic delivered is essentially the same, but it takes the form of a collection of loosely coupled, independently deployable services rather than one large monolith.
Why do this? The approach has clear advantages. For example, microservices can each have their own development cycle, can be developed using different programming languages and frameworks, and can be owned and managed by different teams. Decomposing an application into functional modules in this way allows you to optimize the delivery process of the entire application. Extending the application with new features, and updating or replacing individual functional pieces, becomes much easier. What is more, microservices architecture is by nature well suited to container-based deployments (e.g. in environments such as Kubernetes), which is a significant benefit today.
However, this approach also brings considerable challenges. If an application based on microservices is to work reliably and efficiently and to meet security requirements (a must when building modern, scalable, server-side applications), each individual microservice must have appropriate mechanisms to ensure this.
What matters here is that communication and interaction between microservices are complex to get right. In theory, the microservices' development teams could handle these aspects by implementing them within the services themselves, but this is not an optimal approach. Firstly, such work is time-consuming and prone to errors. Secondly, developers should focus on implementing the actual business logic of the given microservice.
This is where the service mesh concept comes in handy. A service mesh is a kind of “network” that connects all the microservices within a given application. In fact, it is extra software that acts as an intermediary layer between services and provides functions such as service discovery, service-to-service authentication, load balancing and traffic management, monitoring, resilience, etc.
The logical architecture of a service mesh is quite simple because it contains only two components: a control plane and a distributed data plane (see Figure 1 below). The data plane consists of proxies installed next to each service instance. They act as L7 proxies (working in both “forward” and “reverse” mode) and handle calls to and from the services. When a particular service mesh implementation (such as Istio or Linkerd) is deployed in a Kubernetes cluster, the proxies are sidecar containers that run in the same Pods as the service containers. The control plane provides the core functionality of the service mesh system: it configures and coordinates the behavior of the proxies.
Fig. 1 Service mesh architecture overview
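To make the split between the two planes more concrete, here is a minimal Python sketch. It is a toy model, not the API of Istio, Linkerd or any other mesh: the class names `ControlPlane` and `SidecarProxy`, their methods, and the sample address are assumptions made up for illustration. The control plane keeps a registry of service endpoints and pushes it to every attached proxy, and each proxy uses its local copy to resolve outbound calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SidecarProxy:
    """Data plane: one proxy sits next to each service instance (hypothetical model)."""
    service: str
    routes: Dict[str, List[str]] = field(default_factory=dict)

    def apply_config(self, routes: Dict[str, List[str]]) -> None:
        # Keep a private copy so registry changes only arrive via a push.
        self.routes = {name: list(eps) for name, eps in routes.items()}

    def resolve(self, target: str) -> str:
        # Outbound ("forward") direction: map a service name to an endpoint.
        endpoints = self.routes.get(target)
        if not endpoints:
            raise LookupError(f"no endpoints known for {target!r}")
        return endpoints[0]


class ControlPlane:
    """Control plane: holds the desired state and configures every proxy."""

    def __init__(self) -> None:
        self.registry: Dict[str, List[str]] = {}
        self.proxies: List[SidecarProxy] = []

    def attach(self, proxy: SidecarProxy) -> None:
        self.proxies.append(proxy)
        proxy.apply_config(self.registry)

    def register(self, service: str, endpoint: str) -> None:
        self.registry.setdefault(service, []).append(endpoint)
        for proxy in self.proxies:  # push the updated view to the data plane
            proxy.apply_config(self.registry)


control_plane = ControlPlane()
orders_proxy = SidecarProxy(service="orders")
control_plane.attach(orders_proxy)
control_plane.register("payments", "10.0.0.7:8080")  # made-up address
print(orders_proxy.resolve("payments"))  # prints 10.0.0.7:8080
```

The point of the design is that the service itself never touches the registry: it only talks to its local proxy, and the control plane decides what that proxy knows.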
So what can a service mesh do for you? Depending on the specific implementation you use, the feature set may differ slightly. However, the key functions are laid out below, broken down into categories.
Routing and Traffic Management
- Performing dynamic service discovery and proxying/L7-load-balancing for various types of protocols, e.g. HTTP/1.x, HTTP/2 (including gRPC), WebSocket and other TCP-based protocols.
- Supporting different load-balancing algorithms (e.g. those based on Round Robin, Least Connection, etc.) and mechanisms (percentage-based, header-based, path-based traffic splits, etc.); supporting deployments based on a canary strategy (splitting traffic based on application versions).
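The routing mechanisms listed above can be sketched in a few lines of Python. This is only an illustration of the ideas, not how any particular mesh implements them: the function names, the `x-canary` header and the version labels `v1`/`v2` are assumptions. The first helper cycles through endpoints in Round Robin order; the second applies a header-based rule and then falls back to a percentage-based split, which is the core of a canary rollout.

```python
import itertools
import random
from typing import Dict, Iterator, List, Mapping


def round_robin(endpoints: List[str]) -> Iterator[str]:
    """Yield endpoints in a repeating Round Robin order."""
    return itertools.cycle(endpoints)


def pick_version(headers: Mapping[str, str],
                 weights: Dict[str, int],
                 rng: random.Random) -> str:
    """Choose an application version for one request."""
    # Header-based rule: requests that opt in always reach the canary.
    if headers.get("x-canary") == "true":  # hypothetical header name
        return "v2"
    # Percentage-based split: weighted random choice between versions.
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


rng = random.Random(42)
weights = {"v1": 90, "v2": 10}  # send roughly 10% of traffic to the canary
picks = [pick_version({}, weights, rng) for _ in range(10_000)]
print(picks.count("v2") / len(picks))  # a value close to 0.10
```

In a real mesh you would typically express the same split declaratively in the mesh configuration rather than in application code, and the proxies would enforce it.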
Reliability and Resilience
- Defining policies for request retries and timeouts (when calling service instances).
- Launching a circuit breaker (a mechanism that allows you to block incoming connections to the application when the given conditions are met, for example the maximum number of concurrent connections has been reached or a failure threshold for an app has been exceeded).
- Enabling artificial fault injection (to test the resiliency of applications).
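As a rough illustration of the retry and circuit-breaker ideas above, the following Python sketch shows the logic a proxy might apply around calls to a service. It is a simplified model under stated assumptions: the class and function names, the thresholds and the half-open behavior are chosen for illustration and do not reproduce the policy of any specific service mesh. The breaker here trips on a failure threshold; a limit on concurrent connections would be a separate condition.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without forwarding it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to stay open before a trial call
        self.clock = clock              # injectable so the logic can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; request rejected")
            # Half-open: let one trial request through; one failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit again
        return result


def call_with_retries(send, attempts=3, timeout_s=1.0,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call send(timeout_s) up to `attempts` times, retrying transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return send(timeout_s)
        except retry_on as exc:
            last_error = exc
    raise last_error
```

Retries and circuit breaking pull in opposite directions: retries hide short glitches, while the breaker stops callers from piling more load onto a service that is already failing. That is why the two are usually configured together.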
Observability
- Collecting various types of metrics for services, e.g. request volumes, success rates, latencies, etc.
- Tracing (e.g. gathering data needed to troubleshoot latency issues).
- Offering verbose logging capabilities.
- Integrating with different observability backends, such as Prometheus (monitoring), Zipkin or Jaeger (tracing), Fluentd (logging), etc.
- Drawing service topology graphs.
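To give a feel for the kind of metrics mentioned above, here is a small Python sketch of what a proxy could record per service: request volume, success rate and a latency percentile. The `ServiceMetrics` class and its methods are hypothetical names used for illustration, and the nearest-rank percentile is just one simple choice. Real meshes export such data to backends like Prometheus rather than keeping raw samples in memory.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceMetrics:
    """Per-service counters a proxy might keep (illustrative only)."""
    latencies_ms: List[float] = field(default_factory=list)
    successes: int = 0

    def record(self, latency_ms: float, ok: bool) -> None:
        self.latencies_ms.append(latency_ms)
        if ok:
            self.successes += 1

    @property
    def request_volume(self) -> int:
        return len(self.latencies_ms)

    @property
    def success_rate(self) -> float:
        return self.successes / self.request_volume if self.request_volume else 0.0

    def latency_percentile(self, pct: float) -> float:
        # Nearest-rank percentile over all recorded latencies.
        if not self.latencies_ms:
            raise ValueError("no requests recorded")
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(pct * len(ordered) / 100))
        return ordered[rank - 1]


metrics = ServiceMetrics()
for i in range(1, 101):
    # Synthetic data: latencies of 1..100 ms, every tenth request fails.
    metrics.record(latency_ms=float(i), ok=(i % 10 != 0))
print(metrics.request_volume, metrics.success_rate, metrics.latency_percentile(95))
# prints 100 0.9 95.0
```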
Security
- Enabling secure communication between service instances (by providing mutual TLS functionality).
- Providing advanced authentication and authorization mechanisms (sophisticated policies for enforcing fine-grained security control within the cluster of microservices).
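The authorization side can be illustrated with a short Python sketch of a default-deny policy check. It assumes that mutual TLS has already established the caller's identity, and that the proxy then compares that identity, the HTTP method and the path against a list of rules. The `Rule` type, the `is_allowed` function and the identity strings are hypothetical and do not correspond to the policy format of any particular mesh.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Rule:
    source: str              # caller identity, e.g. taken from its mTLS certificate
    methods: FrozenSet[str]  # HTTP methods the caller may use
    path_prefix: str         # paths the rule applies to


def is_allowed(rules: Iterable[Rule], source: str, method: str, path: str) -> bool:
    """Default deny: a request passes only if some rule matches it."""
    return any(
        rule.source == source
        and method in rule.methods
        and path.startswith(rule.path_prefix)
        for rule in rules
    )


# Made-up identities and paths for illustration.
rules = [
    Rule("orders", frozenset({"GET", "POST"}), "/payments"),
    Rule("reports", frozenset({"GET"}), "/payments/summary"),
]
print(is_allowed(rules, "orders", "POST", "/payments/charge"))   # prints True
print(is_allowed(rules, "reports", "POST", "/payments/charge"))  # prints False
```

Because the check runs in the proxy, the same policy applies to every service without changes to application code.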
There are many different service mesh implementations available today. They differ both in terms of maturity and the number of supported functionalities. Example solutions are:
You can get more information about each of them by following the corresponding links above.
How has the concept of service meshes been adopted so far? To answer this question, the CNCF Service Mesh Survey report will be our reference here. The survey was conducted between November and December 2021, with 253 members of the CNCF and Kubernetes communities, to discover how organizations are adopting service meshes.
The results of the survey show that 60% of respondents are running a service mesh in production and 10% in a development environment. Another 19% have begun evaluating such solutions. Only 9% stated that they had not started using service mesh solutions so far.
Fig. 2 Interest in service meshes amongst organizations (source: CNCF Service Mesh Survey)
The respondents were also asked which features offered by service meshes drive their organization’s adoption of such solutions. Security and observability capabilities were found to be of paramount importance (with 79% and 78% positive responses respectively). Traffic management functions (e.g. advanced L7 routing and load balancing) came third (62%), and reliability and resilience functions came fourth (56%).
Fig. 3 Features driving organizations’ adoption of service meshes (source: CNCF Service Mesh Survey)
To see the full results of the survey, you can read the official CNCF report.
Interest in service meshes is growing year on year. Introducing this component to the architecture of your products and solutions requires engineering expertise and some integration effort, so it is not a trivial task. However, it may be the right step considering the functional benefits it can bring and the problems it can help you solve, which makes the concept worth your attention.