1. Summary
The service mesh plays an essential role in cloud-native development, enabling fast, secure communication across microservices-based applications. Typically implemented alongside the workload as a sidecar proxy, a service mesh eliminates the complexity, fragmentation, and security vulnerabilities of repeatedly coding service-to-service communications by outsourcing the management of requests to an out-of-process application. At the same time, implementing a service mesh affects operational procedures and requires DevOps personnel to become familiar with new concepts and technologies.
As service mesh is an emerging technology undergoing rapid innovation, decision makers must take the time to carefully evaluate the landscape, considering the additional complexity, latency, and resource consumption involved. With a variety of open-source and commercial vendors targeting a broad range of application environments and deployment options, this GigaOm radar report provides an overview of the service mesh landscape based on the following table stakes, which are mature, stable solution features common across all service meshes:
- Dedicated infrastructure layer: Delivering fast, reliable, and secure service-to-service communications, a service mesh is a dedicated infrastructure layer fully integrated within the distributed application to control the delivery of service requests. The infrastructure layer provides several functions, including service discovery, authentication and authorization, health checks, failure recovery, load balancing, and observability via the data plane.
- Sidecar implementation: Like a sidecar attached to a motorcycle, a service mesh sidecar provides third-party functionality alongside the actual workload within the container. For example, a service proxy—such as Envoy—is attached to a workload during deployment to manage service-to-service communications within a service mesh. All management capabilities required by the workload—including monitoring, control, and security—are implemented without changing a single line of application code. However, it should be noted that innovation is taking place around optimized low-resource, low-latency sidecar-less architectures—including the extended Berkeley Packet Filter (eBPF) and Istio Ambient Mesh—offering full interoperability with Envoy sidecars.
- Control plane configuration: Comprising a set of APIs and tools controlling proxy behavior across the mesh, the control plane automatically configures data plane service proxies. The control plane transforms a collection of isolated, stateless sidecar proxies into a distributed system and implements policies across all data planes within the mesh.
- Control plane telemetry: Besides configuring and managing proxies used to route traffic and enforce policies, the control plane collects telemetry data for each request. The detailed statistics, logging, and distributed tracing data provide observability into service behavior for troubleshooting, maintenance, and service optimization.
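To make the sidecar pattern described above concrete, the following is an illustrative sketch of what a Kubernetes pod looks like after sidecar injection. All names, images, and port numbers are hypothetical; in practice, meshes such as Istio add the proxy container automatically via an admission webhook rather than requiring anyone to author it by hand.

```yaml
# Illustrative pod spec after sidecar injection. The application container
# is unchanged; the mesh adds an Envoy proxy that intercepts all traffic.
apiVersion: v1
kind: Pod
metadata:
  name: orders-v1              # hypothetical workload
spec:
  containers:
  - name: orders               # the actual workload, unmodified
    image: example.com/orders:1.0
    ports:
    - containerPort: 8080
  - name: envoy-sidecar        # injected by the mesh, not by the developer
    image: envoyproxy/envoy:v1.24.0
    ports:
    - containerPort: 15001     # pod traffic is transparently redirected here
```

Because the injection webhook also installs the traffic-redirection rules at deployment time, the mesh can manage, secure, and observe the workload's traffic without changing a single line of application code.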
With different service mesh options and a rapidly evolving landscape, choosing the best service mesh for your organization depends on your use cases, existing software stack, architectural choices, and in-house capabilities. In addition, your internal resources and skill sets will influence your decision on whether you adopt a lightweight, developer-friendly service mesh or a full-featured solution requiring professional services. Figure 1 provides a list of service meshes included in this report and their acquisition options.
Note: Providing governance for open-source, vendor-neutral cloud-native projects, the Cloud Native Computing Foundation (CNCF) hosts several community-driven open-source projects with varying maturity levels: sandbox (early stage), incubating (stable), or graduated (widely deployed in production environments).
Figure 1. Service Mesh Projects and Vendors
This GigaOm Radar report provides an overview of notable service mesh projects and vendors and their available offerings. The corresponding GigaOm report “Key Criteria for Evaluating Service Mesh Solutions” outlines critical criteria and evaluation metrics for selecting a service mesh. Together, these reports offer essential insights for cloud-native application development initiatives, helping decision makers evaluate solutions before deciding where to invest.
How to Read this Report
This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:
Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.
GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.
Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.
2. Market Categories and Deployment Models
To better understand the market and vendor positioning (Table 1), we assess how well an open-source or vendor service mesh supports different target markets and deployment models.
For the service mesh sector, we recognize five target markets:
- Cloud service provider (CSP): Providers delivering on-demand, pay-per-use services to customers over the internet, including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).
- Network service provider (NSP): Service providers selling network services—such as network access and bandwidth—provide entry points to backbone infrastructure or network access points (NAP). In this report, NSPs include data carriers, ISPs, telcos, and wireless providers.
- Managed service provider (MSP): Service providers delivering managed application, communication, IT infrastructure, network, and security services and support for businesses at either the customer premises or via MSP (hosting) or third-party data centers (colocation).
- Small-to-medium business (SMB): Small businesses (<100 employees) and medium-sized businesses (100-1,000 employees) with limited budgets and constrained in-house resources for planning, building, deploying, and managing their applications, IT infrastructure, networks, and security in either an on-premises data center or a colocation facility.
- Large enterprise: Enterprises of 1,000 or more employees with dedicated IT teams responsible for planning, building, deploying, and managing their applications, IT infrastructure, networks, and security in either an on-premises data center or a colocation facility.
For the service mesh sector, we recognize various deployment models, including single or multiple clusters, single or multiple networks, single or multiple control planes, and single or multiple meshes:
- Single or multiple cluster: Service meshes can be configured as either a single cluster or as a single mesh including multiple clusters. A single cluster deployment may offer simplicity, but it lacks features such as fault isolation, failover, and project isolation available in a multicluster deployment.
- Single or multiple network: Workload instances directly connected without using a gateway reside in a single network, enabling the uniform configuration of service consumers across the mesh. A multinetwork approach allows a service mesh to span various network topologies or subnets, providing compliance, isolation, high availability, and scalability.
- Single or multiple control plane: The control plane configures all communication between workload instances within the mesh. Deploying multiple control planes across clusters, regions, or zones provides configuration isolation, fine-grained control over configuration rollouts, and service-level isolation. Moreover, in the event one control plane becomes unavailable, the impact of the outage is limited to the workloads managed by that control plane.
- Single or multiple mesh: While a single mesh can span one or more clusters or networks, service names are unique within the mesh. Since namespaces are used for tenancy, a federated mesh is required to discover services and enable communication across mesh boundaries. Additionally, each mesh reveals services that can be consumed by other services, providing line-of-business boundaries and isolation between test and production workloads.
Table 1. Vendor Positioning
| | CSP | NSP | MSP | SMB | Large Enterprise | Single or Multiple Cluster | Single or Multiple Network | Single or Multiple Control Plane | Single or Multiple Mesh |
|---|---|---|---|---|---|---|---|---|---|
| Anthos Service Mesh (Google) | | | | | | | | | |
| Aspen Mesh (F5 Networks) | | | | | | | | | |
| AWS App Mesh (Amazon) | | | | | | | | | |
| Cilium (CNCF) | | | | | | | | | |
| Gloo Mesh (Solo.io) | | | | | | | | | |
| greymatter.io | | | | | | | | | |
| HashiCorp Consul (HashiCorp) | | | | | | | | | |
| Istio | | | | | | | | | |
| Kong Mesh (Kong) | | | | | | | | | |
| Kuma (CNCF) | | | | | | | | | |
| Linkerd (CNCF) | | | | | | | | | |
| Network Service Mesh (CNCF) | | | | | | | | | |
| NGINX Service Mesh (F5 Networks) | | | | | | | | | |
| Open Service Mesh (CNCF) | | | | | | | | | |
| OpenShift Service Mesh (Red Hat) | | | | | | | | | |
| Tanzu Service Mesh (VMware) | | | | | | | | | |
| Traefik Mesh (Traefik Labs) | | | | | | | | | |

Exceptional: Outstanding focus and execution. Capable: Good but with room for improvement. Limited: Lacking in execution and use cases. Not applicable or absent.
3. Key Criteria Comparison
Following the general criteria introduced in GigaOm’s “Key Criteria for Evaluating Service Mesh Solutions,” Tables 2, 3, 4, and 5 summarize how well each vendor included in this research performs in the areas we consider differentiating and critical for the sector.
- Key criteria differentiate solutions based on features and capabilities, outlining the primary criteria to be considered when evaluating a service mesh, including built-in resilience, converged security, and AIOps automation.
- Evaluation metrics provide insight into the impact of each product’s features and capabilities on the organization, reflecting fundamental aspects including configurability, interoperability, and observability.
- Emerging technologies and trends identify the most compelling and potentially impactful technologies emerging in a product or service sector over the next 12 to 18 months.
- Specific service mesh capabilities differentiate one service mesh from another based on the specific functionality required to deliver fast, resilient, and secure service-to-service communications.
The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the market landscape, and gauge the potential impact on the business.
Table 2. Key Criteria Comparison
| | Platform Support | Service Proxy | Resource Consumption | Flexible Routing | Low Latency | Built-In Resilience | Converged Security | AIOps Automation | Service Mesh as a Service |
|---|---|---|---|---|---|---|---|---|---|
| Anthos Service Mesh (Google) | | | | | | | | | |
| Aspen Mesh (F5 Networks) | | | | | | | | | |
| AWS App Mesh (Amazon) | | | | | | | | | |
| Cilium (CNCF) | | | | | | | | | |
| Gloo Mesh (Solo.io) | | | | | | | | | |
| greymatter.io | | | | | | | | | |
| HashiCorp Consul (HashiCorp) | | | | | | | | | |
| Istio | | | | | | | | | |
| Kong Mesh (Kong) | | | | | | | | | |
| Kuma (CNCF) | | | | | | | | | |
| Linkerd (CNCF) | | | | | | | | | |
| Network Service Mesh (CNCF) | | | | | | | | | |
| NGINX Service Mesh (F5 Networks) | | | | | | | | | |
| Open Service Mesh (CNCF) | | | | | | | | | |
| OpenShift Service Mesh (Red Hat) | | | | | | | | | |
| Tanzu Service Mesh (VMware) | | | | | | | | | |
| Traefik Mesh (Traefik Labs) | | | | | | | | | |

Exceptional: Outstanding focus and execution. Capable: Good but with room for improvement. Limited: Lacking in execution and use cases. Not applicable or absent.
Table 3. Evaluation Metrics Comparison
| | Openness | Interoperability | Configurability | Extensibility | Observability | Manageability | Vendor Support | Pricing & TCO | Vision & Roadmap |
|---|---|---|---|---|---|---|---|---|---|
| Anthos Service Mesh (Google) | | | | | | | | | |
| Aspen Mesh (F5 Networks) | | | | | | | | | |
| AWS App Mesh (Amazon) | | | | | | | | | |
| Cilium (CNCF) | | | | | | | | | |
| Gloo Mesh (Solo.io) | | | | | | | | | |
| greymatter.io | | | | | | | | | |
| HashiCorp Consul (HashiCorp) | | | | | | | | | |
| Istio | | | | | | | | | |
| Kong Mesh (Kong) | | | | | | | | | |
| Kuma (CNCF) | | | | | | | | | |
| Linkerd (CNCF) | | | | | | | | | |
| Network Service Mesh (CNCF) | | | | | | | | | |
| NGINX Service Mesh (F5 Networks) | | | | | | | | | |
| Open Service Mesh (CNCF) | | | | | | | | | |
| OpenShift Service Mesh (Red Hat) | | | | | | | | | |
| Tanzu Service Mesh (VMware) | | | | | | | | | |
| Traefik Mesh (Traefik Labs) | | | | | | | | | |

Exceptional: Outstanding focus and execution. Capable: Good but with room for improvement. Limited: Lacking in execution and use cases. Not applicable or absent.
Table 4. Emerging Technologies and Trends Comparison
| | WebAssembly | Open Policy Agent | 5G & Edge | eBPF |
|---|---|---|---|---|
| Anthos Service Mesh (Google) | | | | |
| Aspen Mesh (F5 Networks) | | | | |
| AWS App Mesh (Amazon) | | | | |
| Cilium (CNCF) | | | | |
| Gloo Mesh (Solo.io) | | | | |
| greymatter.io | | | | |
| HashiCorp Consul (HashiCorp) | | | | |
| Istio | | | | |
| Kong Mesh (Kong) | | | | |
| Kuma (CNCF) | | | | |
| Linkerd (CNCF) | | | | |
| Network Service Mesh (CNCF) | | | | |
| NGINX Service Mesh (F5 Networks) | | | | |
| Open Service Mesh (CNCF) | | | | |
| OpenShift Service Mesh (Red Hat) | | | | |
| Tanzu Service Mesh (VMware) | | | | |
| Traefik Mesh (Traefik Labs) | | | | |

Exceptional: Outstanding focus and execution. Capable: Good but with room for improvement. Limited: Lacking in execution and use cases. Not applicable or absent.
Table 5. Specific Service Mesh Capabilities Comparison
| | Service Discovery | Load Balancing | Encryption | Circuit Breaker | Fault Injection | Advanced Routing | Distributed Tracing |
|---|---|---|---|---|---|---|---|
| Anthos Service Mesh (Google) | | | | | | | |
| Aspen Mesh (F5 Networks) | | | | | | | |
| AWS App Mesh (Amazon) | | | | | | | |
| Cilium (CNCF) | | | | | | | |
| Gloo Mesh (Solo.io) | | | | | | | |
| greymatter.io | | | | | | | |
| HashiCorp Consul (HashiCorp) | | | | | | | |
| Istio | | | | | | | |
| Kong Mesh (Kong) | | | | | | | |
| Kuma (CNCF) | | | | | | | |
| Linkerd (CNCF) | | | | | | | |
| Network Service Mesh (CNCF) | | | | | | | |
| NGINX Service Mesh (F5 Networks) | | | | | | | |
| Open Service Mesh (CNCF) | | | | | | | |
| OpenShift Service Mesh (Red Hat) | | | | | | | |
| Tanzu Service Mesh (VMware) | | | | | | | |
| Traefik Mesh (Traefik Labs) | | | | | | | |

Exceptional: Outstanding focus and execution. Capable: Good but with room for improvement. Limited: Lacking in execution and use cases. Not applicable or absent.
Taken together, the tables above give the reader a snapshot of the technical solutions available in the market.
4. GigaOm Radar
This report synthesizes the analysis of key criteria and their impact on evaluation metrics to generate the GigaOm Radar in Figure 2. The radar is a forward-looking perspective on all the vendors in this report based on their products’ technical capabilities and feature sets.
The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—Maturity versus Innovation and Feature Play versus Platform Play—while the length of the arrow indicates the predicted evolution of the solution over the coming 12 to 18 months.
Figure 2. GigaOm Radar for Service Mesh
As seen in Figure 2, there are six Leaders (greymatter.io, HashiCorp Consul, Istio, Linkerd, Gloo Mesh, and Tanzu Service Mesh), eight Challengers (AWS App Mesh, Aspen Mesh, Anthos Service Mesh, Kong Mesh, Kuma, Network Service Mesh, OpenShift Service Mesh, and Traefik Mesh), and three New Entrants (Cilium Service Mesh, NGINX Service Mesh, and Open Service Mesh).
It should be noted that Maturity (that is, being positioned in the top two quadrants) does not exclude Innovation. Instead, it identifies the solution as being proven in a production setting compared to a newer solution undergoing innovation to achieve customer acceptance and adoption. In addition, the length of the arrow (Forward Mover, Fast Mover, or Outperformer) is based on customer adoption and execution against roadmap and vision (based on project or vendor input from last year’s report and in comparison to improvements made across the industry in general).
Furthermore, positioning in the Platform-Play quadrants indicates that the service mesh includes the functionality generally expected from a service mesh and can be deployed on a wide range of platforms even if the project or vendor is focused on a limited set of use cases. In contrast, some service meshes are positioned in the Feature-Play quadrants for the following reasons:
- The service mesh supports a limited range of platforms (AWS App Mesh, Anthos Service Mesh, and OpenShift Service Mesh).
- The service mesh has a limited set of features (Network Service Mesh and Open Service Mesh).
- The service mesh includes the functionality generally expected from a service mesh but has an architecture that differs from that defined in the table stakes (Traefik Mesh).
New additions to the list of service meshes compared to the 2021 GigaOm Radar for Service Mesh are: AWS App Mesh, Cilium Service Mesh, Anthos Service Mesh, Network Service Mesh, Open Service Mesh, and OpenShift Service Mesh.
Gloo Mesh, greymatter.io, Kong Mesh, and Linkerd are recognized as Outperformers. Gloo Mesh continues to be the leading Istio-based service mesh, incorporating Istio Ambient Mesh's sidecar-less architecture, built-in best practices for extensibility and security, and simplified, centralized Istio and Envoy lifecycle management. Pushing the boundaries through continuous innovation, greymatter.io offers exceptional Layer 3, 4, and 7 visibility, unmatched intelligence, built-in support for emerging use cases, and automated performance optimization. A highly portable, cloud-agnostic, full-stack platform that runs everywhere, Kong Mesh offers ease of use and built-in automation capabilities as an alternative to more complex open-source solutions that are difficult to deploy and manage. Finally, Linkerd continues to gain rapid adoption because it is ultralight, ultrafast, and operationally simple to deploy.
One service mesh to keep an eye on is Cilium Service Mesh. Competing with Istio Ambient Mesh, the Cilium project is one of the first service meshes to offer the flexibility of running either a sidecar model leveraging the Istio control plane or a sidecar-less model with a choice of control planes for increased efficiency. While the jury is still out on the benefits and risks of incorporating eBPF into a service mesh, several projects and vendors are either doing so or have included it on their roadmaps.
Since publishing the 2021 GigaOm Radar for Service Mesh, greymatter.io has moved from being a New Entrant to a Leader due to its rapid innovation, while Kong Mesh has moved from being a Fast Mover to an Outperformer due to its execution against its roadmap. The positioning of NGINX Service Mesh has been corrected from Challenger to a New Entrant. In addition, decision makers should be aware of the current uncertainty surrounding the future of NGINX Service Mesh and should wait for further clarification from F5 Networks.
Inside the GigaOm Radar
The GigaOm Radar weighs each vendor’s execution, roadmap, and ability to innovate to plot solutions along two axes, each set as opposing pairs. On the Y axis, Maturity recognizes solution stability, strength of ecosystem, and a conservative stance, while Innovation highlights technical innovation and a more aggressive approach. On the X axis, Feature Play connotes a narrow focus on niche or cutting-edge functionality, while Platform Play displays a broader platform focus and commitment to a comprehensive feature set.
The closer to center a solution sits, the better its execution and value, with top performers occupying the inner Leaders circle. The centermost circle is almost always empty, reserved for highly mature and consolidated markets that lack space for further innovation.
The GigaOm Radar offers a forward-looking assessment, plotting the current and projected position of each solution over a 12- to 18-month window. Arrows indicate travel based on strategy and pace of innovation, with vendors designated as Forward Movers, Fast Movers, or Outperformers based on their rate of progression.
Note that the Radar excludes vendor market share as a metric. The focus is on forward-looking analysis that emphasizes the value of innovation and differentiation over incumbent market position.
5. Vendor Insights
Anthos Service Mesh: Google
Announced in September 2019, Anthos Service Mesh (ASM) is Google’s Istio implementation available either with Anthos (Google’s platform for modernizing applications and accelerating hybrid cloud adoption) or as a standalone service with Google APIs used to determine billing. Providing service management for Anthos, ASM is a limited Anthos-tested Istio distribution enabling customers to deploy a fully supported service mesh on-premises using Google Kubernetes Engine (GKE) On-Prem, on Google Cloud, or as a hybrid solution. ASM leverages Istio APIs and core components to deliver agility, observability, and security for services deployed to Anthos GKE or hybrid cloud and on-premises deployments with container- and virtual machine (VM)-based services.
Figure 3. Anthos Service Mesh at-a-Glance
Replacing Istio on GKE, Google offers Anthos Service Mesh as an on-premises, in-cluster control plane, a fully managed service mesh, or as a hybrid service mesh spanning both Google Cloud and on-premises deployments. Catering to the needs of existing VMware customers with familiar management and operating environments, the on-premises version uses GKE On-Prem running on top of VMware vSphere on customer hardware.
The fully managed version of ASM provides an optionally managed data plane and a Google-managed control plane operating outside of Anthos GKE clusters, reducing the management overhead while ensuring the highest possible availability. Minimizing manual user maintenance, Google manages the control plane’s availability, scalability, and security, including software patching and upgrades. In addition, using the Google-managed control plane simplifies multicluster mesh configuration and reduces the Kubernetes Engine privileges needed to install Anthos Service Mesh.
The managed ASM control plane comprises Traffic Director, Managed CA, and Google Cloud’s operations tooling (formerly Stackdriver). In addition to directing service mesh ingress and egress traffic, Google Cloud’s fully managed traffic control plane, Traffic Director, translates Istio API objects into configuration information for the distributed proxies. Managed CA is a centralized certificate authority (CA) responsible for providing authentication information and secure sockets layer (SSL) certificates to each distributed proxy.
Google Cloud’s operations tooling provides a managed ingestion point for observability and telemetry. The monitoring, tracing, and logging data generated by each proxy powers the Anthos Service Mesh Observability dashboard, enabling service operators to visually inspect services and service dependencies and implement site-reliability engineering (SRE) best practices for establishing service-level objectives (SLOs) and monitoring service-level indicators (SLIs).
The Google-managed data plane is enabled by simply adding an annotation to the namespaces, which installs an in-cluster controller to manage the sidecar proxies. The data plane is deployed as a set of distributed proxies that mediate all inbound and outbound network traffic between individual services. The proxies are configured using a centralized control plane and an open API, enabling the automation of everyday networking tasks, including implementing traffic splitting or steering between services and enabling service-to-service authentication and encryption.
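As a sketch of how lightweight this opt-in is, enabling the Google-managed data plane amounts to labeling and annotating the application namespace along the following lines. The namespace name is hypothetical, and the exact annotation key and revision label should be verified against Google's current Anthos Service Mesh documentation, as they have changed across releases.

```yaml
# Illustrative namespace opt-in to the Google-managed ASM data plane.
apiVersion: v1
kind: Namespace
metadata:
  name: payments                    # hypothetical application namespace
  labels:
    istio.io/rev: asm-managed       # select the managed control plane revision
  annotations:
    # Opt this namespace's sidecars in to Google-managed upgrades.
    mesh.cloud.google.com/proxy: '{"managed":"true"}'
```

Once applied, the in-cluster controller described above takes over sidecar lifecycle management for workloads in that namespace.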
While fully managed ASM reduces the need for in-house resources and increases availability and stability, it has numerous limitations, including no support for custom Envoy filters, IPv6, TCP in-proxy cloud monitoring, whitebox sidecars, or multinetwork environments. Environments external to Google Cloud—including Anthos on-premises, Anthos on other public clouds, Amazon EKS, Microsoft AKS, and other Kubernetes (K8s) clusters—are not supported. Tracing is limited to Google Cloud Trace, with Jaeger and Zipkin tracing available only as a customer-managed option. In addition, all GKE clusters must be in a single region, with a limit of 1,000 services and 5,000 workloads per cluster.
Strengths: Fully managed Anthos Service Mesh delivers basic service mesh capabilities for existing Google Anthos customers. Leveraging GKE On-Prem, the on-premises version caters mainly to existing VMware customers looking for familiar management and operating environments.
Challenges: Tying users to the Google ecosystem, Anthos Service Mesh is a light version of Istio with numerous features removed, including support for Istio CA and Istio Operator. Some elements—such as Anthos Service Mesh certificate authority (Mesh CA) and the Anthos Service Mesh dashboards in the console—aren’t available in all Anthos environments. In addition, while you can install Anthos Service Mesh if you aren’t an Anthos subscriber, certain UI elements and features in the Google Cloud console are available only to Anthos subscribers. Potential users should carefully evaluate ASM’s limitations before initiating a PoC, especially considering the uncertainty surrounding the future of Anthos given its limited adoption.
Aspen Mesh: F5 Networks
A startup incubated within F5 Networks, Aspen Mesh was released in December 2017 as a fully supported carrier-grade, production-ready, and security-hardened Istio-based service mesh distribution built to handle complex mature K8s infrastructures. Supporting mobile service providers requiring dual-stack, IPv4/IPv6 ingress and egress for control, data, and signaling, Aspen Mesh incorporates multicloud zero-trust security, compliance policy enforcement, protocol-level observability, and SRE-based application optimization. Leveraging F5 Networks’ global infrastructure, Aspen Mesh offers 24×7 white glove and concierge support for production environments, with follow-the-sun options and on-demand native-speaking support engineers.
Figure 4. Aspen Mesh at-a-Glance
Aspen Mesh reduces the complexity of Istio through lifecycle management, long-term support (LTS) releases, and additional services, adding advanced features to the open-source distribution. These include simplified mTLS management, fine-grained role-based access control (RBAC), Istio Vet (for discovering incompatible user application and Istio component configuration in a K8s cluster), and single sign-on (SSO). In addition, objective-driven, artificial intelligence and machine learning (AI/ML)-powered insight recognition policy frameworks allow users to specify, measure, and enforce business goals.
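Because Aspen Mesh is an Istio distribution, its simplified mTLS management sits on top of standard Istio policy objects. For context, a minimal mesh-wide strict mTLS policy in upstream Istio looks like the following; Aspen Mesh's tooling generates and validates such resources rather than requiring users to author them by hand.

```yaml
# Standard upstream Istio policy: enforce mutual TLS mesh-wide.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # placing the policy here makes it mesh-wide
spec:
  mtls:
    mode: STRICT            # reject any plaintext service-to-service traffic
```

Per-namespace or per-workload policies can then relax or tighten this default, which is the kind of fine-grained configuration surface Aspen Mesh's mTLS management aims to simplify.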
A cloud-native dashboard offers an intuitive user experience, simplifying day-to-day operations and making it possible to securely run thousands of containers with standardized deployment, scaling, security policy enforcement, and issue resolution. Supporting a distributed and highly scalable, data-driven infrastructure, Aspen Mesh’s observability framework, Rapid Resolve, uses robust data analytics and ML designed to deliver actionable insights in real-time and reduce the mean-time-to-resolution (MTTR) with advanced troubleshooting and environment reporting capabilities. In addition, Aspen Mesh’s Packet Inspector provides protocol-level observability with telemetry data delivered in standardized formats for the telecom industry.
A joint solution with F5, BIG-IP Next Service Proxy for Kubernetes (BIG-IP Next SPK) brings critical network capabilities to a K8s environment, meeting the demands of a service provider network. BIG-IP Next SPK supports ingress/egress control for 4G and 5G signaling, streamlining transitions to both standalone (5G-SA) and non-standalone (5G-NSA) 5G while leveraging investments in 4G. The solution offers authentication, encryption, observability, security, policy management, and packet capture of east/west traffic within each 5G core K8s cluster. At the same time, a per-service secure proxy and firewall protect north/south traffic flowing into and out of containerized 5G services. In addition, Aspen Mesh has added customized Istio capabilities, including Elliptic Curve Cryptography and advanced certificate management.
One of the primary contributors to the Istio and Envoy communities, Aspen Mesh holds seats on the Istio Technical Oversight and Steering Committees and was the first non-founding vendor to release and manage a version of Istio. However, it should be noted that F5 Networks supports two service meshes—Aspen Mesh and NGINX Service Mesh—on the premise that customers often hold strong opinions on their choice of infrastructure stack based on existing investments in underlying technologies. While both support K8s clusters, F5 provides options for either standardizing on NGINX infrastructure or adopting a service mesh based on open-source Istio and Envoy.
Strengths: As an F5 Networks incubation, Aspen Mesh is uniquely positioned to help network service providers address the challenges of transitioning to 5G and cloud-native technologies. Aspen Mesh and F5 Networks are the only vendors delivering an Istio-based solution that can be deployed as part of a 5G-SA or 5G-NSA core, supporting the migration and deployment of 4G services on a 5G core. Currently in beta testing, the hosted Aspen Mesh SaaS Platform offers multicluster and multicloud capabilities with 360° performance insights, health monitoring, operational support, and real-time SLO metric tracking for Aspen Mesh users.
Challenges: With several vendors providing enterprise-grade support for Istio, Aspen Mesh must find ways to differentiate itself for enterprise customers. While BIG-IP Next SPK is a crucial differentiator for NSPs, Aspen Mesh needs to simplify the Istio experience for mature enterprises with complex infrastructures and develop actionable, machine-assisted insights to help address customers’ challenges.
AWS App Mesh: Amazon
Launched at AWS re:Invent 2018, AWS App Mesh is a fully managed service bringing the benefits of a service mesh to Amazon Web Services (AWS) customers using compute and container services. Providing application-level networking for running applications at scale, AWS App Mesh can be used with microservice containers managed by Amazon Elastic Container Services (ECS), Amazon Elastic Container Service for Kubernetes (EKS), AWS Fargate, Kubernetes on AWS, and services running on Amazon Elastic Compute Cloud (EC2). App Mesh also integrates with AWS Outposts for applications running on-premises. In addition, App Mesh uses a customized version of the open-source Envoy proxy, making it compatible with a wide range of AWS partner and open-source tools.
Figure 5. AWS App Mesh at-a-Glance
Supporting both containers and VMs, AWS App Mesh creates an abstraction layer based on virtualized nodes, routers, routes, and services. The App Mesh control plane is designed to support AWS compute services, and the Envoy proxy is customized to support the control plane. Users include the proxy as part of each microservice’s task or pod definition and configure the service’s application container to communicate directly with the proxy. Agent for Envoy monitors Envoy proxies and helps keep them healthy, making applications more resilient to failures. In addition, App Mesh provides an API to configure traffic routes and other controls among mesh-enabled microservices, allowing users to route traffic based on path or weights to specific service versions.
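The route API mentioned above is commonly driven through the open-source App Mesh controller's Kubernetes CRDs. The following is a hedged sketch of a weighted route shifting 10% of traffic to a new service version; the resource names and namespace are hypothetical, and field names follow the controller's v1beta2 API as documented by AWS, so they should be verified against the current schema.

```yaml
# Illustrative App Mesh virtual router splitting traffic 90/10
# between two versions of a service (canary rollout).
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualRouter
metadata:
  name: orders-router          # hypothetical
  namespace: shop              # hypothetical
spec:
  listeners:
    - portMapping:
        port: 8080
        protocol: http
  routes:
    - name: orders-canary
      httpRoute:
        match:
          prefix: /            # match all paths
        action:
          weightedTargets:
            - virtualNodeRef:
                name: orders-v1
              weight: 90       # 90% of traffic stays on the current version
            - virtualNodeRef:
                name: orders-v2
              weight: 10       # 10% canaries to the new version
```

Adjusting the weights over successive applies is how path- and weight-based traffic shifting between service versions is typically operated.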
Customers can enable App Mesh in one of three ways: by adding the Envoy proxy image to the task definition (Amazon ECS and AWS Fargate); by using the open-source AWS App Mesh controller, which injects the proxy via a mutating webhook admission controller (EKS); or by running the Envoy proxy as a container or process on an EC2 instance and redirecting network traffic through the proxy. When each service starts, the proxy automatically connects with the control plane and is configured by App Mesh. Once configured, App Mesh manages the Envoy configuration to provide service mesh capabilities, automatically load balancing traffic from all clients in the mesh and exporting metrics, logs, and traces to the endpoints specified in the Envoy bootstrap configuration.
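To illustrate the weighted routing described above, a route spec of roughly the following shape can be supplied to App Mesh (for example, via `aws appmesh create-route`) to split traffic between two versions of a service; this is a simplified sketch, and the virtual node names are hypothetical:

```json
{
  "httpRoute": {
    "match": { "prefix": "/" },
    "action": {
      "weightedTargets": [
        { "virtualNode": "orders-v1", "weight": 90 },
        { "virtualNode": "orders-v2", "weight": 10 }
      ]
    }
  }
}
```

Adjusting the weights over successive updates enables a gradual canary rollout without touching application code.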
App Mesh uses mutual transport layer security (mTLS) for service-to-service transport layer authentication, allowing customers to extend the security perimeter by provisioning certificates from AWS Certificate Manager Private Certificate Authority or a customer-managed CA, enforcing automatic authentication for client applications connecting to services. In addition, the telemetry generated by AWS App Mesh—such as error rates and connections per second—can be exported to Amazon CloudWatch and AWS X-Ray, or streamed to third-party monitoring services, including Flagger, Grafana, Jaeger, Prometheus, and Splunk, as well as open-tracing solutions like LightStep and Zipkin.
There is no additional charge for using AWS App Mesh. Customers pay only for the AWS resources (EC2 instances or requested Fargate CPU and memory) consumed by the Envoy proxy deployed alongside their containers.
Strengths: AWS was the first major cloud provider to implement a native, K8s-pluggable service mesh. A highly available managed service, AWS App Mesh is fully integrated into the AWS landscape, making it easy for customers to monitor and manage communications for microservices without needing to install or manage additional application-level infrastructure. The extensive AWS ecosystem, installed base, and market position will continue to drive the adoption and development of AWS App Mesh.
Challenges: As a managed service, AWS App Mesh is limited to support for applications running on AWS and cannot be migrated to other environments. In addition, App Mesh is proprietary, uses a customized version of Envoy, does not support the Service Mesh Interface (SMI), and can be more complex to set up than other K8s-native service meshes.
Cilium Service Mesh: CNCF Project
Created by Isovalent and donated to the CNCF as an incubation project in October 2021, Cilium is an open-source plug-in providing networking, observability, and security for bare metal servers, K8s clusters and other container orchestration platforms, and VMs. As a container network interface (CNI), Cilium uses eBPF to dynamically insert powerful control logic into the Linux kernel, enabling Cilium security policies to be applied and updated without requiring any changes to the application code or container configuration. Launched in July 2022, Cilium Service Mesh extends Cilium’s capabilities to the application protocol level and is the first service mesh to offer the flexibility of running either a sidecar model leveraging the Istio control plane or a sidecar-less model with a choice of control planes.
Figure 6. Cilium Service Mesh at-a-Glance
While a typical proxy-based service mesh decouples numerous functions—including service discovery, transport layer security (TLS), retries, and load balancing—from the application code and puts them in a sidecar, Cilium Service Mesh takes decoupling one step further. Combining the Layer 7 policies, observability, and traffic management capabilities of the Envoy proxy with kernel-level eBPF capabilities for network traffic at Layer 4 and below, Cilium allows those same functions to be run per node rather than per pod.
Cilium Service Mesh allows enterprises to choose between an Envoy-based sidecar model and an Envoy plus eBPF-based sidecar-less model. The sidecar-less model supports both Envoy CRD and K8s Ingress as control plane options. The integrated Ingress controller, which leverages Envoy and eBPF, can be applied to traffic entering a K8s cluster and across clusters for rich Layer 7-aware load-balancing and traffic management, including path-based routing, TLS termination, and sharing a single load-balancer IP for multiple services. Future releases will support additional service mesh control planes, starting with SMI, Secure Production Identity Framework for Everyone (SPIFFE), and the K8s Gateway API and its GAMMA initiative for service mesh use cases.
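As a sketch of the integrated Ingress controller mentioned above, a standard Kubernetes Ingress can be handed to Cilium simply by setting `ingressClassName: cilium`; the service and path names below are hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress
spec:
  ingressClassName: cilium   # delegates this Ingress to Cilium's Envoy/eBPF data path
  rules:
    - http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-svc   # hypothetical backend service
                port:
                  number: 8080
```

Because this is plain Kubernetes Ingress, teams get Layer 7-aware routing without learning a mesh-specific API.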
The sidecar-less approach promises reduced complexity, lower latency, and more efficient resource consumption by eliminating sidecar start-up and shut-down overhead and contention. In addition, since many packets don’t need to be routed through the proxy to access Layer 7 information, performance is increased by passing them straight through eBPF to the network interface, reducing latency and accelerating pod start-up. Packets that do require Layer 7 termination can still be routed through the Envoy proxy whenever necessary.
A feature for power users, Cilium Service Mesh includes CiliumEnvoyConfig (CEC), a low-level abstraction for programming Envoy proxies directly with a new K8s custom resource definition (CRD) for advanced Layer 7 use cases to make the full Envoy feature set available to users. Simplifying the integration of additional service mesh control planes, CEC allows the Cilium Ingress Controller to specify Envoy listeners and other resources, making it possible to transparently redirect traffic destined to specific K8s services to these Envoy listeners.
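A CiliumEnvoyConfig resource follows the general shape below, embedding raw Envoy xDS resources inside a Kubernetes CRD. This is a simplified sketch only—the names are hypothetical, and the full Envoy listener definition is elided:

```yaml
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: example-l7-config
spec:
  services:                # traffic destined to these K8s services is
    - name: api-svc        # transparently redirected to the listener below
      namespace: default
  resources:
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: example-listener
      # ...full Envoy listener and filter-chain definition goes here...
```

The trade-off is explicit: CEC exposes the full Envoy feature set, but users take on responsibility for low-level Envoy configuration.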
In addition to supporting Jaeger, OpenTelemetry, and Prometheus, Cilium Service Mesh provides observability via Cilium Hubble and Cilium Tetragon, eBPF-based networking and security observability, and runtime enforcement platforms. Moreover, Cilium ClusterMesh allows services running across multiple clusters to be grouped as a single global service, providing the ability to see security events across multiple clusters. Finally, Isovalent—the original creator of the Cilium project—offers enterprise support and various enhancements, including advanced network observability and a highly-available DNS proxy.
Strengths: Cilium Service Mesh allows users to run a service mesh with or without sidecars based on availability, resource management, and security considerations. A choice of control planes offers a balance between simplicity (Kubernetes Ingress and Gateway API) and power (Envoy and Istio). Designed for platform teams responsible for deploying and managing distributed applications running in cloud deployments, Cilium Service Mesh reduces the need for solid networking skills by bringing increased control and visibility up the stack to the application layer.
Challenges: While deploying a service mesh and offloading work to eBPF when it makes sense is understandable, decoupling the proxy from the application in the sidecar-less model introduces an additional layer of operational and security complexity and unpredictability. In addition, the performance benefits of Cilium’s simpler, low latency, and efficient sidecar-less option may be offset by the default integration with complex, resource-intensive Istio for sidecar-based Layer 7 use cases. Because Cilium Service Mesh is the newest service mesh and the first to market with eBPF capabilities, interested parties would do well to explore the full implications of deploying per-host proxies and wait for alternative control plane options to become available before initiating a proof of concept.
Gloo Mesh: Solo.io
Launched in early 2019, Gloo Mesh is a modern K8s-native control plane enabling the configuration and operational management of multiple heterogeneous service meshes across multiple clusters via a unified API. The Gloo Mesh API integrates with leading service meshes and abstracts away differences between their disparate APIs, streamlining the configuration, operation, and lifecycle management of multicloud, multimesh, and multitenant environments. Gloo Mesh comes in two editions: an open-source version with limited features and the commercial, enterprise-ready Gloo Mesh Enterprise, sold as a standalone product.
Figure 7. Gloo Mesh at-a-Glance
Gloo Mesh can be run either in its own cluster or co-located with an existing mesh, enabling global traffic routing, load balancing, access control, and centralized observability of multicluster environments. It discovers meshes and workloads and establishes a federated identity, facilitating the configuration of different service meshes through a single API. In addition, Gloo Mesh supports multiplatform service meshes spanning clouds and zones (including AWS App Mesh, open-source Istio, and CNCF’s Open Service Mesh), locality-aware routing, and cross-cluster failover supporting zero-trust networks.
Built on an enhanced version of open-source Istio (as opposed to a fork), Gloo Mesh Enterprise also includes an extended version of the Envoy proxy. This capability enables the consistent configuration and orchestration of services across multiple VMs, clusters, clouds, and data centers from a single control point. Focusing on ease of use, Gloo Mesh Enterprise validates upstream Istio software and incorporates built-in best practices for extensibility and security, including role-based APIs.
In addition, Solo.io recently announced support for Istio Ambient Mesh, a new, layered Istio dataplane architecture offering simplified operations, broader application compatibility, and reduced costs. An alternative to Envoy sidecars, Istio Ambient Mesh splits Istio’s functionality into a secure overlay layer and a Layer 7 processing layer, each offering the relevant telemetry, traffic management, and zero-trust security capabilities. Fully interoperable with sidecar deployments, Ambient Mesh’s layered approach allows users to adopt Istio incrementally per namespace, transitioning first to a secure overlay before implementing full Layer 7 processing.
Gloo Mesh Enterprise includes a FIPS 140-2-ready Istio-based service mesh with automated service and API discovery enforcing zero-trust security with authentication, authorization, and encryption. The Gloo Mesh Gateway offers end-to-end encryption, security, and traffic control, incorporating traffic management into both east/west and north/south data transfer flows. In addition, Gloo Mesh extensions allow customers to extend and customize their API infrastructure with pre-built extensions and tooling for WebAssembly, plug-ins, and operators, extending custom Envoy proxy capabilities. A self-service portal enables developers to catalog, publish, and share APIs in a secure environment.
Solo.io also provides Gloo Edge, a decoupled control plane for the Envoy Proxy. It allows customers to iteratively add service-mesh capabilities to their cluster ingress without investing in a full-blown service mesh. Moreover, it integrates with Flagger (a delivery tool that automates the release process for Kubernetes workloads) for canary automation and natively with Consul, Istio, and Linkerd service-mesh implementations. Solo.io also offers Gloo GraphQL, the industry’s only implementation of the GraphQL engine embedded in Envoy.
Gloo Mesh is designed to simplify the operations and lifecycle management of multicloud, multimesh environments, providing both graphical and command-line UIs, multicluster observability, and debugging tools. Solo.io also created the WebAssembly Hub, a streamlined service for building, sharing, discovering, and deploying WASM extensions for managing traffic and delivering near-native performance of Envoy Proxy-based service meshes. Solo.io also offers a Cilium add-on module for Gloo Mesh, supporting eBPF-based sidecar acceleration and observability for a more cohesive, secure, and performant Layers 2 through 7 application networking architecture—with the option to adopt Cilium’s sidecar-less approach to service mesh in the future.
While the community supports the open-source version of Gloo Mesh, Gloo Mesh Enterprise provides production support and long-term support (LTS), with patches and hotfixes for the last four releases (N-4) of validated upstream Istio implementations, backed by dedicated SLAs. In addition to traditional support channels, Solo.io also provides a Slack-based support channel for customers.
Strengths: Gloo Mesh Enterprise is an Istio-based service mesh and management plane that simplifies and unifies the configuration, operation, and visibility of the service-to-service connectivity within distributed applications. Solo.io offers enhanced distributions of upstream open-source Istio (including FIPS, ARM, LTS) and Envoy Proxy, production support, and simplified, centralized Istio and Envoy lifecycle management for greenfield and brownfield environments. Solo.io is the first vendor to support Istio Ambient Mesh, offering a sidecar-less alternative for Gloo Mesh Enterprise.
Challenges: Despite being contributors to the Istio and Envoy projects and investing heavily in talent and innovation, Solo.io is still dependent on open-source Envoy and Istio for its core offerings. In addition, while Solo.io offers extended Istio support, forced periodic refreshes have the potential for disruption. However, with Istio’s move to the CNCF, Solo.io now has the opportunity to assert its authority and take the lead in influencing Istio’s direction. Gloo Mesh currently lacks a robust GUI for less experienced development and operations teams or customers less comfortable with APIs. Though it has the expertise and installed base, Solo.io does not offer service mesh as a service (SMaaS).
greymatter.io
Developed in-house from the ground up and released in February 2019 by greymatter.io (previously Decipher Technology Studios), the greymatter.io platform is an enterprise-proven application networking platform offering zero-trust security, exceptional Layer 3, 4, and 7 visibility, unmatched business intelligence, and automated performance optimization. Addressing many of the challenges introduced by a service-based architecture (SBA), the greymatter.io platform is built on cloud-native principles and open-source technologies, enabling granular service mesh-enabled observability, analytic heuristics and insights, and automation to optimize traffic throughput across on-premises, multicloud, or hybrid environments.
Figure 8. greymatter.io at-a-Glance
Bridging the gap between legacy and modern software applications, the platform comprises an internally developed control plane for SBAs and an Envoy-proxy sidecar data plane with extended filters for east/west internal traffic routing. An API gateway controls north/south traffic flows. The greymatter.io platform provides developer-friendly, template-driven declarative app network layer integration with CI/CD delivery pipelines spanning any on-premises and multicloud environments. In addition, the platform integrates with Open Policy Agent (OPA) for zero-trust, policy-based access control at every point on the mesh and is flexible and open enough to interoperate with other service meshes.
Designed to treat proxy-based service mesh telemetry as a source of business intelligence, greymatter.io leverages AI and ML to analyze data, including Layers 3, 4, and 7 network insights, for automated performance optimization and resource control. Powered by recurrent neural autoencoders, the platform’s anomaly-detection capabilities capture minute operational inconsistencies, predict potential issues, and alert users to inconsistencies via an intuitive contextual UI for remedial action.
The greymatter.io platform supports a variety of emerging use cases, including cybersecurity and data meshes. A cybersecurity mesh is a foundational layer enabling discreet security services to work together seamlessly, creating a dynamic security environment based on a zero-trust architecture. Enabling federated data ownership and distributed governance, a data mesh facilitates rapid zero-trust secure sharing of sensitive data objects and capabilities, including policy-based data provenance and lineage tracking. The platform works with third parties in both cases to enable intelligent network decision-making for enhanced cybersecurity and data protection.
Greymatter.io is designed to be platform agnostic and polyglot. The platform wraps existing IT investments in a ubiquitous Layers 3, 4, and 7 network, securely connecting existing operations and business support system (OSS/BSS) layers to cloud-native technologies. Capable of operating in any public, private, hybrid, multicloud, or container orchestration platform, the platform comes with built-in support for K8s, AWS EKS, AKS, OpenShift OCP, OKD, Konvoy, and bare metal. It is also container agnostic, supporting Docker, CoreOS, K8s, OpenShift, Rancher, and other containers—or no containers. In addition, the platform supports seamless integration with enterprise observability frameworks, including DataDog, Elasticsearch, Grafana, Jaeger, LightStep, Splunk, and Zipkin.
Delivering a comprehensive audit-compliance engine and SPIFFE/SPIRE identity authorization out of the box, greymatter.io provides service audit compliance reporting without special instrumentation. Real-time audit taps at Layers 3, 4, and 7 provide a single source of truth for every user and action on the mesh throughout the lifespan of each object.
Supporting qualified customers in both private hosted and public clouds, greymatter.io offers both SMaaS and fully managed services.
Strengths: In addition to providing a robust, enterprise-ready, container-agnostic, multi-environment platform, greymatter.io’s heuristics-based AI health sub-system offers insights into the overall health of the network with the ability to conduct root-cause analysis and discover new operational knowledge about how the network is being utilized. In addition, out-of-the-box GitOps infrastructure as code (IaC) capabilities enable the seamless and consistent application of service fixes and release upgrades while reducing operational risks such as workload configuration drift. Greymatter.io is experiencing rapid adoption across multiple market sectors due to its ability to support global multicloud, hybrid, and edge zero-trust cyber, data, and service mesh operations.
Challenges: Despite using the Consul and Envoy proxies, the greymatter.io platform does not yet provide an open-source solution, which limits widespread adoption and third-party innovation. In addition, greymatter.io is in a growth phase as it shifts from a small, bootstrapped company predominantly focused on US government and Department of Defense clients to a venture-backed company supporting global customers spanning a variety of industry sectors. Finally, the company has embarked on a talent acquisition spree for marketing, R&D, and sales resources, but we expect the results of rapid onboarding to materialize only over the next 12 to 18 months.
HashiCorp Consul: HashiCorp
Developed internally from the ground up and released as a service mesh in October 2018, HashiCorp Consul provides consistent discovery capabilities and secure service-to-service communication across any environment. Initially designed as a simple service discovery and key/value store before containers became mainstream, Consul has evolved to become a full-featured service mesh for both containerized and non-containerized applications, allowing users to control north/south and east/west traffic patterns. As the primary maintainer, HashiCorp offers an open-source version of Consul and an enterprise version with additional functionality and support. In addition, HCP Consul is a fully managed service mesh as a service running on the HashiCorp Cloud Platform (HCP), offering push-button and self-service deployments.
Figure 9. HashiCorp Consul at-a-Glance
HashiCorp Consul provides a full-featured control plane with service discovery, configuration, dynamic load balancing, and segmentation functionality, allowing each feature to be used independently as needed. Closing the gap between applications and networking, Consul provides a step-by-step approach, allowing organizations to deploy service discovery and service registry before building out the service mesh implementation. It also offers networking infrastructure automation for dynamic IP environments. The platform works out of the box with a simple built-in Layer 4 proxy and supports third-party proxy integrations, including Envoy. Built on the Kubernetes Gateway API, the Consul API Gateway determines how clients interact with Consul service mesh applications. Moreover, unlike many other service meshes, Consul can run in a VM-only environment without requiring K8s.
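Consul’s step-by-step approach can be illustrated with a minimal service registration: a service is first registered for discovery, then opted into the mesh by adding a Connect sidecar stanza. The sketch below uses hypothetical service names and ports:

```hcl
# Registers "web" with Consul for discovery and asks Consul to manage
# a sidecar proxy for it. The proxy exposes the "billing" upstream
# locally on port 9191, so the app talks to localhost instead of
# discovering billing itself.
service {
  name = "web"
  port = 8080

  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "billing"
            local_bind_port  = 9191
          }
        ]
      }
    }
  }
}
```

Removing the `connect` block leaves plain service discovery, which is how teams can adopt the registry first and the mesh later.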
Offered as either a self-hosted or managed solution—providing flexibility for enterprises of all sizes—HashiCorp Consul provides discovery and secure connectivity for any application running on any infrastructure or runtime. Consul enforces mutual authentication between services using ACLs, mTLS, and CA distribution, provides multitenancy capabilities, and supports granular traffic management rules based on service identity and request attributes. Additionally, Consul integrates with HashiCorp Vault, which includes using Vault’s CA to generate, store, and auto-rotate TLS certificates for both the HashiCorp Consul control and the data plane.
HashiCorp Consul also offers progressive delivery capabilities—supporting canary deployments, Layers 4 and 7 traffic management, and advanced observability—for containers, VMs, and bare-metal environments. While not a typical service mesh feature, Consul can also automate Layer 3 networking tasks, including dynamic firewalling, automated load balancing, and endpoint visibility. HashiCorp Consul integrates with Terraform to automate networking tasks via Consul-Terraform-Sync (CTS). As services scale or new services become available on the network, CTS will automatically update network load balancers and firewalls, enabling new services to be seamlessly discovered and consumed.
HashiCorp Consul provides a consistent view of all services on the network, irrespective of different programming languages and frameworks, enabling real-time service health and location monitoring. Consul captures service-level data and presents it to users via a built-in UI or through integrations with third-party application tracing solutions, including Jaeger, OpenTelemetry, and Zipkin.
An extensible, multiplatform solution with flexible procurement options, HashiCorp Consul supports both on-premises (virtualized and bare metal) and cloud deployments, as well as multiple runtimes, including Amazon ECS, AWS Lambda, HashiCorp Nomad, K8s distributions, and VMs. It also offers native capabilities and integrations for proxies (including Envoy, HAProxy, and NGINX), ingress solutions (including Ambassador and Nginx), and application performance monitoring (APM) solutions such as AppDynamics, Datadog, Dynatrace, Grafana, Prometheus, and Splunk.
Strengths: HashiCorp Consul is a simple, flexible service mesh offering multicluster support and integrations with external non-service-mesh workloads. While the free, open-source version depends on community support, Consul’s enterprise version provides additional functionality supporting the core workflows of service-based networking. Unlike many other service meshes, Consul can run in a VM-only environment without requiring K8s. In addition, HashiCorp Consul is tightly integrated with HashiCorp’s portfolio, and the SMaaS offering, HCP Consul, is an attractive option for HashiCorp customers looking for push-button and self-service deployments.
Challenges: With only a small open-source community providing support for non-HashiCorp users, HashiCorp Consul’s primary value is for existing HashiCorp users wishing to incorporate K8s into their HashiCorp stack. Consul’s ecosystem is limited compared to its competitors, lacking support for K8s integrations such as Flagger, Prometheus service monitor, and OPA for authorization policies. In addition, HashiCorp Consul currently lacks AIOps automation, fault injection capabilities, and support for WebAssembly (WASM), eBPF, and 5G and edge use cases. HashiCorp is currently developing out-of-the-box observability capabilities and simplifying its current model for federating HashiCorp Consul data centers to eliminate customer complexity.
Istio
Released in May 2017, Istio began as an ongoing collaboration between Google, IBM, Lyft, Red Hat, and other key contributors. However, following concerns expressed by IBM, Oracle, and the open-source and cloud-native communities over the project’s governance and Google’s decision to donate the trademark to the Open Usage Commons (OUC), the Istio Steering Committee announced its intention to join the CNCF as an incubating project in April 2022. The move replaces Google’s control over trademark and licensing with a neutral entity and the potential for broader adoption. In addition, the transition unites Istio with Envoy and K8s under a single umbrella and common governance.
Figure 10. Istio at-a-Glance
One of the more mature—and complex—service meshes available, Istio offers a rich feature set based on the Envoy Proxy, including dynamic service discovery, service-to-service authentication, load balancing, monitoring, policy creation, and traffic routing. In addition, the Istio project recently announced Istio Ambient Mesh, a layered, sidecar-less architecture offering seamless interoperability with the Istio sidecar-centric data plane. Istio Ambient Mesh allows users to mix and match sidecar and sidecar-less capabilities based on the specific needs of each application.
Designed for extensibility, Istio offers a robust, unified K8s-based control plane for managing both K8s and VM data planes, supporting a diverse range of deployment needs. It should be noted, however, that Istio does not have a built-in dashboard. Instead, a third-party solution, Kiali, has been designed as an add-on for managing, visualizing, validating, and troubleshooting Istio.
Istio has strong out-of-the-box identity-based authentication, authorization, and encryption capabilities, with service communications secured by default for consistent policy enforcement. Istio also offers fine-grained control of traffic behavior supporting A/B testing, canary rollouts, and staged rollouts with percentage-based traffic splits. It also provides out-of-the-box failure recovery features with advanced routing policies and management, including circuit breakers, failovers, fault injection, health checks, retries, and staged rollouts. Moreover, its configuration API and policy layer support access, quota, and rate controls, while detailed logs, metrics, and traces provide in-depth observability throughout the cluster with integrated, preconfigured Grafana and Prometheus dashboards for observability.
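The percentage-based traffic splits described above are typically expressed as a VirtualService. The following sketch—host and subset names are illustrative, and a matching DestinationRule is assumed to define the subsets—sends 90% of traffic to v1 and 10% to v2:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews          # in-mesh service this rule applies to
  http:
    - route:
        - destination:
            host: reviews
            subset: v1  # subsets are defined in a DestinationRule
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
```

Shifting the weights incrementally (90/10, then 50/50, then 0/100) is the standard pattern for canary and staged rollouts.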
However, as new features and functions are added, Istio has become notoriously tricky to install, configure, and manage. The original idea of separating components based on operations and maintenance roles increased complexity and costs, especially for companies at which one person or team was responsible for the entire service mesh. Istio is addressing this complexity by abandoning its microservices architecture in favor of a monolithic approach, merging multiple, previously separate functions to simplify the service mesh and minimize the tradeoffs. While the codebase retains strict boundaries between what were formerly independent services, Istio’s control plane functions are now presented to the cluster administrator as a single process (istiod).
Re-architecting Istio is ongoing, with a centralized, multicluster controller, additional enhancements for supporting VMs, and security and stability improvements included in recent releases. While this approach may be good from an engineering perspective, Istio’s quarterly release cycle may impact operational stability. Moreover, Istio’s complexity has resulted in a growing ecosystem with several vendors—including F5 (Aspen Mesh), Google (Anthos Service Mesh), Red Hat (OpenShift Service Mesh), Solo (Gloo Mesh), Tetrate (Tetrate Service Bridge), and VMware (Tanzu Service Mesh)—emerging to provide Istio-based service meshes underpinned by enterprise-grade services and support. Istio is also offered as a managed add-on for IBM Cloud.
Strengths: Due to the marketing efforts of Google and IBM, “Istio” is often used interchangeably with “service mesh,” positioning it as the go-to solution for adding observability, security, and traffic management to the cloud-native stack. Compared to other service meshes, Istio’s maturity, out-of-the-box features, adoption by major industry players, and incubation as a CNCF project will ensure its inclusion in any service mesh shortlist. In addition, Istio is offered as a managed service by F5 Networks, Google, IBM, and Red Hat for various environments. Tetrate provides a complete portfolio of design, deployment, and management services.
Challenges: Due to its advanced features and complex configuration requirements, Istio is not as user- or developer-friendly as other service meshes. It often requires either a dedicated team or third-party professional services to assist in costly, resource-intensive, and time-consuming implementations. Moreover, with the Istio project supporting only the three latest releases (N-2), a quarterly release cycle can be overwhelming for teams with limited capacity and skills. Istio can significantly impact development and operational budgets with extended deployment times and significant resource overhead.
Kong Mesh: Kong
Released for general availability in August 2020, Kong Mesh is a modern, enterprise-ready control plane for service mesh and microservices built on top of Envoy and Kuma, the open-source project authored by Kong and donated to the CNCF. Kong Mesh extends Kuma’s existing advanced feature set by including critical functionality for running enterprise workloads. Kong Mesh also provides additional service mesh features and integrations for the Kong Konnect platform, a full-stack connectivity platform delivered as-a-service for multicloud environments.
Figure 11. Kong Mesh at-a-Glance
Deployed as a turnkey service mesh via a single command, Kong Mesh allows multiple service meshes to be managed as tenants of a single control plane, increasing scalability and reducing operational costs. Once installed, Kong Mesh improves service connectivity via policies that can be added to each mesh, service, or attribute that qualifies a traffic path, improving developer efficiency while supporting cost reduction, General Data Protection Regulation (GDPR) compliance, and zero-trust security.
The latest release includes integrations with OPA, the open-source, policy-as-code tool for Layer 7 policy support, automatic configuration of Envoy for FIPS 140-2 compliance, and authentication between global and isolated control planes. Furthermore, Kong Mesh automates the distribution of those policies throughout multicluster and multi-region deployments, eliminating the need for manual configuration. It also extends the service mesh and OPA to include legacy infrastructure such as VMs.
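Policies of the kind described above are declarative resources applied to a mesh. For example, in Kuma’s universal mode (which Kong Mesh builds on), a traffic permission policy takes roughly the following shape; the service tags here are hypothetical:

```yaml
# Allows the "web" service to call the "backend" service in the
# default mesh; all other traffic to "backend" stays denied when
# mTLS and traffic permissions are enforced.
type: TrafficPermission
mesh: default
name: allow-web-to-backend
sources:
  - match:
      kuma.io/service: web
destinations:
  - match:
      kuma.io/service: backend
```

On Kubernetes, the same policy is expressed as a `kuma.io/v1alpha1` custom resource, which is what enables the automated, cross-zone policy distribution described above.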
Focused on ease of use, Kong Mesh leverages Kuma to deliver a supported, multimesh product that can scale across teams and lines of business while simultaneously providing cross-cluster and cross-cloud connectivity for modern architectures. Accelerating configuration and deployment, Kong Mesh abstracts away the complexity of setting up a service mesh by encapsulating Envoy within its own processes. In addition, a native GUI provides quick visual feedback on what is happening in the system.
Supporting both K8s and VM workloads, Kong’s “run anywhere” philosophy allows Kong Mesh to be deployed across any environment, including multicluster, multicloud, and multiplatform. Organizations can either use Kong Mesh’s CRDs to natively manage service meshes in K8s or start with a service mesh in VM environments and migrate to K8s at their own pace. In a multizone deployment, Kong Mesh supports multiple environments without increasing complexity.
Automating distributed service mesh policy propagation, Kong Mesh’s universal mode provides advanced, multizone support with out-of-the-box discovery and connectivity of clouds, platforms, hybrid containers, and VMs, along with automatic policy reconciliation across multiple zones. Kong Mesh also supports zones on non-K8s containerized environments, including AWS ECS and AWS Fargate.
Strengths: Kong Mesh’s ease of use and built-in automation capabilities offer an alternative to complex open-source solutions that are difficult to deploy and manage. Security-conscious enterprises will be attracted by Kong Mesh’s FIPS 140-2 compliance and consistent application of security policies across all environments. Kong’s customer reliability engineering (CRE) team offers 24x7x365 support using an industry-standard, follow-the-sun model for all Kong products. In addition, Kong claims to be the fastest-growing second-wave service mesh based on GitHub stars.
Challenges: Kong Mesh is a relatively new entrant dependent on a sandbox CNCF project, and only time will tell whether Kong’s highly portable, run-anywhere, full-stack platform can counter the trend of cloud vendors releasing their own deeply embedded service meshes. Kong currently lacks a true SaaS connectivity platform allowing customers to centrally manage, deploy, and secure connectivity across their entire environment. In addition, while the company offers a “predictable and linear, pay-as-you-go” pricing model, calculating the cost based on the number of data plane proxies connected to the control plane is challenging. Kong needs to simplify its pricing model and make it easier for customers to estimate the potential total cost of ownership (TCO).
Kuma: CNCF Project
Created by Kong and donated to the CNCF as a sandbox project in June 2020, Kuma is an open-source service mesh using Envoy as the data plane proxy and a control plane developed by Kong. Built to support both greenfield and legacy enterprise applications, Kuma offers scalable, multizone connectivity across multiple clusters and clouds using bare metal, K8s, or VMs with one-click transparent proxying. In addition, Kuma automatically keeps an inventory of all data plane proxy sidecars running across every zone, allowing the service mesh to scale to any number of zones and sidecars.
Figure 12. Kuma at-a-Glance
Unlike other service mesh solutions, Kuma provides native support for both K8s and VMs on both control and data planes, with multimesh support spanning boundaries, including K8s namespaces. Designed for the enterprise architect, Kuma ships with both standalone and advanced multizone and multimesh support, enabling cross-zone communication across different clusters and clouds with its global control plane separation. In addition, flexible traffic routing can be applied to entire zones, individual services, or custom traffic paths using source and destination selectors.
Kuma’s architecture includes control plane separation, with each zone allocated its own horizontally scalable control plane to minimize the possibility of one zone affecting other zones if it goes down. The global control plane also automatically propagates service mesh policies across every zone, including automated handling of failures and reconciliations. While all zones are centrally managed through the unified, global control plane, each zone has its own control plane—that can also be scaled horizontally—so that policies can be rapidly applied to the zone’s data plane proxies. Kuma scales linearly and horizontally by adding more control planes, scaling to over 100,000 data planes spanning ten or more zones.
A single pane of glass for the entire enterprise, the global control plane can be integrated with existing CI/CD workflows via CRDs, an HTTP API, or Kuma’s command-line interface (CLI). With an out-of-the-box Layer 4 and Layer 7 policy architecture enabling discovery, decentralized load balancing, automated self-healing, observability, routing, traffic reliability, and zero-trust security, Kuma abstracts everyday use cases and automatically propagates service mesh policies across the infrastructure to support a multimesh, multitenant environment on the same control plane. In addition, out-of-the-box multicloud, multicluster, and multizone support with attribute-based policies provide automatic policy synchronization and connectivity to support custom workload attributes for GDPR and Payment Card Industry (PCI) compliance.
Kuma provides foundational authentication, authorization, encryption, and policy controls spanning environments, containers, and virtual machines. In addition, Kuma integrates natively with API gateways to support other authN/authZ schemes when exposing services to other applications, teams, or external parties at the edge.
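The encryption and identity controls described above are typically enabled declaratively on the Mesh resource itself. A minimal sketch, assuming Kuma's builtin certificate authority backend, might look like:

```yaml
# Sketch: enabling mTLS for a Kuma mesh using the builtin CA backend.
# Once enabled, all traffic between data plane proxies is encrypted
# and each workload receives a verifiable identity.
apiVersion: kuma.io/v1alpha1
kind: Mesh
metadata:
  name: default
spec:
  mtls:
    enabledBackend: ca-1
    backends:
      - name: ca-1
        type: builtin
```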
Offering native service discovery, Kuma supports a wide range of containers, operating systems, and cloud infrastructures, each running either its own mesh or a hybrid service mesh running on bare metal, K8s, and VMs, with simplified migration between environments. Easy to use with no Envoy expertise required, Kuma packages Envoy with every installation, automatically injecting the sidecar proxy into workloads for global and remote deployment modes and native integration with API management solutions.
Strengths: While most service meshes prioritize K8s/container-driven applications, Kuma also supports any existing applications running on bare metal, K8s, or VMs. Kuma supports single and multiple clusters through its standalone and multizone deployment options without increasing deployment or management complexity. In addition to being deployed in Fortune 500 companies, Kong estimates Kuma to be the fastest-growing of the second wave of service meshes based on GitHub public stars.
Challenges: While claiming to address the limitations of first-generation service mesh technologies by enabling seamless management of any service on the network, Kuma is a relatively new entrant compared to other cloud-native service meshes such as Consul, Istio, and Linkerd. Kuma’s success will depend mainly on its adoption by the open-source community and its promotion by Kong as the underlying technology of Kong Mesh. In addition, as a CNCF sandbox project, Kuma does not provide enterprise support.
Linkerd: CNCF Project
The original “service mesh” released in 2016, Linkerd is an open-source, CNCF-hosted security-first service mesh providing observability, reliability, and security for K8s applications running on bare metal or in the cloud without adding complexity. As the only CNCF-graduated service mesh, Linkerd offers an ultralight, ultrafast, and operationally simple approach to deploying a service mesh on any existing platform. Targeting every K8s adopter irrespective of the organization’s size, Linkerd installs in minutes, requires zero configuration, and can be added incrementally to an application without disruption. In addition, Linkerd comes with preconfigured, out-of-the-box Grafana and Prometheus dashboards and support for OpenTelemetry.
Figure 13. Linkerd at-a-Glance
Adopting a problem-centric approach, Linkerd’s strategy is to solve immediate, concrete problems—in as general a way as possible—without attempting to build the ultimate platform addressing all use cases. While other service meshes trend toward adding features supporting multiple use cases but requiring extensive configuration and tuning, Linkerd concentrates on limited use cases to reduce its footprint, automate as much as possible, and minimize the operational burden.
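Linkerd's incremental, low-configuration adoption is typically achieved by marking a namespace for automatic proxy injection; a minimal sketch (the namespace name is hypothetical) might be:

```yaml
# Sketch: opting a namespace into Linkerd's automatic sidecar injection.
# Pods subsequently created in this namespace receive the Linkerd2-proxy
# sidecar without any changes to application manifests or code.
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  annotations:
    linkerd.io/inject: enabled
```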
Much of Linkerd’s simplicity can be attributed to its data plane implementation using the internally developed Linkerd2-proxy—a lean, modern, scalable, and high-performance Rust-based network “micro-proxy”—rather than the commonly used Envoy Proxy. Since a fully deployed service mesh can run thousands—or tens of thousands—of micro-proxies, the impact on resource consumption and latency compounds quickly. Utilizing the Linkerd2-proxy allows Linkerd to maximize the speed and security of the data plane while optimizing resource consumption. Benchmarks conducted by Kinvolk GmbH (an open-source engineering and technology company recently acquired by Microsoft) found Linkerd significantly faster than open-source Istio while consuming an order of magnitude less data plane memory and CPU.
Leveraging K8s’ security primitives rather than inventing new ones, Linkerd’s security-first approach is designed to improve the overall security of the environment. Zero-trust ready, Linkerd uses mTLS to provide workload identity authentication, confidentiality, and integrity for all communication between meshed pods. Eliminating the security vulnerabilities common to C and C++ projects such as Envoy, Linkerd uses Rust as the data plane programming language, protecting sensitive customer data within a minimalist runtime footprint while retaining native code performance. In addition, the simplicity of Linkerd minimizes the risk of misconfiguration or avoidance of security features due to the high cost of adoption.
As the original creator of Linkerd, Buoyant recently launched Buoyant Cloud, an automated and unified service mesh dashboard built to monitor, assess, and validate the health of Linkerd clusters. Tracking data and control plane metrics, Buoyant Cloud identifies data plane inconsistencies, manages mesh lifecycles and versions, and proactively issues alerts. In addition, enterprise support for Linkerd is available from Buoyant and other third-party companies.
Strengths: Designed from the ground up as a lightweight, security-first service mesh supporting mission-critical features for cloud-native applications using K8s, Linkerd is the only service mesh committed to operational simplicity and low resource consumption. Linkerd is deployed long-term in tens of thousands of K8s clusters worldwide, with CNCF predicting faster adoption than other service meshes. Linkerd has an aggressive roadmap, including recently released cross-cluster failover capabilities and the Kubernetes Gateway API for standardizing pod-to-pod communications.
Challenges: Linkerd’s focus on limited use cases may restrict its applicability for particular enterprises and organizations. Moreover, Linkerd’s data plane proxy currently supports only K8s workloads running on bare metal or in the cloud; data plane support for VMs and hybrid environments, extending beyond K8s clusters, is expected later in 2022. Support for Linkerd is primarily provided by the open-source community. However, Linkerd’s creator, Buoyant, and other third-party companies offer paid support for enterprise clients.
Network Service Mesh: CNCF Project
Donated to the CNCF in April 2019, Network Service Mesh (NSM) is a community-driven sandbox project rapidly gaining momentum because of its ability to simplify connectivity among workloads—regardless of where they are running. As a hybrid, multicloud IP service mesh, NSM extends IP reachability to workloads running on-premises, in legacy environments, across multiple clusters, and in public clouds, communicating using existing protocols. Furthermore, since individual workloads only need connectivity to a limited selection of other workloads, NSM provides hybrid, multicloud IP connectivity for applications and application service meshes without requiring any changes.
Figure 14. Network Service Mesh at-a-Glance
Built from the ground up, NSM shifts IP networking from infrastructure to a selection of network services. By connecting an individual workload—or K8s pod—to a network service via a simple set of APIs, NSM enables the infrastructure to remain immutable while meeting a wide variety of requirements. NSM also allows individual workloads to connect to a network service via a WireGuard vWire injected into the pod as a secondary, non-conflicting interface. Finally, by matching the selection of network services to the granularity of the workload—rather than the cluster—NSM allows different workloads to consume different, potentially conflicting network services.
As an additional infrastructure layer running on top of out-of-the-box K8s, NSM maps the concept of a service mesh from Layer 7 workloads to Layer 2 and Layer 3 workloads, providing additional connectivity, observability, and security at the network layers. Complementing higher-level application service meshes by treating them as part of a network service, a Consul, Istio, Linkerd, or other service mesh can run as a single instance on top of NSM’s virtual Layer 3 spanning multiple clusters, clouds, or organizations.
NSM loosely couples workloads to relevant network services independently of the underlying environment, enabling individual workloads to join multiple network services simultaneously, with each network service having its own control plane segmented along the logical lines of the service. As a result, the service mesh delivers the operational simplicity of a single cluster solution while allowing workloads running in multiple clusters across multiple clouds to connect via a shared network service, irrespective of location.
When installed on a K8s cluster, NSM simplifies sophisticated network connectivity for the developer. Designed to operate at internet scale, network service endpoints running anywhere can advertise network services in a network service registry domain. In turn, NSM allows any authorized workload—located anywhere—to request a published network service from one or more service registries. No changes are made to either K8s or to the CNI plug-in being used.
In addition to running on bare metal, NSM has been tested with Amazon EKS, GKE, Microsoft Azure Kubernetes Service (AKS), and across public clusters. NSM is managed via a CLI and well-defined gRPC APIs for registering network services and network service endpoints with its registry server. In addition, NSM includes auto-healing capabilities, uses OPA to enforce admission policies based on SPIFFE and SPIRE identities, and integrates with Prometheus and OpenTelemetry for observability. (Note: SPIRE is a production-ready implementation of SPIFFE.)
Strengths: Network Service Mesh is the only service mesh operating on Layer 2 and 3 workloads. Adopted by Cisco, Ericsson, and Intel for next-generation architectures, NSM complements Layer 7 service meshes by providing additional connectivity, observability, and security. In addition, Ericsson is actively contributing to NSM to enable 5G-specific use cases for cloud-native network functions.
Challenges: While NSM offers tangible benefits and has attracted significant interest from leading industry players, it lacks widespread adoption. However, with several NSM-based solutions targeted for live deployment by the end of 2022, we expect adoption to increase.
NGINX Service Mesh: F5 Networks
Released in May 2021, NGINX Service Mesh (NSM) is a developer-friendly, fully integrated, lightweight service mesh that leverages a data plane powered by NGINX Plus (a cloud-native, easy-to-use reverse proxy, load balancer, and API gateway) to manage container traffic in K8s environments. NSM implements SMI, which defines a standard interface for service meshes on K8s and provides SMI extensions to update apps incrementally with minimal effort and interruption to production traffic. Potential customers should be aware that the product portfolio, which includes NGINX Service Mesh, is currently under review and that the functionality and positioning contained in this report may change significantly.
Figure 15. NGINX Service Mesh at-a-Glance
Integrating natively with NGINX Ingress Controller, NGINX Service Mesh creates a unified data plane to centralize and streamline the configuration of ingress and egress (north-south) traffic management at the edge with service-to-service (east-west) reverse proxy sidecar traffic management. NSM offers a robust set of traffic distribution models, including rate shaping, quality of service (QoS), service throttling, blue-green deployments, canary releases, circuit breaker pattern, A/B testing, and API gateway features. Easy to use and infrastructure agnostic, the lightweight control plane manages NGINX Plus reverse proxy sidecars and data plane. In NGINX Service Mesh, eBPF is used to redirect UDP traffic to the sidecar proxy.
Unlike other service meshes, NSM does not automatically inject a sidecar into each workload, including NGINX Ingress Controller. Policies for manual or auto-injection depend on the deployment options chosen, allowing users to minimize latency and reduce complexity within K8s environments if desired.
Extending mTLS encryption and Layer 7 protection down to the individual microservice, NGINX Service Mesh enables advanced security features, including configuration gating and governance, and zero-trust end-to-end encryption and service authorization. In addition, the NGINX Plus-based version of NGINX Ingress Controller provides default blocking of north/south traffic to internal services and edge firewalling with NGINX App Protect.
NGINX Service Mesh is instrumented for metrics collection and analysis with the option to install an observability stack using OpenTelemetry and Prometheus. NGINX Service Mesh sidecars use the OpenTelemetry NGINX module to export tracing data to an OpenTelemetry Collector, which can be configured to export tracing data to upstream collectors like DataDog, Jaeger, and LightStep. In addition, the built-in Grafana dashboard can be used to visualize granular metrics, day-over-day overlays, and traffic spikes.
NSM is supported on several K8s platforms, including Amazon EKS, Microsoft AKS, GKE, VMware vSphere, and stand-alone bare-metal clusters. It also integrates with several additional open-source solutions, including NATS, K8s Ingress controllers, and SPIRE.
Strengths: NGINX is an established open-source leader in application delivery, with over 400 million websites worldwide relying on NGINX Open Source and NGINX Plus to deliver content quickly, reliably, and securely. For enterprises that want to avoid the complexity that comes with K8s or Istio-based service mesh deployments, NGINX Service Mesh offers a simple multicloud solution supporting scalability, security, reliability, and enterprise readiness.
Challenges: NGINX Service Mesh relies heavily on core NGINX components, generally limiting its application to customers committed to NGINX—and now F5—infrastructure. F5 addressed this limitation by acquiring and incubating Istio- and Envoy-based Aspen Mesh. While NGINX Service Mesh can be downloaded for free on F5’s website, support is available only through the open-source community unless other F5 or NGINX products are purchased. Potential customers should be aware that the product portfolio, which includes NGINX Service Mesh, is currently under review and may change significantly.
Open Service Mesh: CNCF Project
Donated by Microsoft to CNCF as a sandbox project in August 2020, Open Service Mesh (OSM) is a lightweight, extensible cloud-native service mesh enabling users to consistently deploy, manage, and secure highly dynamic microservices environments with out-of-the-box observability. An open-source implementation of SMI—a standard interface with a set of portable APIs for deploying a service mesh on K8s—OSM is a production-ready service mesh based on the Envoy proxy.
Figure 16. Open Service Mesh at-a-Glance
Comprising a full-featured control plane and Envoy-based data plane, OSM is designed to be intuitive, scalable, and easy to troubleshoot. Configured using SMI APIs, OSM is relatively easy to install, maintain, and operate compared to some other service meshes. A key benefit of configuring a service mesh via the SMI specification is that users don’t need to know which service mesh implementation is running in the cluster; they can simply rely on the SMI specification to reference the services participating in the service mesh.
Alternatively, OSM can use a permissive traffic policy mode that—while injecting new pods with Envoy—does not require SMI APIs to route traffic within the mesh, allowing traffic to flow through the proxy without being blocked by access control policies. In addition, for brownfield deployments in which it may take time to create SMI policies, permissive traffic policy mode allows users to design and configure SMI policies with existing services continuing to operate as they did before OSM was installed.
Once the SMI policies are correctly configured, services can communicate with each other using either permissive traffic policy mode or SMI traffic policy mode. In permissive traffic policy mode, traffic between application services is automatically configured and access control policies defined by SMI traffic targets are not enforced. In SMI policy mode, all traffic is denied by default unless explicitly allowed using a combination of SMI access and routing policies. In addition, all traffic is encrypted to ensure secure service-to-service communication via mTLS, whether using access control policies or permissive traffic policy mode.
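In SMI traffic policy mode, the explicit allow rules described above take the form of SMI TrafficTarget resources. A minimal sketch (service account and route group names are illustrative) might look like:

```yaml
# Sketch: SMI access policy allowing "frontend" to call "backend".
# In SMI policy mode, traffic not matched by a TrafficTarget is denied.
apiVersion: access.smi-spec.io/v1alpha3
kind: TrafficTarget
metadata:
  name: frontend-to-backend
  namespace: bookstore
spec:
  destination:
    kind: ServiceAccount
    name: backend
    namespace: bookstore
  sources:
    - kind: ServiceAccount
      name: frontend
      namespace: bookstore
  rules:
    - kind: HTTPRouteGroup
      name: backend-routes
      matches:
        - all-routes
```

The rules section references a separately defined HTTPRouteGroup, which scopes the permission to specific HTTP routes and methods.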
Injecting a reverse proxy as a sidecar container for each microservice within the application, the Envoy proxy contains and executes rules around access control policies, implements routing configuration, and captures metrics. In addition, the OSM control plane ensures proxies are healthy and monitors them to ensure policies and routing rules are up to date.
Despite being relatively lightweight, OSM provides traffic splitting for load balancing among multiple K8s services, integration with external certificate management services via a pluggable interface, and fine-grained access control policies for services. In addition, observability and insights into application metrics for debugging and monitoring services are provided with Grafana, Prometheus, or Zipkin.
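The traffic-splitting capability mentioned above is expressed through the SMI TrafficSplit API; a sketch of a 90/10 canary split (service names and weights are illustrative) might be:

```yaml
# Sketch: weighted SMI traffic split between two service versions.
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: bookstore-split
spec:
  service: bookstore        # root service that clients address
  backends:
    - service: bookstore-v1 # majority of traffic stays on v1
      weight: 90
    - service: bookstore-v2 # small share routed to the canary
      weight: 10
```

Adjusting the weights over successive releases allows traffic to be shifted gradually without clients changing the service they call.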
OSM is implemented and managed using the OSM repository on GitHub. To simplify management, OSM comes with a limited CLI for installing or uninstalling OSM in the context of a K8s cluster, onboarding and removing K8s namespaces, and accessing Grafana dashboards and Prometheus metrics.
Strengths: Supported by Microsoft and implementing the SMI specification, OSM builds on work done in other service mesh projects. However, while other service meshes may provide more features, OSM is simpler to deploy and manage. In addition, using Envoy builds on existing skill sets, reducing the learning curve while allowing users to step beyond the limited set of SMI functions to more complex Envoy features when necessary.
Challenges: OSM has a limited number of features. Microsoft was adamant from the beginning that the initial releases of OSM should be focused purely on implementing the SMI specification. As a result, features like service discovery, fault injection, and distributed tracing are not available. In addition, while OSM uses Envoy as the proxy between pods, it does not use it for ingress. The choice of ingress controllers is limited to Azure Application Gateway Ingress Controller, Contour, Solo.io’s Gloo API Gateway, and the NGINX Ingress Controller. If you need to use a different ingress controller, it may work but is not officially supported.
OpenShift Service Mesh: Red Hat
Announced in August 2019, Red Hat OpenShift Service Mesh (OSSM) provides a uniform way to connect, manage, and observe microservices-based applications running within the OpenShift Container Platform, a private PaaS developed by Red Hat for enterprises running OpenShift on-premises or on public cloud infrastructure. Based on the open-source Istio project, OSSM provides behavioral insight and operational control of Maistra Service Mesh, an opinionated distribution of Istio designed to work with OpenShift. OpenShift Service Mesh bundles Maistra Service Mesh—incorporating specific Istio features using the Envoy proxy—with Grafana, Jaeger, and Kiali into a platform providing discovery, service-to-service authentication, load balancing, failure recovery, metrics, and monitoring.
Figure 17. OpenShift Service Mesh at-a-Glance
Engineered to be production-ready, OSSM increases developer productivity and accelerates application time to value by integrating policy-based service-to-service communications without modifying application code or integrating language-specific libraries. Tested with other Red Hat products, OSSM installs easily on Red Hat OpenShift and comes with enterprise-grade support, simplifying and streamlining management for operations personnel.
In addition, OSSM uses Grafana, Jaeger, Kiali, and out-of-the-box security to trace, observe, and secure intra-service communications. An open, composable, and interactive observability and data visualization platform, Grafana enables users to query, visualize, understand, and trigger alerts for metrics regardless of where they are stored. Jaeger, an open-source, end-to-end distributed tracing system, monitors and troubleshoots transactions in complex distributed systems. Optional but installed by default, Jaeger allows users to track a single request as it makes its way among different services—or even inside a service—providing insight into the entire request process from start to finish.
The management console for OSSM, Kiali, another open-source project, is designed specifically for configuring, validating, visualizing, monitoring, and troubleshooting Istio service meshes in near-real time to increase availability and performance. Delivering an intuitive, end-to-end view of all microservices, Kiali displays the structure of the service mesh by inferring traffic topology and using service metrics to indicate application health, reliability, and performance, providing visibility into features such as circuit breakers and request rates. In addition, Kiali integrates with Jaeger to troubleshoot and isolate bottlenecks in end-to-end request paths.
Providing out-of-the-box security for distributed applications, OSSM securely connects services by default using transparent mTLS encryption and enforces a zero-trust network security model with fine-grained traffic policies based on application identities. In addition, the service mesh offers traffic management capabilities to facilitate failovers, canary deployments, traffic mirroring, and A/B testing. Controlling the flow of traffic and API calls between services, OSSM improves service reliability with automatic request retries, timeouts, and circuit breakers, making applications more resilient.
Differing from upstream Istio deployments, OSSM offers features to ease deployment on Red Hat OpenShift and help resolve issues, including the installation of a multitenant control plane, extending RBAC features, replacing BoringSSL—an OpenSSL derivative—with OpenSSL, and enabling Kiali and Jaeger by default. In addition, rather than automatically injecting Envoy sidecars into K8s pods, OSSM requires an annotation, providing more control by allowing users to select the services to be included in the mesh.
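The opt-in sidecar injection described above is driven by an annotation on the workload's pod template rather than by automatic namespace-wide injection; a sketch (the deployment name and image are hypothetical) might be:

```yaml
# Sketch: opting a Deployment into OSSM sidecar injection via annotation.
# Only workloads carrying this annotation are joined to the mesh.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratings
spec:
  selector:
    matchLabels:
      app: ratings
  template:
    metadata:
      labels:
        app: ratings
      annotations:
        sidecar.istio.io/inject: "true"
    spec:
      containers:
        - name: ratings
          image: example/ratings:latest
```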
Strengths: OSSM provides a uniform way to connect, manage, and observe microservices-based applications running within a Red Hat OpenShift environment. Designed to integrate with other Red Hat products, OSSM installs easily on Red Hat OpenShift and includes enterprise-grade support, simplifying and streamlining management for operations personnel.
Challenges: OSSM’s underlying Maistra Service Mesh is a fork of Istio and lags behind upstream Istio versions. Despite being open source, a forked service mesh risks locking users into the vendor’s product and slowing the adoption of upstream changes. OSSM does not support VMs, integration with external certificate signing requests (CSRs) for custom CAs, or plug-ins for external authorization and rate-limiting systems. In addition, while OSSM does not support control plane canary upgrades, it does use the OpenShift Service Mesh operator for installation and upgrades.
Tanzu Service Mesh: VMware
Announced in December 2018 as NSX Service Mesh and launched as Tanzu Service Mesh (TSM) in March 2020, TSM is an Istio-based, enterprise-class service mesh providing consistent connectivity and security for microservices across multicluster and multicloud K8s environments. TSM integrates with Tanzu Kubernetes Grid, VMware’s K8s platform, and Tanzu Mission Control (TMC) in a loosely-coupled model to provide standard service mesh capabilities via the Istio API. TSM also supports AKS, EKS, GKE, OpenShift, and other K8s distributions to create a cross-platform service mesh. In addition, TSM layers unique end-to-end use case support and integrated solutions that are challenging to achieve with service mesh technologies alone. Operated as a SaaS, TSM’s global controller is a fully managed solution operated and maintained by VMware.
Figure 18. Tanzu Service Mesh at-a-Glance
Tanzu Service Mesh includes the TSM Global Controller—a control plane provided as a SaaS managed by VMware—and the TSM Data Plane running across customers’ K8s clusters. Based on open-source Istio and Envoy, the TSM Data Plane delivers typical services such as authentication and authorization, circuit breaking, rate limiting, timeouts and retries, traffic shifting, and other features. The TSM Data Plane also includes the TSM Agent, which provides a secure connection between the customers’ clusters and the TSM Global Controller for managing the configuration and policies enforced in the TSM Data Plane.
Tanzu Service Mesh includes a unique application abstraction layer called Global Namespace (GNS), which acts as a logical grouping for microservices. Managed through a declarative API model and an intuitive UI, GNSs provide modern applications with simplified configurability, API-driven automation, isolation, and operational consistency for DevOps and security, irrespective of the underlying platform or cloud. They also provide automated service discovery and naming (DNS), resiliency policies, security policies, service graphs, and traffic routing.
Enabling full automation of multicluster configurations (in a federated model where each cluster’s control plane is independent and cross-cluster traffic is restricted to the data plane), ingress and egress configurations, and seamless cross-cloud application portability, GNS supports microservices within a single cluster and microservices distributed across multiple clusters and clouds. In addition, integration with the Tanzu Application Platform (TAP) provides an enhanced developer experience, enabling connectivity, resiliency, and security intent to be pre-configured into a GNS and then automatically deployed to TAP applications.
In addition to zero-trust multicluster support and improved service-level visibility, TSM supports interoperability with multiple meshes through the Hamlet protocol, an open-source project led by VMware with contributions from Google and HashiCorp. Hamlet enables interoperability between service meshes provided by different vendors, including discovery, routing, and secure connectivity.
TSM offers complete lifecycle management of the service mesh with automated cluster onboarding during the Istio installation; one-click operations to upgrade, patch, rollback, or remove the TSM Data Plane from clusters on any K8s platform or cloud environment; and automated data plane health checks and management to minimize configuration drift. TSM operates either standalone or as part of a fully-integrated lifecycle management workflow managed by TMC.
Providing Istio cluster onboarding and automated health monitoring and lifecycle management of the Istio/Envoy data plane, TSM is integrated with the full-stack capabilities of Tanzu—VMware’s portfolio for modernizing applications and infrastructure. In addition, TSM works with VMware’s NSX Advanced Load Balancer (formerly Avi Networks) to provide multicloud support, unified policies, load balancing, ingress, container networking, and observability across VMware and third-party K8s environments.
Moreover, contextual API security (based on VMware’s March 2021 acquisition of Mesh7) allows developers and security teams to better understand when, where, and how applications and microservices are communicating via APIs—even across multicloud environments—enabling better DevSecOps. In addition, Intel and VMware are working together to optimize and accelerate the microservices middleware and infrastructure with software—including eBPF—with a focus on improving performance, crypto accelerations, and security for building distributed workloads.
Strengths: Leveraging open-source Istio, Tanzu Service Mesh provides robust enterprise services—including autoscaling—across multiple K8s clusters, offering operational simplification and automation with advanced resiliency functions. In addition to supporting various application platforms, public clouds, and runtime environments, Tanzu Service Mesh supports federation across multiple clusters for end-to-end connectivity, resiliency, and security.
Challenges: Whether loosely coupled or fully integrated, Tanzu Service Mesh primarily offers value for VMware’s installed base rather than a broader audience. TSM currently lacks end-to-end security capabilities, such as extensibility to third-party and VMware endpoint detection and response (EDR) and mobile device management (MDM) solutions, including VMware Carbon Black. TSM also includes multiple overlapping gateway and load balancing technologies, presenting configuration challenges. During the evaluation phase, prospective customers should be careful to differentiate between what is already implemented and what is on the product roadmap.
Traefik Mesh: Traefik Labs
Released in September 2019 and known previously as Maesh, Traefik Mesh is a simple, straightforward, and non-invasive service mesh that uses Traefik Proxy, rather than Envoy, to manage service-to-service communications inside a K8s cluster. Created and maintained primarily by Traefik Labs (previously known as Containous), Traefik Proxy is one of the most widely used cloud-native application proxies, with over 3 billion downloads and more than 39,500 GitHub stars. Traefik Labs claims Traefik Mesh is the simplest and easiest service mesh to deploy, providing enhanced control, security, and observability across all east/west traffic flows with minimal overhead.
Figure 19. Traefik Mesh at-a-Glance
Integrating natively with K8s, Traefik Mesh is a lightweight yet full-featured service mesh supporting the latest SMI specification. Traefik Mesh is the only mesh included in this report that uses a per-node architecture instead of a sidecar proxy, favoring simplicity and resource conservation. In addition, since Traefik Mesh is opt-in by default, existing services are unaffected until explicitly added to the service mesh, rather than having a proxy automatically injected into the application.
Since Traefik Mesh does not use sidecar containers, routing is handled through proxy endpoints running on each node. Because of this sidecar-less architecture, Traefik Mesh does not modify K8s objects or traffic without the user’s knowledge. Supporting multiple configuration options—including annotations on user service objects and SMI objects—the mesh controller runs in a dedicated pod and handles all configuration parsing and deployment to the proxy nodes.
Designed for simplicity with a focus on efficiency and low-resource utilization, Traefik Mesh is easy to install and configure via a CLI. Its feature set includes traffic management capabilities, such as circuit breakers, load balancing, retries and failovers, and rate limiting. In addition, Traefik Mesh provides observability with out-of-the-box metrics preinstalled with Grafana and Prometheus, and is compatible with Datadog, InfluxData, and StatsD. Tracing is supplied through OpenTelemetry, delivering full compatibility with Haystack, Instana, Jaeger, and Zipkin for resilient, scalable tracing and analysis.
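To illustrate, Traefik Mesh’s traffic management features are typically enabled per service through annotations on standard K8s Service objects. The sketch below is illustrative only—the service name and values are hypothetical, and the exact annotation keys (`mesh.traefik.io/...`) should be verified against the Traefik Mesh documentation for the version in use:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments            # hypothetical service name
  namespace: default
  annotations:
    mesh.traefik.io/traffic-type: "http"       # opt this service into the mesh as HTTP
    mesh.traefik.io/retry-attempts: "2"        # retry failed requests up to twice
    mesh.traefik.io/ratelimit-average: "100"   # average requests per second allowed
    mesh.traefik.io/ratelimit-burst: "200"     # burst ceiling
    mesh.traefik.io/circuit-breaker-expression: "NetworkErrorRatio() > 0.30"
spec:
  selector:
    app: payments
  ports:
  - port: 8080
```

Because configuration lives in annotations on objects teams already manage, this mode requires no new custom resources for basic traffic shaping.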
In addition to basic security in the form of mTLS, Traefik Mesh is SMI-compliant and facilitates fine-tuning of traffic permissions via access control. A specification for service meshes running on K8s, SMI defines a common standard for service mesh providers, covering the most common capabilities and enabling flexibility and interoperability. Furthermore, since SMI is specified as a collection of K8s APIs (custom resources), users already familiar with K8s tooling can configure Traefik Mesh without learning a new interface.
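As a sketch of what SMI-based access control looks like in practice, the following manifests allow only pods running under the `orders` service account to call the `payments` service, and only on the matched routes. The service and route names are hypothetical, and SMI API versions vary across releases of the specification:

```yaml
# HTTPRouteGroup: names the routes that access policies can reference
apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: payments-routes
  namespace: default
spec:
  matches:
  - name: payments-api
    pathRegex: "/api/.*"
    methods: ["GET", "POST"]
---
# TrafficTarget: binds allowed sources to a destination over the routes above
apiVersion: access.smi-spec.io/v1alpha2
kind: TrafficTarget
metadata:
  name: orders-to-payments
  namespace: default
spec:
  destination:
    kind: ServiceAccount
    name: payments
    namespace: default
  sources:
  - kind: ServiceAccount
    name: orders
    namespace: default
  rules:
  - kind: HTTPRouteGroup
    name: payments-routes
    matches:
    - payments-api
```

Because policies reference service accounts rather than pod IPs, the same manifests remain valid as workloads are rescheduled or scaled.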
Built on top of the open-source Traefik Proxy and Traefik Mesh, Traefik Enterprise consolidates API management, ingress control, and service mesh within one simple control plane. A unified, cloud-native networking solution, Traefik Enterprise simplifies microservices networking complexity with distributed, highly available, and scalable features combined with premium, subscription-based bundled support for enterprise-grade deployments. In addition, Traefik Enterprise includes an enhanced dashboard with service mesh observability of internal east/west traffic.
Strengths: Traefik Mesh combines a well-chosen set of features—including circuit breakers, load balancing, rate limiting, retries and failovers, and security—with observability and out-of-the-box metrics. Built on the popular Traefik Proxy, it offers lightweight, SMI-compliant, non-invasive traffic management with good usability and performance. Instead of a sidecar proxy, Traefik Mesh uses an opt-in, per-node proxy connecting services, providing increased control and conserving resources.
Challenges: Traefik Mesh lacks multicluster capabilities, so users requiring a unified control plane spanning clusters, clouds, and meshes should look elsewhere. And while it supports SMI access control, it doesn’t offer transparent, end-to-end encryption. In addition, Traefik Mesh does not support VMs.
6. Analyst’s Take
Despite emerging only in 2016, the service mesh landscape is turning into a battleground as open-source projects and commercial vendors strive to address the complexity of microservices-based cloud-native deployments. Focused on making their service meshes faster, more efficient, and easier to manage, suppliers are increasing the pace of innovation to meet customer demands.
The past year has seen many innovations, including the emergence of Cilium Service Mesh, which leverages eBPF to increase performance and reduce resource contention, and Istio Ambient Mesh, a new, layered Istio data plane architecture offering simplified operations, broader application compatibility, and reduced costs. An alternative to Envoy sidecars, Istio Ambient Mesh splits Istio’s functionality into a secure overlay layer and a Layer 7 processing layer, each offering the relevant telemetry, traffic management, and zero-trust security capabilities.
Several suppliers have either incorporated eBPF technology in their platforms or included it on their roadmaps, while Solo.io—a co-creator along with Google—already supports Istio Ambient Mesh in Gloo Mesh 2.1. Greymatter.io continues to push the boundaries, adding a heuristics-based AI health sub-system and out-of-the-box GitOps IaC capabilities. In addition, several other suppliers have significantly enhanced their platforms.
Leading cloud vendors are also rolling out service mesh capabilities embedded within their portfolios to provide consistent network traffic controls, observability, and security. These include Amazon’s AWS App Mesh, Google’s Anthos Service Mesh, and Microsoft’s Open Service Mesh—donated to the CNCF. While this diversification creates a new battleground, efforts have been made to standardize interfaces and unify various workloads deployed across different service meshes. For example, SMI—an open, K8s-native specification project launched by HashiCorp, Kinvolk, Linkerd, Microsoft, Solo.io, and Weaveworks—comprises a set of standard, portable APIs providing developers with interoperability across different service mesh technologies.
However, a recent CNCF microsurvey indicates that among the most significant challenges enterprises face are a shortage of engineering expertise and experience, architectural and technical complexity, and choosing between open-source projects and commercial products. In addition, the survey indicated that service meshes that are ultra-fast, ultra-light, and easy to deploy and manage—such as Linkerd, Kuma, and Traefik Mesh—are at the top of the shortlist when it comes to addressing customers’ security, observability, reliability, and traffic management concerns.
While Istio is the most widely deployed service mesh today, we expect its market leadership to decline over the next 18 to 36 months. Moreover, with Istio coming under the governance of the CNCF, we expect to see significant changes as Istio-based vendors such as F5 Networks and Solo.io have a greater say in the direction of the platform. And while purists might claim that Istio-based vendors such as F5 Networks, Solo.io, and VMware are not service mesh “providers” in the true sense of the word, we believe that they have established themselves as inextricable cogs in the wheel and will continue to exert significant influence over, not just Istio, but the industry as a whole.
However, while an Envoy- or Istio-based service mesh—or one with widespread support—may be considered the safe choice, that should not be the determining factor. Many use cases can be supported with an easy-to-use, lightweight, and infrastructure-agnostic service mesh incorporating essential functionality and supporting both east/west and north/south traffic. In addition, some of the newer vendors—such as greymatter.io—offer the same service mesh capabilities but with significant differentiation from an AIOps perspective.
Avoid adopting a service mesh based purely on consumer trends, industry hype, or widespread adoption. Instead, take the time to understand in detail the problem you’re trying to solve. Explore the potential tradeoffs in terms of performance and resource consumption. Evaluate your support requirements against your in-house resources and skills (many open-source service meshes rely on community support). Once you’ve created a short list, choose a service mesh—and microservices-based application development partner—that works best with your software stack.
7. About Ivan McPhee
Formerly an enterprise architect and management consultant focused on accelerating time-to-value by implementing emerging technologies and cost optimization strategies, Ivan has over 20 years’ experience working with some of the world’s leading Fortune 500 high-tech companies crafting strategy, positioning, messaging, and premium content. His client list includes 3D Systems, Accenture, Aruba, AWS, Bespin Global, Capgemini, CSC, Citrix, DXC Technology, Fujitsu, HP, HPE, Infosys, Innso, Intel, Intelligent Waves, Kalray, Microsoft, Oracle, Palette Software, Red Hat, Region Authority Corp, SafetyCulture, SAP, SentinelOne, SUSE, TE Connectivity, and VMware.
An avid researcher with a wide breadth of international expertise and experience, Ivan works closely with technology startups and enterprises across the world to help transform and position great ideas to drive engagement and increase revenue.
8. About GigaOm
GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.
GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.
GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.
9. Copyright
© Knowingly, Inc. 2022 "GigaOm Radar for Service Mesh" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.