This GigaOm Research Reprint Expires: Jan 22, 2023

GigaOm Radar for Enterprise Scale-Out File Systems v2.0

1. Summary

File storage remains one of the most popular ways to store data, both on-premises and in the cloud. Scale-out file storage is becoming the default choice for most organizations for several reasons, including:

  • Scale-out file storage can expand quickly while increasing throughput.
  • Object storage, although very popular, isn’t overshadowing file systems yet. File systems are often accessed via network protocols like NFS and SMB, and are still the data storage system of choice for a large number of workloads, including big data analytics, artificial intelligence/machine learning (AI/ML), high-performance computing (HPC), and more.
  • Modern file systems are much more scalable than in the past, combining a familiar user interface and authentication methods with high performance.
  • Legacy applications continue to drive demand for file storage. Such applications are usually written to work with POSIX-compliant file systems, and the cost of refactoring them to benefit from object storage may outweigh the savings, making file storage the preferred option.
  • Modern scale-out solutions are mature and flexible, with most of the complexity now hidden behind the scenes. In the end, managing a large scale-out system is less time-consuming than managing several scale-up systems.
  • Solutions that support data mobility across different environments are becoming increasingly important for executing properly on hybrid IT strategies, and scale-out file storage systems are easy to implement on cloud virtual machine instances. In this regard, GigaOm recently published the report “Key Criteria for Evaluating File-Based Cloud Storage,” because there is a growing demand for sophisticated file services on-premises and in the cloud.

Unstructured data accounts for up to 90% of what is stored in enterprise infrastructures. Therefore, storage that is scalable and fast enough to manage interactive workloads is crucial for responding adequately to business needs. That said, enterprises don’t want to trade scalability and performance for the data services and flexibility they usually get from traditional scale-up network-attached storage (NAS) solutions. With the advent of multi-cloud, users also want the flexibility to move data where it’s needed, which increases the demand for advanced data services. At the same time, users want solutions that are ready to respond to increasing regulatory needs, data governance tasks, and risks from a growing number of security threats, including ransomware. This expansion of the IT mission is why scale-out storage systems are much more balanced than in the past and tend to combine scalability, flexibility, efficiency, security, and performance.

How to Read this Report

This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.

2. Market Categories and Deployment Types

For a better understanding of the market and vendor positioning (Table 1), we assess how well solutions for scale-out file systems are positioned to serve specific market segments.

  • Enterprise: Optimal solutions in this category will have a strong focus on flexibility, data services, and features to improve security and data protection. Scalability is another big differentiator, as is the ability to deploy the same service in different environments.
  • High performance: Optimal solutions will be designed for specific workloads and use cases, such as big data analytics, AI/ML/DL, and high-performance computing (HPC). The key differentiators in this area are performance, scalability, and GPUDirect support.

In addition, we recognize two deployment models for solutions in this report: hardware appliance or software-defined storage.

  • Hardware appliance: Provided as a self-contained physical device, these appliances provide all the components necessary to deliver scale-out file storage capabilities. The device is fully supported by the vendor, and besides management of the platform, all the customer needs to take care of is applying hotfixes or patches. This deployment model delivers simplicity at the expense of flexibility.
  • Software-defined storage: These solutions are meant to be deployed on commodity servers on-premises or in the cloud, allowing organizations to build hybrid or multi-cloud scale-out file storage infrastructures. These solutions provide more flexibility in terms of deployment options, cost, and hardware.

Table 1. Vendor Positioning

Market Segment: Enterprise | High Performance
Deployment Models: Hardware Appliance | Software-Defined Storage
Cohesity
Commvault
Dell Technologies
Hammerspace
NetApp
Nutanix
OSNexus
Pure Storage
Quantum
Qumulo
Quobyte
Red Hat
Scality
SoftIron
VAST Data
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

3. Key Criteria Comparison

Building on the findings from the GigaOm report, “Key Criteria for Evaluating Scale-Out File Storage,” Table 2 summarizes how each vendor included in this research performs in the areas that we consider differentiating and critical in this sector. Table 3 follows this with insight into each product’s evaluation metrics—the top-line characteristics that define the impact each will have on the organization. The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the market landscape, and gauge the potential impact on the business.

Table 2. Key Criteria Comparison

Key Criteria

Vendor | Integration with Object Storage | Integration with Public Cloud | New Flash Memory Devices | System Management | Basic Data Management | Security Features | Kubernetes Support | GPUDirect Support
Cohesity | 3 | 3 | 1 | 3 | 3 | 3 | 0 | 0
Commvault | 2 | 3 | 1 | 3 | 2 | 3 | 2 | 0
Dell Technologies | 2 | 1 | 1 | 3 | 3 | 2 | 3 | 3
Hammerspace | 3 | 3 | 2 | 3 | 2 | 2 | 2 | 1
NetApp | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3
Nutanix | 3 | 2 | 1 | 3 | 3 | 3 | 3 | 0
OSNexus | 1 | 0 | 3 | 2 | 1 | 3 | 1 | 0
Pure Storage | 2 | 1 | 3 | 3 | 2 | 3 | 3 | 0
Quantum | 2 | 1 | 1 | 2 | 1 | 2 | 0 | 0
Qumulo | 3 | 3 | 1 | 3 | 3 | 2 | 1 | 0
Quobyte | 1 | 3 | 3 | 3 | 1 | 3 | 2 | 0
Red Hat | 1 | 0 | 3 | 2 | 1 | 2 | 3 | 0
Scality | 3 | 2 | 1 | 2 | 2 | 2 | 0 | 0
SoftIron | 1 | 0 | 1 | 3 | 1 | 2 | 1 | 0
VAST Data | 3 | 0 | 3 | 3 | 2 | 2 | 2 | 3

3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

Table 3. Evaluation Metrics Comparison

Evaluation Metrics

Vendor | Scalability | Flexibility | Performance | System Lifespan | Usability | Security
Cohesity | 3 | 2 | 2 | 3 | 3 | 3
Commvault | 3 | 3 | 2 | 3 | 3 | 3
Dell Technologies | 3 | 1 | 3 | 2 | 3 | 3
Hammerspace | 2 | 3 | 2 | 3 | 3 | 2
NetApp | 3 | 3 | 3 | 3 | 3 | 3
Nutanix | 2 | 2 | 3 | 3 | 3 | 3
OSNexus | 3 | 2 | 3 | 3 | 2 | 3
Pure Storage | 3 | 2 | 3 | 3 | 3 | 3
Quantum | 3 | 1 | 3 | 2 | 2 | 3
Qumulo | 3 | 3 | 3 | 3 | 3 | 3
Quobyte | 3 | 3 | 3 | 3 | 2 | 3
Red Hat | 2 | 2 | 3 | 3 | 2 | 3
Scality | 3 | 2 | 1 | 3 | 2 | 3
SoftIron | 2 | 2 | 3 | 2 | 2 | 3
VAST Data | 3 | 2 | 3 | 3 | 3 | 3

3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

By combining the information provided in the tables above, the reader can develop a clear understanding of the technical solutions available in the market.
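As one way of combining the tables, the sketch below totals each vendor's Table 2 scores and filters vendors by a single criterion. It assumes the scores are on a 0 to 3 scale as they appear in the table rows, and it applies equal weights to every criterion. That weighting is an illustrative assumption, not GigaOm's methodology, and the function names `total_scores` and `vendors_with` are hypothetical.

```python
# Column order follows the Table 2 header.
CRITERIA = [
    "Integration with Object Storage",
    "Integration with Public Cloud",
    "New Flash Memory Devices",
    "System Management",
    "Basic Data Management",
    "Security Features",
    "Kubernetes Support",
    "GPUDirect Support",
]

# Scores transcribed from Table 2 (0 = not applicable, 3 = exceptional).
TABLE_2 = {
    "Cohesity": [3, 3, 1, 3, 3, 3, 0, 0],
    "Commvault": [2, 3, 1, 3, 2, 3, 2, 0],
    "Dell Technologies": [2, 1, 1, 3, 3, 2, 3, 3],
    "Hammerspace": [3, 3, 2, 3, 2, 2, 2, 1],
    "NetApp": [3, 3, 3, 3, 3, 3, 3, 3],
    "Nutanix": [3, 2, 1, 3, 3, 3, 3, 0],
    "OSNexus": [1, 0, 3, 2, 1, 3, 1, 0],
    "Pure Storage": [2, 1, 3, 3, 2, 3, 3, 0],
    "Quantum": [2, 1, 1, 2, 1, 2, 0, 0],
    "Qumulo": [3, 3, 1, 3, 3, 2, 1, 0],
    "Quobyte": [1, 3, 3, 3, 1, 3, 2, 0],
    "Red Hat": [1, 0, 3, 2, 1, 2, 3, 0],
    "Scality": [3, 2, 1, 2, 2, 2, 0, 0],
    "SoftIron": [1, 0, 1, 3, 1, 2, 1, 0],
    "VAST Data": [3, 0, 3, 3, 2, 2, 2, 3],
}


def total_scores(table):
    """Return (vendor, unweighted total) pairs, highest total first."""
    totals = {vendor: sum(scores) for vendor, scores in table.items()}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def vendors_with(table, criterion, minimum):
    """Return vendors scoring at least `minimum` on one criterion."""
    index = CRITERIA.index(criterion)
    return sorted(v for v, scores in table.items() if scores[index] >= minimum)


if __name__ == "__main__":
    for vendor, total in total_scores(TABLE_2):
        print(f"{vendor}: {total}")
    print(vendors_with(TABLE_2, "GPUDirect Support", 3))
```

An equal-weight total hides the trade-offs that matter for a given workload, so filtering on the criteria that are mandatory for a project (as `vendors_with` does) is usually the more useful first step.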

4. GigaOm Radar

This report synthesizes the analysis of key criteria and their impact on evaluation metrics to inform the GigaOm Radar graphic in Figure 1. The resulting chart is a forward-looking perspective on all the vendors in this report, based on their products’ technical capabilities and feature sets.

The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—Maturity versus Innovation, and Feature Play versus Platform Play—while providing an arrow that projects each solution’s evolution over the coming 12 to 18 months.

Figure 1. GigaOm Radar for Enterprise Scale-Out File Systems

As you can see in the Radar chart in Figure 1, NetApp is a clear winner due to the overall company strategy: NetApp offers a unified platform with ONTAP, unmatched cloud integration, and a rich set of data services. Qumulo and Nutanix are two vendors showing formidable feature sets that make these solutions compelling for a large audience. Cohesity and VAST Data, even if more focused on specific use cases, are truly compelling and could easily become strong platforms for consolidating old systems and data pools.

Pure Storage and Dell Technologies both have formidable platforms; however, they are introducing new advanced features at a slower pace and mostly lack cloud integration. Notably, Dell Technologies is in a unique position, transitioning from a mature platform (moving away from Isilon) to a more innovative approach with PowerScale. Also worth noting is Commvault’s Distributed Storage solution. Although based on a hybrid architecture, the solution provides outstanding capabilities and benefits from Commvault’s comprehensive ecosystem.

Quobyte offers a strong, scalable architecture, but object storage integration and data management capabilities are missing. Another very interesting solution is Hammerspace, which provides a scale-out file system that spans multiple data centers and clouds with a global data environment. Scalability (in terms of site count supported) is currently limited but should be significantly improved in early 2022. OSNexus proposes a great implementation of storage tiers, massive scalability, and a broad set of security features. Cloud integration, basic data management, and Kubernetes support, however, are areas for improvement.

Several solutions cover niche use cases: SoftIron proposes a performant and unique security-focused approach to scale-out file systems, with an efficient hardware platform and auditable supply chain, making it a platform of choice for sensitive projects, notably in the public sector. Quantum focuses on massive, exabyte-scale deployments with a broad choice of appliances, and the company is working on multiple roadmap innovations that should strengthen its position over the next 12-18 months. Scality’s solution is laser-focused on large, sequential, massively parallel workloads, and while the solution may seem limited in capabilities, it truly excels in its focus area. Finally, Red Hat deserves a mention with CephFS, an interesting file system architecture on top of its Ceph distributed object store. CephFS provides outstanding Kubernetes support and support for new flash memory devices, but unfortunately, the solution’s deployment pace is moderate, and it lacks object storage and cloud integration support.

Inside the GigaOm Radar

The GigaOm Radar weighs each vendor’s execution, roadmap, and ability to innovate to plot solutions along two axes, each set as opposing pairs. On the Y axis, Maturity recognizes solution stability, strength of ecosystem, and a conservative stance, while Innovation highlights technical innovation and a more aggressive approach. On the X axis, Feature Play connotes a narrow focus on niche or cutting-edge functionality, while Platform Play displays a broader platform focus and commitment to a comprehensive feature set.

The closer to center a solution sits, the better its execution and value, with top performers occupying the inner Leaders circle. The centermost circle is almost always empty, reserved for highly mature and consolidated markets that lack space for further innovation.

The GigaOm Radar offers a forward-looking assessment, plotting the current and projected position of each solution over a 12- to 18-month window. Arrows indicate travel based on strategy and pace of innovation, with vendors designated as Forward Movers, Fast Movers, or Outperformers based on their rate of progression.

Note that the Radar excludes vendor market share as a metric. The focus is on forward-looking analysis that emphasizes the value of innovation and differentiation over incumbent market position.

5. Vendor Insights

Cohesity

Cohesity’s SmartFiles is a feature-rich, capacity-optimized solution that can be integrated easily with performance-driven systems in the infrastructure to build efficient two-tier architectures. It is based on the same core technology as the rest of Cohesity’s platform and can directly serve most of the workloads that can be found in an enterprise environment through SMB, NFS, S3, and Swift protocols. Data can be accessed concurrently via file and/or object protocols.

The solution is extremely easy to use. Users can run it in the same cluster used for data protection and management or on a dedicated cluster; the choice depends only on the user’s needs and the capacities involved. Scalability has been proven in the field, with several multi-petabyte systems already in production. Deployment flexibility is another positive aspect of SmartFiles: users deploy it at the edge, on single virtualized nodes, in the public cloud, and in large on-premises scale-out systems, creating a large distributed infrastructure that can be centrally managed by Helios. From a performance point of view, the solution is designed primarily for capacity, sequential writes, and throughput, sacrificing low latency and small random IO operations for better data footprint reduction, optimization, and efficient snapshot management.

Data management remains one of the key differentiators for Cohesity. In fact, it has implemented a long series of features in this area aimed at simplifying data mobility, protection, security, and lately, governance. SmartFiles includes remote data replication capabilities, automated tiering between different storage systems and the cloud, transparent archiving functionality, data migration, and sophisticated ransomware protection. Of particular interest is the tiering functionality that enables users to identify and move data automatically via user-defined policies from expensive Tier 1 storage to SmartFiles while leaving a symbolic link behind so as not to compromise the user experience. Another unique feature is DataGovern, still in preview but with huge potential when it comes to data classification and compliance.
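The tiering behavior described above (move cold data by policy, leave a symbolic link behind) can be illustrated with a minimal sketch. It is not Cohesity's implementation: the function name `tier_cold_files`, the age-based policy, and the use of modification time as the "cold" signal are all assumptions chosen for illustration, and the sketch assumes the capacity tier is a separate directory outside the Tier 1 tree.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def tier_cold_files(tier1: Path, capacity_tier: Path, max_age_days: float,
                    now: Optional[float] = None) -> List[Path]:
    """Move files older than `max_age_days` and leave a symlink in their place.

    Hypothetical policy: a file is "cold" when its modification time is
    older than the cutoff. Returns the original paths that were tiered.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    tiered = []
    # Materialize the listing first so moves do not disturb the traversal.
    for path in sorted(tier1.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue  # skip directories and files that were already tiered
        if path.stat().st_mtime >= cutoff:
            continue  # still hot, leave it on Tier 1
        target = capacity_tier / path.relative_to(tier1)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        # The link keeps the original path usable for users and applications.
        path.symlink_to(target.resolve())
        tiered.append(path)
    return tiered
```

Because the original path still resolves after the move, applications that open the file by name keep working, which is the user-experience point the paragraph makes. A production system would also need to handle open files, permissions, and recall on access.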

Cohesity is investing resources to build a marketplace where users can find out-of-the-box applications to extend the functionalities already available. These applications, available from Cohesity, its partners, and the user community, are easy to install and manage while providing point solutions to common challenges such as global search, antivirus, log analysis, advanced storage analytics, and compliance.

Strengths: Feature-rich solution that can serve a wide range of capacity-driven workloads while providing seamless integration with high-performance storage systems in large environments. Integrated data management and governance features associated with ransomware protection complete the solution and make it stand out from the crowd.

Challenges: Even though all the information is available via API, a more comprehensive view of the data stored in the system and workloads with additional dashboards and reporting capabilities would be a welcome addition to the product. This is something that Cohesity is already aware of and developing for future product releases.

Commvault

Commvault offers Commvault Distributed Storage (CDS), a software-defined scale-out storage solution built on a distributed architecture and capable of supporting block, file, and object storage. CDS is capable of writing data simultaneously across locations—on premises and on public clouds—providing organizations with a flexible and resilient storage platform. The solution is deployable on industry-standard x86 servers and supports thin provisioning, deduplication, compression, and erasure coding, thus providing greater efficiency and keeping storage costs under control. Furthermore, in addition to supporting snapshots and clones, CDS is capable of maintaining up to six data replicas across sites and can be deployed in active-active stretched clusters with automated failover support.
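To make the efficiency trade-off between replicas and erasure coding concrete, the sketch below compares the raw capacity each protection scheme needs for the same usable capacity. The arithmetic is generic; the 4+2 erasure coding layout and the 100 TB figure are hypothetical examples, not CDS parameters, and the function names are illustrative.

```python
def replication_raw_tb(usable_tb: float, replicas: int) -> float:
    """Raw capacity needed when every byte is stored `replicas` times."""
    return usable_tb * replicas


def erasure_coding_raw_tb(usable_tb: float, data_fragments: int,
                          parity_fragments: int) -> float:
    """Raw capacity for a k+m erasure code: overhead factor is (k + m) / k."""
    return usable_tb * (data_fragments + parity_fragments) / data_fragments


if __name__ == "__main__":
    usable = 100  # hypothetical usable capacity in TB
    print(replication_raw_tb(usable, 6))        # six full replicas
    print(erasure_coding_raw_tb(usable, 4, 2))  # hypothetical 4+2 layout
```

Under these assumptions, six replicas need six times the usable capacity, while a 4+2 layout needs one and a half times, which is why erasure coding is typically listed alongside deduplication and compression as a cost control.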

Object storage integration is excellent; besides its own support for object storage, CDS is capable of tiering data to public cloud storage or to any S3-compatible on-premises storage. When used as a target, CDS supports both S3 and Swift object protocols. The solution also supports multiple cloud platforms such as AWS, Azure, and Google Cloud Platform, and seamlessly integrates with other Commvault offerings such as the SaaS-based Metallic.io data protection platform.

CDS is integrated natively into Commvault Command Center, so organizations can see and configure storage from Command Center and also perform other operations, such as using CDS as a disaster recovery (DR) target or configuring copy data management (CDM). In addition, CDS also provides a comprehensive RESTful API for orchestration and automation.

A rich set of security capabilities is available: the solution implements multi-tenancy, encryption (in-flight and at-rest), and role-based access control. Furthermore, it benefits from Commvault’s expertise in ransomware prevention and implements several features such as Commvault Retention Lock and immutable snapshots in the cloud.

Commvault’s solution supports containerized environments through a Kubernetes plugin that delivers integrated container snapshots, container migration capabilities, and integrated policy automation. Those features aim at providing protection for containers, enabling data movements, and implementing fine-grained policies for snapshot and migration activities. The solution also provides enhanced support for OpenStack with native Cinder and Swift integration.

The solution isn’t designed to support GPU-based workloads and so does not support GPUDirect.

Strengths: Commvault Distributed Storage delivers a compelling solution that is capable of providing unified block, file, and object storage capabilities in a distributed and resilient fashion. The solution offers outstanding integration points into Commvault’s broad ecosystem, notably around cloud integration, data management, and most importantly, data protection.

Challenges: Even though the solution presents very good overall characteristics, it is optimized for hybrid configurations with a mix of flash memory and hard drives. Because Commvault positions CDS as a single storage solution across all workloads, improvements should be made to support all-flash deployments in the future.

Dell Technologies

Dell Technologies provides scale-out file storage services through its PowerScale solution (a successor of the Isilon platform), a distributed file system that scales from 11 TB to 92 PB in a single namespace with fast node addition (60 seconds to add a node to the cluster). PowerScale provides auto-balancing capabilities as well as simultaneous multiprotocol access to the same data through file, object, or Hadoop protocols. With PowerScale, Dell Technologies dissociated the file system, OneFS, from the appliance; organizations can deploy PowerScale either as a software-defined storage solution or pre-installed on purpose-built appliances. Appliances support a broad range of storage media with all-flash (NVMe and SAS flash), hybrid, and archive-oriented appliances.

OneFS supports a broad range of data services including smart quotas, deduplication and inline compression, smart storage pools that benefit from policy-based data tiering, and automated client load-balancing through SmartConnect. Another service, SyncIQ, allows multiple data replication topologies to be configured, not only for data movement but also for high availability and DR use cases.

Currently, the solution is cloud-adjacent (it can be deployed either on-premises or in a colocation facility), but it supports policy-based cloud tiering to AWS and ECS. A cloud offering is available on Google Cloud as an integrated, native Google Cloud service operated by Dell Services. It currently scales up to 33 PiB of effective capacity within a single namespace and provides customers with integrated billing and support from Google. Dell Technologies also has a cloud-based PowerScale appliance on its roadmap.
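The on-premises figures above are quoted in PB while the Google Cloud figure is quoted in PiB, so a direct comparison needs a unit conversion. The sketch below shows the arithmetic, assuming PB is used in its decimal sense (10^15 bytes) and PiB in its binary sense (2^50 bytes); the report does not state which convention the PB figures follow.

```python
PIB_BYTES = 2 ** 50   # binary pebibyte
PB_BYTES = 10 ** 15   # decimal petabyte (assumed convention)


def pib_to_pb(pib: float) -> float:
    """Convert pebibytes to decimal petabytes."""
    return pib * PIB_BYTES / PB_BYTES


if __name__ == "__main__":
    print(round(pib_to_pb(33), 2))  # 33 PiB expressed in decimal PB
```

Under that assumption, 33 PiB corresponds to roughly 37 PB.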

PowerScale systems can be managed through their own management interface; however, the solution also seamlessly integrates with CloudIQ, Dell Technologies’ AI-based analytics platform that provides comprehensive data services, automation, and AIOps capabilities.

Data management capabilities are provided through DataIQ, a free software solution provided by Dell that works across various storage platforms to locate, access, and manage data through a single management interface. DataIQ is capable of reporting on data usage and user access patterns and identifying performance bottlenecks. It can identify and automatically tag data and perform data movement activities.

Within OneFS, various security capabilities such as the SmartLock WORM are embedded, allowing long-term retention of data required for regulatory and compliance purposes. OneFS also delivers ransomware attack detection and mitigation capabilities through an API-integrated solution. Hardware security features include secure boot and self-encrypting drives. The solution supports a rich set of security features related to access management such as granular role-based access control, file-level auditing, and file blocking. External key management systems and multi-factor authentication are also supported, and advanced multi-tenancy capabilities are available with access zones working across all protocols.

The PowerScale platform supports Kubernetes integration through Dell’s Container Storage Modules (CSM), a regularly updated open-source suite of modules developed for Dell EMC products. CSM covers storage support (through CSI drivers) and other capabilities such as authorization, resiliency, observability, snapshots, and replication. CSM also provides integration points with Prometheus and Grafana, as well as Ansible and Python; the OpenShift and Docker platforms are supported as well. Other integration points include VMware Tanzu, which can take advantage of PowerScale volume snapshots through the PowerScale CSI driver.

Strengths: PowerScale offers a strong set of capabilities in the security, data management, and data services areas. Support for Kubernetes makes the solution well suited to fit the growing needs of organizations both in cloud-native and AI/ML-based workloads.

Challenges: Putting integration with object storage aside, cloud support is currently limited to Google Cloud.

Hammerspace

Hammerspace’s Global Data Environment is a software-defined storage and data management solution built around a global file system. It allows organizations to build a scale-out file system within a datacenter or cloud, as well as to create hybrid clouds that scale across multiple sites. The Global Data Environment can span multiple on-premises geolocations and multiple public cloud platforms, providing applications and users with unified access to the organization’s entire data set.

Built for global scale, the solution lets customers utilize, access, store, protect, and move data around the world through a single global namespace. Once configured with access to the Hammerspace Global Data Environment, applications and users have no need to know where the data and infrastructure are physically located. Data is made available when and where applications and users need it through the use of metadata across file system standards. This metadata includes telemetry data (such as IOPS, throughput, and latency), as well as user-defined and analytics-harvested metadata, which allows users to view, filter, and search the metadata rapidly in place instead of relying on file names.

Hammerspace can be deployed on-premises, either as a bare-metal installation on top of enterprise-grade hardware, or within virtual machines. The solution’s flexible deployment model allows support for new media types including NVMe flash and storage-class memory, and currently supports NFS, pNFS, and SMB protocols. Hammerspace can be deployed in the cloud, with support for AWS, Azure, and GCP. It implements share-level snapshots as well as comprehensive replication capabilities, allowing files to be replicated automatically across different sites through the Hammerspace Policy Engine. Manual replication activities are available on-demand as well. These capabilities allow organizations to implement multi-site, active-active DR with automated failover and failback. Integration with object storage is also a core capability of Hammerspace: data can be replicated or saved to the cloud as well as tiered automatically on object storage, thereby reducing the on-premises data footprint and leveraging cloud economics to keep storage spend under control.

The solution can be managed through a management UI, but also through REST APIs or CLI commands. One of Hammerspace’s key features is its Autonomic Data Management component. This is a machine learning engine that runs a continuous market economy simulation that, when combined with telemetry data from a customer’s environment, helps make real-time, cross-cloud data placement decisions based on performance and cost.

Among security features, Hammerspace offers ransomware protection through immutable file shares with global snapshot capabilities, as well as an undelete function and file versioning, allowing users to revert to a file version not affected by ransomware-related data corruption. In addition, Hammerspace includes automated antivirus scanning through a third-party integration that leverages the ICAP protocol. The solution also supports commercial KMS and nTrust HSM, as well as data encryption before transfers to/from object storage happen. Active Directory integration and role-based access control are also available.

The solution supports Kubernetes through a CSI plugin that presents Hammerspace’s global namespace to clusters and pods. Containerized workloads can access data globally, regardless of whether it is local to the cluster, on-premises but remote, or located in the cloud. The implementation also supports snapshots for data protection and recovery, automated data recovery, and service level objectives (SLOs). These help determine where data should be optimally placed to meet the SLOs.
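The idea of placing data according to service level objectives can be sketched as a simple selection rule: among the locations that meet a latency objective, pick the cheapest. This is only an illustration of objective-driven placement, not Hammerspace's engine or API; the `Location` type, the `place` function, and all latency and cost figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Location:
    name: str
    latency_ms: float          # hypothetical observed access latency
    cost_per_tb_month: float   # hypothetical storage cost


def place(locations: Sequence[Location],
          max_latency_ms: float) -> Optional[Location]:
    """Return the cheapest location that meets the latency objective."""
    eligible = [loc for loc in locations if loc.latency_ms <= max_latency_ms]
    if not eligible:
        return None  # no location can satisfy the objective
    return min(eligible, key=lambda loc: loc.cost_per_tb_month)


if __name__ == "__main__":
    sites = [
        Location("on-prem-nvme", 1.0, 50.0),
        Location("cloud-file", 8.0, 30.0),
        Location("cloud-object", 60.0, 5.0),
    ]
    print(place(sites, max_latency_ms=10.0))
```

A real placement engine would weigh more signals (throughput, egress charges, data protection rules), but the structure is the same: objectives filter the candidates, and cost or performance ranks what remains.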

Although the solution doesn’t support NVIDIA GPUDirect, it provides NFS 4.2 protocol support (particularly pNFS), enabling direct connectivity from a client running GPU workloads to the storage volumes without an intermediate layer.

Strengths: Hammerspace’s Distributed Global Data Environment offers a very balanced set of capabilities with replication and multi- and hybrid cloud capabilities through the power of metadata.

Challenges: Basic data management capabilities are missing.

NetApp

NetApp has an interesting approach to scale-out file systems based on ONTAP. This solution offers many different deployment options for organizations. Besides on-premises deployments, NetApp has forged unique partnerships with the three major public cloud providers to offer a native NetApp experience that is tightly integrated with the public cloud platform, not only from a performance and technical integration perspective, but also with regard to charging and management. This is particularly the case with Azure NetApp Files and now Amazon FSx for NetApp ONTAP. These are Tier 1 cloud file services directly offered and managed by Azure and AWS that provide a seamless experience to the user, regardless of the cloud platform they use.

With Cloud Volumes, NetApp provides the implementation of a global namespace that abstracts multiple deployments and locations regardless of distance. Several intelligent caching mechanisms combined with global file locking capabilities enable a seamless, latency-free experience that makes data accessible at local access speeds from local cache instances.

Based on ONTAP, the NetApp solutions have been architected to support hybrid deployments natively, whether on-premises or in the cloud. NetApp Scale-Out Storage supports NFSv3 and NFSv4, as well as SMB2, SMB3, and S3. LDAP and AD are supported by ONTAP as well. Cloud Volumes abstracts the underlying cloud infrastructure and deployment model to present users with a single control plane, and the management of all systems is unified under Cloud Manager, NetApp’s management console. Because ONTAP is at the heart of Cloud Volumes, all of the data services provided on-premises or in the cloud are the same: discovery, deployment, protection, governance, migration, and tiering. Tiering, replication, and data mobility capabilities are outstanding and enable a seamless, fully hybrid experience. Organizations can decide where primary data resides, where infrequently accessed data gets tiered to, and where data copies and backups used for DR should be replicated to, notably thanks to NetApp’s Cloud Backup service.

Integration with object storage is a key part of the solution, and policy-based data placement allows automated, transparent data tiering on-premises with NetApp StorageGRID, or in the cloud with AWS S3, Azure Blob Storage, or Google Cloud Storage, with the ability to recall requested files from the object tier. Object storage integration also extends to backup and DR use cases. With Cloud Backup, backup data can be written to object stores using block-level, incremental-forever technology.
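The term "block-level, incremental-forever" means that after an initial full copy, each backup transfers only the blocks that changed since the previous one. The sketch below illustrates that idea by comparing per-block digests. It is not how NetApp Cloud Backup is implemented (production systems may track changed blocks differently), and the function names and the tiny block size used in the example are assumptions.

```python
import hashlib
from typing import Dict, List, Tuple


def block_digests(data: bytes, block_size: int) -> List[str]:
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def incremental_backup(data: bytes, previous_digests: List[str],
                       block_size: int = 4096) -> Tuple[Dict[int, bytes], List[str]]:
    """Return the blocks that differ from the previous backup, plus new digests.

    The first backup (empty `previous_digests`) returns every block;
    later backups return only changed or newly appended blocks.
    Truncation of the source is not handled in this sketch.
    """
    current = block_digests(data, block_size)
    changed = {}
    for index, digest in enumerate(current):
        if index >= len(previous_digests) or previous_digests[index] != digest:
            start = index * block_size
            changed[index] = data[start:start + block_size]
    return changed, current


if __name__ == "__main__":
    full, digests = incremental_backup(b"aaaabbbbcccc", [], block_size=4)
    delta, digests = incremental_backup(b"aaaaXXXXcccc", digests, block_size=4)
    print(sorted(full), delta)
```

Only the changed block is written to the object store on the second run, which is what keeps both transfer volume and object storage consumption low over time.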

Data management capabilities are enabled by consistent APIs that allow data copies to be created as needed; the platform also offers strong data analytics capabilities through Cloud Manager (which has integrated dashboards fed by NetApp’s Cloud Insights service), and particularly through Cloud Data Sense, one of Cloud Manager’s accessible services. This service provides insights around data owners, location, access frequency, and data privileges, as well as potential access vulnerabilities, with manual or automated policy-based actions. Organizations have the ability to generate compliance and audit reports such as DSARs. HIPAA and GDPR regulatory reports also can be run in real time on all Cloud Volumes data stores.

NetApp provides advanced security measures against ransomware and suspicious user or file activities through NetApp FPolicy and snapshot capabilities. The FPolicy solution is integrated into Cloud Volumes ONTAP and provides prevention abilities. It allows file operations to be monitored and blocked as a preventive measure. It includes detection of common ransomware file extensions as well as integration capabilities with third-party technology partners such as Varonis, Veritas, and others. Visibility of FPolicy activities is possible through NetApp Cloud Insights, which analyzes user behavior to identify file access anomalies and preempt possible risks from outsiders, ransomware attacks, or rogue users.
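Blocking file operations based on known ransomware file extensions can be pictured as a simple deny-list check applied before a write or rename is allowed. The sketch below is a generic illustration and does not use the FPolicy interface; the extension list and the `allow_operation` function are hypothetical.

```python
from pathlib import PurePath

# Hypothetical deny list for illustration only; real lists are much longer
# and are maintained by the storage vendor or a security partner.
BLOCKED_EXTENSIONS = frozenset({".locked", ".encrypted", ".crypt"})


def allow_operation(filename: str, blocked=BLOCKED_EXTENSIONS) -> bool:
    """Return False when the target file name ends in a blocked extension."""
    suffix = PurePath(filename).suffix.lower()
    return suffix not in blocked


if __name__ == "__main__":
    for name in ("report.docx", "report.docx.locked", "ARCHIVE.CRYPT"):
        print(name, allow_operation(name))
```

Extension checks only catch known patterns, which is why the paragraph pairs them with behavioral analysis of file access anomalies.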

From a remediation perspective, immutable point-in-time NetApp snapshot copies provide the ability to revert to a healthy state. Organizations can enable Cloud WORM—an additional write-once, read-many capability—when they create new Cloud Volumes ONTAP instances. This feature is powered by NetApp SnapLock and provides long-term snapshot retention that can be used for ransomware protection as well as for regulatory and compliance purposes.

The solution supports flexible deployment models that also take into consideration edge use cases. From Cloud Manager, customers can enable the Global File Cache service for branch locations, remote sites, or regional hyperscalers’ points of presence, to enable local-speed, low-latency access to centralized shares through a single global namespace with full global file locking capabilities.

Strengths: With the strength of NetApp ONTAP, the solutions offered are suited very well for the enterprise with a continuous data management plane, complemented by comprehensive and flexible deployment models. The NetApp scale-out file systems offer multi-cloud options and great monitoring, security, and management capabilities.

Challenges: In the enterprise NAS arena, NetApp maintains an enviable leadership position, and is now transforming its vision and business to become appealing to cloud users. That said, the company is now challenged by many solutions with modern and flexible architectures while it still has to demonstrate that it will be able to transform itself into a hybrid-cloud vendor.

Nutanix

Nutanix provides scale-out file storage capabilities through Nutanix Files, a distributed storage solution that supports NFS and SMB protocols, and runs on top of Nutanix Cloud Infrastructure (NCI). The solution can be deployed either as a standalone instance, as distributed scale-out storage for larger scale and better cost efficiency, or integrated alongside Nutanix HCI deployments. Nutanix Files scales linearly both in capacity and performance, and it supports clusters of up to 48 physical nodes, delivering up to tens of PBs in capacity. Based on NCI, Nutanix Files supports all Nutanix-compatible servers, which implies support for media types ranging from NVMe flash all the way to hard drives. Nutanix also supports storage-class memory devices, and Nutanix Files benefits from these efficiencies.

Nutanix Files supports object storage on-premises or in the cloud with any S3-compatible endpoint for data lifecycle management. The solution supports analytics-driven tiering, is capable of querying Nutanix Data Lens (more below) to understand data usage, and performs data movement based on predefined policies.

Besides on-premises deployments, organizations can consume Nutanix Files in the cloud through Nutanix’s AWS-based Enterprise Cloud. The solution will be available very soon on Microsoft Azure as well.

Nutanix Files is managed through Prism Central, Nutanix’s centralized management console. Prism Central allows management of multiple Nutanix clusters across locations, and it also manages standalone storage capabilities such as Nutanix Files, Nutanix Objects, and Nutanix Volumes. The platform is built with simplicity in mind and offers a modern interface to enable easier day-to-day management, faster deployment, and reduced maintenance activity.

Nutanix offers compelling data management capabilities with two products: Nutanix Data Lens and Nutanix File Analytics. Nutanix File Analytics is an on-premises analytics platform that will meet the needs of smaller organizations due to its one-to-one relationship with a cluster. It supports audit trails, providing a historical overview of file access. The solution also includes an anomaly detection engine that monitors data activities such as mass file deletions or mass permission changes; administrators can define policies and alerts to be informed of potential threats. Finally, Nutanix File Analytics is now capable of detecting and preventing ransomware attacks. Recently, and most importantly, Nutanix has designed and made available Nutanix Data Lens, an enterprise-grade, SaaS-based data governance platform. While it provides services similar to those of Nutanix File Analytics, the solution allows monitoring of Nutanix Files deployments globally, breaking the per-cluster limitation of Nutanix File Analytics. Furthermore, Data Lens is capable of baselining normalized cluster behavior across thousands of deployments, and so provides better anomaly detection capabilities. Finally, it also embeds up-to-date malware signatures for better malware identification and mitigation.
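
To make the idea of policy-based detection of mass file deletions concrete, here is a minimal sketch of a sliding-window threshold rule. It is not Nutanix's implementation; the `MassDeleteDetector` class, its parameters, and the default values are hypothetical and serve only to illustrate the general technique.

```python
from collections import defaultdict, deque


class MassDeleteDetector:
    """Flags a user who deletes more than `threshold` files within a time window.

    Hypothetical sketch: names and defaults are illustrative assumptions.
    """

    def __init__(self, threshold: int = 100, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # Maps each user to the timestamps of that user's recent deletes.
        self._events = defaultdict(deque)

    def record_delete(self, user: str, timestamp: float) -> bool:
        """Record one delete event; return True if the policy is violated."""
        events = self._events[user]
        events.append(timestamp)
        # Discard events that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.threshold
```

A fixed threshold like this is the simplest form of such a policy; baselining behavior across many deployments, as described for Data Lens, would replace the static threshold with one learned from observed activity.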

Security features are comprehensive and include storage analytics, end-to-end encryption, role-based access, multi-factor authentication, and encrypted syslog integration. Furthermore, Nutanix offers Security Central, a security-oriented management platform that allows security monitoring and management across multiple Nutanix deployments. The solution also has strong support for compliance, with several standards supported (FIPS, DoDIN APL, etc.), and embeds a fully-fledged STIG compliance setup.

Nutanix Files can provide persistent storage to Kubernetes clusters through Nutanix’s unified CSI driver (which supports Nutanix Files and Nutanix Volumes). The driver supports dynamic NFS share creation and integrates with Prometheus to provide metrics around Kubernetes storage consumption.

Strengths: Nutanix Files delivers an impressive solution that meets expectations at almost every level. It offers multiple deployment options, embeds a comprehensive set of services, and provides compelling data analytics integrated with storage tiering.

Challenges: While not a major concern for enterprise use cases, the solution does not support GPUDirect.

OSNexus

OSNexus proposes QuantaStor, a software-defined scale-out file system built on Ceph that can be deployed on industry-standard servers through reference configurations. The solution offers unified block, file, and object storage capabilities on top of its storage grid technology, a globally distributed file system that can be managed as a single entity from anywhere.

The solution can be deployed on-premises or in the cloud through a virtual storage appliance. It supports all major media types such as NVMe/SAS flash and HDDs. Intel Optane storage-class memory and QLC 3D NAND are also supported. QuantaStor allows administrators to configure how media tiers are to be used (data, metadata or write log), to take advantage of each media’s inherent capabilities. The solution also supports the NVMe-oF protocol and integrates with Western Digital OpenFlex systems.

QuantaStor provides object storage and cloud integration through a NAS gateway that provides access to cloud-based S3 buckets through the SMB and NFS protocols. A feature called Backup Policies allows data replication and movement to cloud-based object storage. When files are moved, stubs are left behind and allow access to the cloud-based object.
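
The stub mechanism described above can be sketched with a small in-memory model: when a file is moved to the object tier, a placeholder recording the object's location stays in the file system, and reads follow it transparently. This is an illustrative sketch only, not QuantaStor's implementation; `Stub`, `TieredStore`, and the dictionary standing in for an S3 bucket are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Stub:
    """Placeholder left in the file system after data moves to object storage."""
    bucket: str
    key: str
    size: int


class TieredStore:
    """In-memory model of a file tier backed by an object tier (hypothetical)."""

    def __init__(self, bucket: str):
        self.bucket = bucket
        self.files: Dict[str, Union[bytes, Stub]] = {}
        self.objects: Dict[str, bytes] = {}  # stands in for a cloud bucket

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def tier_out(self, path: str) -> Stub:
        """Move file data to the object tier and leave a stub behind."""
        entry = self.files[path]
        if isinstance(entry, Stub):
            return entry  # already tiered
        key = path.lstrip("/")
        self.objects[key] = entry
        stub = Stub(bucket=self.bucket, key=key, size=len(entry))
        self.files[path] = stub
        return stub

    def read(self, path: str) -> bytes:
        """Return file data, following the stub to the object tier if needed."""
        entry = self.files[path]
        if isinstance(entry, Stub):
            return self.objects[entry.key]
        return entry
```

The path remains visible to clients after tiering, which is what lets applications keep using SMB or NFS without knowing where the data physically resides.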

To manage QuantaStor deployments, organizations can take advantage of a globally distributed management platform, which is available on all nodes and so can be accessed from anywhere. Although the management platform UI remains the preferred access method among OSNexus’ customers, users also can leverage REST APIs, a CLI, and a Python Client interface to automate operations. In addition, QuantaStor can integrate with Grafana.

A feature branded Report Schedules provides basic data management capabilities, with information about where capacity is being used, what folders are top consumers, and other capacity-related metrics. The feature also provides a health report that flags systems with potentially risky configuration settings.

Security capabilities include role-based access control, multi-factor authentication, and multi-tenancy. The solution also supports encryption, compliance with audit logging, password policy enforcement, and integrated firewall management. Finally, it integrates S3 object locking to enforce data governance and compliance requirements.

Kubernetes support is relatively limited and relies on Ceph open-source modules that are included with Kubernetes distributions. QuantaStor provides high bandwidth through 2x 100 GbE interfaces per QuantaStor node for best performance and redundancy.

Strengths: A solidly engineered solution that takes advantage of Ceph and wraps it in enterprise-grade data services. One of the strongest value points of QuantaStor is its flexibility, both in deployment models and in terms of supported configurations and media types.

Challenges: Basic data management capabilities could be improved; Kubernetes support is very limited.

Pure Storage

Pure Storage proposes a unified fast file and object (UFFO) storage solution through its FlashBlade appliances. These appliances target unstructured file and object datasets on which organizations cannot afford to compromise between scale and performance. FlashBlade systems support a variety of use cases such as AI/ML, real-time analytics, cloud-native applications, data warehouses, HPC, EDA, and rapid data restores.

The solution is built for massive scalability, starting at seven blades (DirectFlash modules) in a single chassis, and scaling up to 150 blades across a total of 10 fully populated chassis. DirectFlash modules are proprietary NVMe flash modules that embed only flash chips, as the global Flash Translation Layer (FTL) is distributed and managed by the operating system, Purity//FB. Although the FlashBlade solution is architected around performance, Pure Storage has demonstrated its ability to deliver compelling $/GB in the primary storage space with QLC NAND on its FlashArray//C series. The FlashBlade platform could therefore benefit from future adaptations to deliver even greater capacity, while keeping performance acceptable through a combination of QLC-based DirectFlash modules and perhaps DirectCache modules based on Intel Optane storage-class memory.

Purity//FB implements multiple data efficiency mechanisms such as always-on inline deduplication, compression, and pattern removal, which can significantly improve the raw capacities given above. In addition to these, deep reduction algorithms can be applied to data at rest to further improve the data consolidation ratios provided by inline deduplication. Other data services include snapshots/clones (including SafeMode read-only snapshots) and advanced data replication capabilities; backups can be made to NFS targets or in the cloud to AWS S3 and Microsoft Azure Blob targets with Purity CloudSnap. FlashBlade supports asynchronous replication between FlashBlade systems, and data can be replicated to/from FlashArray systems or Cloud Block Store for AWS, an additional cloud integration capability of the solution. Unfortunately, the solution doesn’t offer simultaneous access to the same data through both file and object protocols.

Advanced management is provided by Pure1, a management platform common to all Pure Storage solutions, which combines AI-based analytics with AIOps and self-driving storage capabilities. Besides proactive monitoring and reporting of issues, Pure1 includes AI-driven recommendation capabilities that simulate the impact of net new workloads on an existing environment, and provide the ability to estimate storage costs for Pure-as-a-Service. Further, Pure1 can be used to assess whether SafeMode (immutable) snapshots are enabled across all Pure storage arrays, a feature also supported by FlashBlade systems. Pure1 also provides a unified set of REST APIs as well as a digital marketplace where organizations can consume Pure Storage products and services directly, including STaaS with the Pure-as-a-Service solution. Pure1 can report on a variety of basic data management criteria such as per-client statistics, storage consumption per user/groups, storage quotas, and overall capacity consumption.

Security-wise, FlashBlade supports SafeMode immutable snapshots (both on-premises and on S3), encryption, role-based access control, AD/Kerberos authentication, and multi-tenancy. Additional security capabilities include cross-protocol access security, NFS encryption, S3 IAM user policies, and audit logs.

FlashBlade systems provide advanced Kubernetes support capabilities, thanks to a deep integration of Portworx, Pure Storage’s flagship Kubernetes storage solution. Customers can use a cost-free, FlashBlade-specific version of Portworx Essentials for which the node count limit has been lifted. Organizations can start their cloud-native journey with Portworx Essentials directly on top of FlashBlade, without having to plan for additional investments, and they can seamlessly upgrade to Portworx Enterprise later as they advance through their journey and need to scale Kubernetes services.

Strengths: FlashBlade offers great scalability and outstanding performance, while it boasts a broad set of compelling capabilities. The solution seamlessly integrates with Pure Storage’s broader portfolio, acting either as a source or target, and benefits from other services such as the Pure1 management platform and Portworx Kubernetes services. Organizations will also find great value in Pure Storage’s STaaS offering.

Challenges: The solution doesn’t support concurrent object and file access, though this is not a major challenge for general-purpose, enterprise scale-out file system use cases.

Quantum

With StorNext, Quantum offers a massively scalable, software-defined scale-out file system that supports a broad choice of hardware appliances and performance tiers. The solution starts at 10 TB but can support a file system of up to 18 EB, with appliances scaling up to 98 PB each. From StorNext 7, the solution’s architecture has been virtualized and containerized. This enables easier upgrades and faster delivery of new features and capabilities, while it also decouples software from the underlying hardware platform. The solution supports a variety of media tiers including NVMe flash. It allows flexible composition of storage tiers by combining multiple media types in various pools, then allows operators to define data movement criteria to achieve optimal cost efficiencies. Although QLC NAND isn’t currently used in Quantum appliances, the H-Series systems make use of storage-class memory (NVMe-addressable NVDIMM modules) as a cache.

The solution supports object storage as a secondary storage tier, as a hot archive, or as cold data storage. It provides S3 bucket import capabilities, the ability to tag objects with additional, StorNext-related metadata (object location and name in source file system), and supports a broad set of on-premises object stores. In addition, Quantum’s CatDV solution integrates with StorNext and can index large sets of rich media data to make it available for subsequent searches. Object storage support also extends to the cloud with support for multiple services on AWS, Azure, and Google.

Cloud integration doesn’t end with cloud-based object storage support: StorNext can be deployed in AWS and uses EBS as its storage backend; Quantum also has plans to make its solution available on additional cloud providers. Also worth mentioning is the solution’s ability to support AWS Snowball and Microsoft Data Box.

Management capabilities are provided currently through two management interfaces: a newer one that is focused on primary storage systems, and an older one that focuses on secondary storage systems including tape management, long-term retention, and so on. These management solutions are complemented by Quantum Cloud-Based Analytics (CBA), a recently launched SaaS offering. CBA’s environment view is shared with Quantum’s support organization for remote support purposes, and it offers a comprehensive view of the managed environment, remote monitoring, and automated support.

Security features of StorNext include in-flight data encryption both on-premises and to the cloud. The solution also supports self-encrypting drives on some of its appliances, and encrypted tape, object, and cloud storage. StorNext integrates with common directory services, implements a granular role-based access control system, and maintains activity logs. The solution also supports cross-platform, operating system-level file immutability for Windows, Mac, and Linux filesystems. Immutable snapshots are not supported by StorNext.

Although based on a containerized architecture, the solution has no particular Kubernetes support capabilities.

Strengths: Quantum offers massive scalability and a broad choice of appliances and storage tiers, with comprehensive object storage support and emerging cloud integrations.

Challenges: Management platforms need to be consolidated to streamline storage management operations. There are some gaps, such as the lack of immutable snapshots, and an absence of support for Kubernetes. While the former is not a major concern, the lack of support for Kubernetes may become a challenge as the footprint of cloud-native workloads grows within organizations.

Qumulo

Qumulo has developed a software-defined, vendor-agnostic scale-out file system that can be deployed on-premises, in the cloud, or even delivered through hardware vendor partnerships. The solution supports hybrid and cloud-based deployments; it provides a comprehensive set of enterprise-grade data services branded Qumulo Core. These handle core storage operations (scalability, performance) as well as data replication and mobility, security, ransomware protection, data integration, and analytics.

The solution scales linearly, from both a performance and capacity perspective, providing a single namespace with limitless capacity that supports billions of large and small files and provides the ability to use nearly 100% of usable storage through efficient erasure code techniques. It also supports automatic data rebalancing when nodes or instances are added. The namespace allows for real-time queries and aggregation of metadata, greatly reducing search times. Qumulo supports NVMe flash as well as SATA/SAS SSD and HDDs.

Data protection and replication, as well as mobility use cases, are well covered and include snapshots and snapshot-based replication to the cloud, continuous replication, and DR support with failover capabilities. Qumulo SHIFT is a built-in data service that enables bidirectional data movements to and from AWS S3 object stores with built-in replication, including support for immutable snapshots, and providing organizations with more flexibility and better cost control.

Cloud services are delivered through Cloud Q, a set of solutions designed specifically for the cloud that leverage Qumulo Core services. Organizations can either deploy Cloud Q through their preferred public cloud marketplace (the solution supports AWS, Azure, and GCP) or choose to deploy Qumulo as a fully managed SaaS offering on Microsoft Azure. AWS Outposts is supported as well, and a comprehensive partnership with AWS is in place (WAF certification, AWS Quick Start, and so on). Qumulo is expanding its delivery models as well through storage-as-a-service partnerships with HPE GreenLake and others.

Qumulo can be managed through an on-cluster or a cloud-based interface. It also includes a comprehensive set of REST APIs that can be used not only to perform proactive management but also to automate file system operations. The solution comes with a powerful data analytics engine that provides real-time operational analytics (across all files, directories, metrics, users, and workloads), capacity awareness, and predictive capacity trends, with the ability to “time travel” through performance data.

The solution supports role-based access controls, as well as quotas to control access to and use of data. Software- and hardware-based encryption are supported, as well as data-at-rest encryption. Encryption is also available in the cloud by leveraging cloud provider capabilities. Advanced security features include ransomware protection snapshots and immutable snapshots replicated to the cloud, as well as audit logging to review user activity.

Qumulo offers standard Kubernetes support via Kubernetes NFS-based persistent storage.

Strengths: Qumulo offers a comprehensive scale-out file system solution that is simple to manage and implement. It has a rich and complete data services set combined with a broad choice of deployment models including seamless hybrid and cloud-based deployments, making it one of the most flexible solutions currently available.

Challenges: Although the solution is very complete, some important features are currently still on the roadmap. Among these, data reduction improvements and multi-tenancy should be mentioned.

Quobyte

Quobyte offers a software-defined, scale-out file storage solution based on a parallel distributed, POSIX-compliant file system. The solution scales linearly in capacity and performance, providing full mesh communication with up to hundreds of thousands of clients, and thousands of Quobyte servers. It provides a single namespace common to all interfaces (Linux, Windows, S3, macOS, HDFS, NFS, etc.), and allows file and object access to the same datasets. The solution is engineered to support always-on operations. As a result, usual maintenance activities such as software updates, node addition/removal, hardware replacement, policy reconfiguration, and data movements are non-disruptive. Quobyte clusters support heterogeneous server configurations that can vary in specifications, generation, capacity, and models. Because Quobyte presents a single namespace, it pools local media on servers and offers transparent migration and tiering on top of NVMe, SSDs, and HDDs.

Although Quobyte offers multiprotocol access, tiering to object storage isn’t available yet and is currently on the roadmap. Organizations can deploy Quobyte in the cloud on Google and Oracle platforms directly through the marketplace. Availability on AWS and Azure is on the roadmap.

Although the solution follows an API-first approach, it can also be managed through an extensive web-based user interface and command-line tools, and it can integrate with Prometheus. Quobyte provides real-time performance analytics but does not offer any notable basic data management insights.

Quobyte supports end-to-end encryption, access keys, and TLS encryption. The solution is multi-tenant, and access can be restricted to specific users, volumes, and tenants. At the file system level, unified ACLs are enforced across all protocols. Finally, file immutability (OS-level) and immutable snapshots are supported.

The solution supports Kubernetes through a CSI plugin that provides volumes with quotas, snapshots, an access key, and multi-tenancy support. The Quobyte solution can itself be deployed through containers on Kubernetes, and the company provides a Helm chart to simplify installation and updates of Quobyte in containerized environments.

Strengths: Quobyte offers linear scalability and great flexibility on top of a robust architecture. Organizations will appreciate its multiprotocol support as well as its non-disruptive operations model.

Challenges: Multiple key capabilities for scale-out file systems are either limited or absent. Among these, object storage and cloud integration are crucial features that Quobyte needs to develop further.

Red Hat

With CephFS, Red Hat provides scale-out file storage capabilities on top of its Ceph software-defined object storage platform. The solution is available under an open-source model and presents an interesting architecture. On the backend, data and metadata are stored in separate storage pools in RADOS, Ceph’s distributed object store. The solution supports snapshots, mirroring, and quotas. Regarding new media type support, CephFS supports a broad range of storage media tiers, including NVMe flash, QLC NAND, and storage-class memory.

Although based on object storage, CephFS currently provides only limited object storage integration. Cloud integration capabilities are also missing.

Available on top of Ceph, CephFS benefits from the system management capabilities of the Ceph platform. Besides a comprehensive set of CLI commands and API-based integrations, CephFS boasts a simple, easy-to-use management dashboard through Cephadm; however, basic data management capabilities remain limited.

There are no particular security features specific to CephFS, as most of the security capabilities are developed and implemented at Ceph’s object storage layer.

CephFS supports Kubernetes and allows volumes to be dynamically provisioned or existing volumes to be mounted to pods. The solution supports mounting volumes to multiple pods to be accessed by multiple writers simultaneously.

There is no GPUDirect support in CephFS; however, community-supported initiatives allow Ceph (and potentially also CephFS) to interconnect GPUs and storage through RDMA.

Strengths: CephFS is an interesting niche solution that will appeal to organizations already leveraging Ceph and looking for scale-out file storage options. CephFS also has strong integration points with Red Hat OpenShift.

Challenges: Integration with object storage, cloud integration, and basic data management capabilities are three major areas for improvement where further development is needed to fully unlock the potential of CephFS.

Scality SOFS

Scality SOFS (for Scale-Out File System) provides a scale-out file system implementation based on Scality RING. Each Scality RING on which Scality SOFS runs offers a global namespace, making global metadata searches possible within that RING.

Organizations considering Scality SOFS should take into account the specificity of the solution and its focus on high throughput and sequential workloads. Use cases that are ideal for Scality SOFS include data lakes for massive log retention, long-term retention of medical imaging data, and HPC storage systems, all use cases for which scale and aggregate throughput or sequential access are essential.

Besides the on-premises offering, a cloud-based solution is built to run specifically on Azure. The cloud-based SOFS implementation consists of a POSIX layer integrated into an object-based storage architecture that uses a database (Azure Cosmos DB) to hold a POSIX representation (metadata) of the objects kept in the object store (Azure Blob Storage). SOFS is deployed via stateless VM images for Azure Cloud, and the solution can be scaled as necessary by adding more VMs. This solution offers high aggregate throughput via the SMB, NFS, and FUSE protocols, with hundreds to thousands of Mbps of throughput per interface, and is positioned for throughput and sequential access workloads.

Object storage is at the heart of Scality SOFS since the file system runs on top of object storage, and the cloud implementation relies on Azure Blob Storage as its storage backend. Although only Azure Blob Storage is used, this cloud object storage solution has multiple tiers, and Scality SOFS is able to take advantage of the various tiers to optimize the balance between performance and cost.

SOFS allows concurrent data access from multiple storage connectors including SMBv3, NFSv4, Linux FUSE, and REST. It supports an unlimited number of volumes and files, offers multi-tenancy by using different volumes (namespaces providing logical data separation), and integrates a distributed lock manager to ensure consistent views. A volume protection feature allows data to become read-only automatically after a specific period of time (which can be configured upfront) and implements WORM semantics. Soft and hard quotas can be set up at the volume, user, and group levels.
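
The time-based volume protection described above can be illustrated with a short sketch: files stay writable for a configured period and are rejected for modification afterward. This is not Scality's implementation; the `ProtectedVolume` class is hypothetical, and the sketch assumes the protection period is measured from each file's creation time, which the source does not specify.

```python
class ProtectedVolume:
    """In-memory model of a volume whose files turn read-only after a delay.

    Hypothetical sketch. Assumption: the delay is measured from file creation.
    """

    def __init__(self, protect_after_seconds: float):
        self.protect_after_seconds = protect_after_seconds
        self._files = {}  # path -> (data, created_at)

    def is_protected(self, path: str, now: float) -> bool:
        """Return True once the file's protection period has elapsed."""
        _, created_at = self._files[path]
        return now - created_at >= self.protect_after_seconds

    def write(self, path: str, data: bytes, now: float) -> None:
        """Create or overwrite a file unless it has become read-only."""
        if path in self._files:
            if self.is_protected(path, now):
                raise PermissionError(f"{path} is read-only (WORM)")
            created_at = self._files[path][1]  # keep the original creation time
        else:
            created_at = now
        self._files[path] = (data, created_at)

    def read(self, path: str) -> bytes:
        return self._files[path][0]
```

Because the check happens on every write, data that has aged past the configured period cannot be altered or encrypted in place, which is the property that makes WORM volumes useful against ransomware.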

Management is performed through the RING Supervisor GUI, which provides standard analytics capabilities. This GUI also manages SOFS volumes and connectors and provides insights into the infrastructure as well as utilization metrics that can be used for capacity planning (although an integrated capacity planning function doesn’t exist). Metrics can be exported to Grafana for organization-wide monitoring. Quota enforcement can be monitored through the management interface, and basic data management metrics such as user/group usage can also be visualized.

The solution embeds several security features including role-based access control, file system versioning and volume protection, authentication through Kerberos for NFS, and Active Directory for NFS. Encryption at rest is supported and integrates with key management systems. Protection against ransomware is implemented through immutable (WORM) volumes, a feature that is configurable at the volume level.

Kubernetes support is currently absent. Due to the specificity of the solution, GPUDirect is neither supported nor relevant.

Strengths: A robust scale-out file system solution specifically designed for sequential and throughput-intensive workloads, Scality SOFS provides an excellent cloud implementation; it takes advantage of inexpensive object storage to deliver massive scalability, parallelism, and aggregated throughput.

Challenges: The solution’s niche focus and limited capabilities make it unsuitable for general-purpose deployments. The cloud edition of Scality SOFS currently supports only Microsoft Azure.

SoftIron

With its HyperDrive solution, SoftIron implements a scale-out file storage platform based on CephFS (a file system built on top of Ceph distributed object storage), SoftIron Linux, and an optimized hardware platform. The approach is unique in several aspects. SoftIron uses a dedicated, low-power hardware architecture based on ARM processors to improve efficiency, maximize I/O, and reduce heat and power consumption. Additionally, the company puts great emphasis on controlling its supply chain and maintaining traceability of physical components and software, offering an auditable value chain that will appeal to both organizations operating in sensitive verticals and government agencies. Appliances are available in NVMe flash, SATA flash, and hybrid configurations; NVMe-based systems leverage AMD EPYC processors but use in-house layering to optimize NVMe traffic.

Based on Ceph’s object-native storage system, HyperDrive supports native S3 access to objects as well as several file systems. Cloud integration is limited to standard file and application integration methods, as well as using cloud storage endpoints as tiering or replication targets.

System management is performed through SoftIron’s Storage Manager, a simple and straightforward management interface with modern characteristics. Although Storage Manager provides real-time and historical capacity, throughput, and I/O consumption metrics, the solution’s multi-tenant security design prevents direct access to tenant data or data structures, and so it cannot report any data insights.

HyperDrive supports a standard set of security capabilities including Active Directory/LDAP authentication, role-based access control, and end-to-end in-flight and at-rest encryption (either software or hardware-based), as well as immutable snapshots. SoftIron has a key differentiator in the security area due to its tightly-controlled manufacturing and supply chain auditing, offering customers a transparent and comprehensive overview of the solution and its components, whether physical or software-based.

SoftIron supports several container infrastructure integrations through Kubernetes CSI, Rancher, and OpenStack plug-ins.

Strengths: SoftIron proposes a solid implementation of CephFS on top of efficient hardware, delivering a scale-out file system that is tuned for performance. Its strong focus on security and supply chain control and auditing makes the solution ideally suited to support sensitive projects and initiatives and particularly appeals to government agencies.

Challenges: Cloud integration is very limited and Kubernetes integration is average at best. Basic data management capabilities are absent and cannot be implemented under the current architecture design, preventing organizations from obtaining valuable insights on data usage.

VAST Data

VAST Data offers a modern, massively scalable storage architecture built around new flash media technologies that can deliver exabyte-scale file and object storage. The solution currently uses Intel Optane and QLC flash but is also capable of supporting new SCM solutions, such as Kioxia FL6 SCM devices, as they enter the market. This is now possible because VAST Data has provided its solution under a disaggregated storage model since early 2021. Branded Gemini, this model empowers customers to order hardware from AVNET at cost (without additional markup), and procure the SDS solution directly from VAST Data, which then takes care of the integration. In fact, the solution is deployed as a managed service, and an L3 engineer is assigned to each customer as a “co-pilot.”

From an architectural standpoint, VAST Data provides a universal storage plane that delivers file and object capabilities across a single high-performance storage tier backed by high-capacity QLC flash and ultra-low-latency, high-endurance, high-throughput storage-class memory. The solution leverages NVMe-oF and introduces a disaggregated, shared-everything design composed of VAST server containers running the logic of VAST (providing a global namespace accessible through NFS, NFS over RDMA, SMB, S3, and containers), and VAST NVMe enclosures providing high-density flash storage (combining SCM and QLC flash). Those components are connected through a low-latency fabric running over 100 GbE or InfiniBand. An efficient implementation of data writes (via large sequential stripes) significantly reduces media wear on QLC drives and allows VAST Data to provide a 10-year guarantee on QLC flash durability.
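To illustrate why large sequential stripes reduce media wear, the following minimal sketch compares write amplification for small writes that are flushed individually against the same writes coalesced into full stripes. It is a deliberately simplified worst-case model, not VAST Data's implementation: the function name, the 1 MiB erase block size, and the assumption that every uncoalesced write programs at least one whole erase block are all hypothetical.

```python
import math


def write_amplification(write_sizes, erase_block_size, coalesce):
    """Return bytes programmed to flash divided by bytes written by clients.

    Toy model (assumption): flash is programmed in whole erase blocks.
    Without coalescing, each write programs its own block(s); with
    coalescing, writes are buffered and laid out back to back.
    """
    total = sum(write_sizes)
    if total == 0:
        return 0.0
    if coalesce:
        # All buffered data is packed into as few full blocks as possible.
        blocks = math.ceil(total / erase_block_size)
    else:
        # Every write is rounded up to whole blocks on its own.
        blocks = sum(math.ceil(size / erase_block_size) for size in write_sizes)
    return blocks * erase_block_size / total


MIB = 1024 * 1024
small_writes = [4096] * 1024  # 1,024 writes of 4 KiB = 4 MiB in total

scattered = write_amplification(small_writes, MIB, coalesce=False)
striped = write_amplification(small_writes, MIB, coalesce=True)
print(f"scattered: {scattered:.0f}x, striped: {striped:.0f}x")
```

Under these assumptions the coalesced layout programs only as much flash as the clients wrote, which is the intuition behind staging writes in a fast tier before committing them to QLC in large stripes.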

VAST Data has added several capabilities to its solution recently, including native replication, backups to S3-compatible object storage, support for NFSv4, new snapshot policies (including “indestructible” snapshots that can be unlocked through multi-factor authentication), and telemetry/call home data. Worth noting, the Snap to Object feature allows replication of data to any S3-compatible object store, whether in the cloud (AWS and Azure) or on-premises. The data is compressed and stored in large objects for higher efficiency.

The solution offers a modern cloud-based management platform, which boasts a comprehensive set of data management features such as data flow visualization. Data flow allows users to understand how data flows across their entire system and view those flows by users, locations, data types, destinations, and more. Other data management capabilities include capacity use projections and a dynamic wheel of data usage. Besides the modern GUI, a full REST API is also available.

Security features include granular role-based access control (organized around multiple realms of management), data-at-rest encryption, and ransomware protection.

VAST Data provides a Kubernetes CSI driver that is used by over 30% of its customer base. The driver provides NFS over RDMA support for improved performance, and implements a concept of storage “pools” that can be assigned to specific clusters or containers. This concept supports policy-based communication (which containers or clusters can talk to which pools) and supports QoS as well. In addition, Red Hat OpenStack is supported through a dedicated plug-in for Manila, the OpenStack shared file systems service.
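The pool concept described above can be pictured as a small policy check. The sketch below is purely illustrative and does not reflect the actual CSI driver or its configuration format: the `StoragePool` class, its fields, the `authorize` function, and the cluster names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePool:
    """Hypothetical storage pool with an access policy and a QoS ceiling."""
    name: str
    allowed_clusters: frozenset  # clusters permitted to talk to this pool
    max_iops: int                # illustrative QoS limit


def authorize(pool, cluster, requested_iops):
    """Allow a request only if the cluster is on the pool's allow list
    and the requested IOPS fit within the pool's QoS limit."""
    if cluster not in pool.allowed_clusters:
        return False
    return requested_iops <= pool.max_iops


analytics_pool = StoragePool(
    name="analytics",
    allowed_clusters=frozenset({"k8s-prod", "k8s-ml"}),
    max_iops=50_000,
)

print(authorize(analytics_pool, "k8s-ml", 20_000))   # allowed cluster, within QoS
print(authorize(analytics_pool, "k8s-dev", 20_000))  # cluster not on the allow list
```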

GPUDirect support is available with an early implementation that takes advantage of VAST Data’s NFS multipathing capabilities, which use all ports available in a GPU at maximum speed and are being upstreamed into the Linux NFS client. This multipathing makes the solution particularly suited to cater to GPU-based workloads.

Strengths: VAST Data implements a breakthrough architecture built around cutting-edge new flash media types, providing a single storage tier that does not compromise between performance and capacity. The solution offers a broad set of capabilities and modern management tools, while it fully embraces emerging use cases such as containers and GPU-based workloads.

Challenges: Other than object storage integrations, cloud support capabilities are currently limited.

6. Analyst’s Take

The scale-out file storage market is very active. Roadmaps show a general trend toward hybrid cloud use cases, more data management capabilities, and greater attention to security (including ransomware protection).

The rise of hybrid cloud use cases can be seen in two ways:

  1. Integration with object storage is being looked at with greater scrutiny. Platforms that integrate with object storage offer better opportunities for cost optimization, and some offerings are capable of analyzing demand or access patterns of certain data sets and subsequently automating data movement to either local or cloud-based object tiers, including long-term retention.
  2. Organizations want to bring data closer to cloud-based workloads, but also to weave cloud locations into their distributed data fabric, or ensure certain data sets are served from specific cloud regions.
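The first point, automating data movement based on access patterns, can be sketched as a simple age-based placement rule. This is a minimal illustration rather than any vendor's policy engine: the function name, tier labels, and the 30-day and 180-day thresholds are assumptions chosen for the example.

```python
def choose_tier(days_since_last_access, warm_after_days=30, cold_after_days=180):
    """Pick a storage tier from the time elapsed since the last access.

    Thresholds and tier names are hypothetical; real products typically
    also weigh file size, access frequency, and administrator policies.
    """
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access >= cold_after_days:
        return "cloud-object-archive"  # long-term retention
    if days_since_last_access >= warm_after_days:
        return "local-object"          # cheaper on-premises object tier
    return "flash"                     # hot data stays on the file tier


for age in (3, 45, 400):
    print(age, "->", choose_tier(age))
```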

Today, almost all solutions support several storage tiers such as NVMe flash, SAS/SATA SSDs, and HDDs. Some vendors are looking beyond these storage options and incorporating new flash media types such as storage-class memory for improved performance, as well as QLC NAND to improve capacity. There is a clear distinction between solutions that embrace new technologies (and some that are outright built upon them), and others whose approach is more cautious. Concerns around storage-class memory stem primarily from the availability of only a single solution on the market (Intel Optane), and they should fade once storage-class memory based on Compute Express Link (CXL) hits the market. The other concern is the low durability of QLC NAND, although the real challenge lies in re-architecting solutions to find efficient methods to write to QLC drives while avoiding media wear.

For the enterprise segment, data management capabilities are becoming crucial. As the focus shifts away from storage and moves toward extracting the value of data, organizations critically require data insights to consume scale-out file storage services optimally. Those capabilities tie into the earlier points around cloud and object storage integration: with data management capabilities, organizations can make better-informed decisions around data placement.

Security is another vital attribute. It is now so essential that it could soon become table stakes: every organization rightfully expects storage solutions to protect its data in the most comprehensive way possible. Among the key expectations, role-based access control and data encryption (in flight and at rest) should be mandatory. Interestingly, a rise in ransomware protection capabilities (starting with immutable snapshots) can be observed. This is often coupled with advanced analytics platforms capable of detecting anomalies.
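As a rough illustration of how analytics can flag ransomware-like behavior, the sketch below marks a snapshot as anomalous when its change volume deviates sharply from recent history. It is an assumption-laden toy, not a description of any vendor's detection engine: the function name, the z-score approach, and the threshold of 3 are hypothetical choices.

```python
from statistics import mean, pstdev


def is_anomalous(history_gib, latest_gib, threshold=3.0):
    """Flag the latest snapshot delta if it lies more than `threshold`
    standard deviations away from the mean of past deltas.

    history_gib: changed data per past snapshot, in GiB (illustrative unit).
    """
    if not history_gib:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history_gib)
    spread = pstdev(history_gib)
    if spread == 0:
        # Perfectly flat history: any deviation at all is unusual.
        return latest_gib != baseline
    return abs(latest_gib - baseline) / spread > threshold


daily_change_gib = [100, 110, 90, 105, 95]

print(is_anomalous(daily_change_gib, 102))    # ordinary day
print(is_anomalous(daily_change_gib, 5000))   # mass rewrite, e.g. bulk encryption
```

A sudden spike in changed data is only one possible signal; a flagged snapshot would still need to be investigated, and immutable snapshots provide the recovery point if the alert proves genuine.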

Finally, Kubernetes integration should not be overlooked. Container-based workloads, while not yet predominant in enterprises, are becoming the default way of developing new applications, and most modern workloads (including GPU-based computing, AI/ML, and DL) use Kubernetes. Scale-out file systems are a treasure trove from which existing datasets can be reused and fed into these modern, container-based workloads to identify potential new outcomes and extract more valuable findings.

7. About Enrico Signoretti

Enrico Signoretti

Enrico Signoretti has more than 25 years of experience in technical product strategy and management roles. He has advised mid-market and large enterprises across numerous industries, and worked with a range of software companies from small ISVs to global providers.

Enrico is an internationally renowned expert on data storage, as well as a visionary, author, blogger, and speaker on the topic. He has tracked the evolution of the storage industry for years, as a GigaOm Research Analyst, an independent analyst, and a contributor to The Register.

8. About Max Mortillaro

Max Mortillaro

Max Mortillaro is an independent industry analyst with a focus on storage, multi-cloud & hybrid cloud, data management, and data protection.

Max carries over 20 years of experience in the IT industry, having worked for organizations across various verticals such as the French Ministry of Foreign Affairs, HSBC, Dimension Data, and Novartis to cite the most prominent ones. Max remains a technology practitioner at heart and currently provides technological advice and management support, driving the qualification and release to production of new IT infrastructure initiatives in the heavily regulated pharmaceutical sector.

Besides publishing content and research on the TECHunplugged.io blog, Gestalt IT, Amazic World, and other outlets, Max regularly participates in podcasts and discussion panels. He is a long-time Tech Field Day alumnus, former VMUG leader, and active member of the IT infrastructure community. He has also run his own technology blog, kamshin.com, continuously since 2008, where his passion for content creation started.

Max is an advocate for online security, privacy, encryption, and digital rights. When not working on projects or creating content, Max loves to spend time with his wife and two sons, either busy cooking delicious meals or trekking/mountain biking.

9. About Arjan Timmerman

Arjan Timmerman

Arjan Timmerman is an independent industry analyst and consultant with a focus on helping enterprises on their road to the cloud (multi/hybrid and on-prem), data management, storage, data protection, network, and security. Arjan has over 23 years of experience in the IT industry and has worked for organizations across various verticals such as the Shared Service Center for the Dutch Government, ASML, NXP, Euroclear, and the European Patent Office, to name just a few.

Drawing on his background as an engineer, Arjan currently provides both technical and business architectural insight and management advice by creating high-level and low-level architecture advice and documentation. As a blogger and analyst at the TECHunplugged.io blog, Gestalt IT, Amazic World, and other outlets, Arjan also participates from time to time in podcasts, discussion panels, webinars, and videos. An alumnus of Tech Field Day since Storage Field Day 1, Arjan is also a former NLVMUG leader and an active member of multiple communities such as Tech Field Day and vExpert.

Arjan is a tech geek and, even more importantly, he loves to spend time with his wife Willy, his daughters Rhodé and Loïs, and his son Thomas, sharing precious memories on this amazing planet.

10. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

11. Copyright

© Knowingly, Inc. 2021. "GigaOm Radar for Enterprise Scale-Out File Systems" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.