File storage is a critical component of every hybrid cloud strategy, and enterprises often prefer it over block and object storage, in particular for big data, artificial intelligence (AI), and collaboration. We therefore decided to focus our assessment of the cloud-based file storage sector on two areas: big data and AI in this report on cloud file systems, and collaboration in our companion Radar on distributed cloud file storage.
Cloud providers didn’t initially offer file storage services, and this spurred multiple storage vendors to jump in with products and services to fill that gap. The requirements that emerged during the COVID-19 pandemic are still relevant: with the increasing need for data mobility and the large number of workloads moving across on-premises and cloud infrastructures, file storage is simply better—easier to use and more accessible than other forms of storage.
Lift-and-shift migrations to the cloud are increasingly common scenarios, and enterprises often want to keep the environment as identical as possible to the original one. File storage is a key factor in accomplishing this, but simplicity and performance are important as well.
File systems still provide the best combination of performance, usability, and scalability for many workloads. The file system remains the primary interface for the majority of big data, artificial intelligence/machine learning (AI/ML), and high-performance computing (HPC) applications, and today it usually offers data services such as snapshots to improve data management operations.
In recent years, file systems also have become more cloud-friendly, showing better integrations with object storage, which enables better scalability, a better balance of speed and cost, and advanced features for data migration and disaster recovery.
Both traditional storage vendors and cloud providers now offer file services or solutions that can run both on-premises and in the cloud. Their approaches are different, though, and it can be very difficult to find a solution that both meets today’s needs and can evolve to face future challenges. Cloud providers generally offer the best integration across the entire stack but also raise the risk of lock-in, and services are not always the best in class. On the other hand, solutions from storage vendors typically provide more flexibility, performance, and scalability but can be less efficient or lack the level of integration offered by an end-to-end solution.
This GigaOm Radar report highlights key cloud file systems vendors and equips IT decision-makers with the information needed to select the best fit for their business and use case requirements. In the corresponding GigaOm report “Key Criteria for Evaluating File-Based Cloud Storage Solutions,” we describe in more detail the key features and metrics that are used to evaluate vendors in this market.
How to Read this Report
This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:
Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.
GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.
Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.
2. Market Categories and Deployment Types
For a better understanding of the market and vendor positioning (Table 1), we assess how well solutions for cloud file systems are positioned to serve specific market segments:
- Small-to-medium enterprise: In this category, we assess solutions on their ability to meet the needs of organizations ranging from small businesses to medium-sized companies. Also assessed are departmental use cases in large enterprises where ease of use and deployment are more important than extensive management functionality, data mobility, and feature set.
- Large enterprise: Here, offerings are assessed on their ability to support large and business-critical projects. Optimal solutions in this category will have a strong focus on flexibility, performance, data services, and features to improve security and data protection. Scalability is another big differentiator, as is the ability to deploy the same service in different environments.
- Specialized: Optimal solutions will be designed for specific workloads and use cases, such as big data analytics and high-performance computing.
In addition, we recognize two deployment models for solutions in this report: cloud-only (SaaS), and hybrid and multicloud.
- SaaS: The solution is available in the cloud as a managed service. Often designed, deployed, and managed by the service provider or the storage vendor, it is available only from that specific provider. The big advantages of this type of solution are its simplicity and the integration with other services offered by the cloud service provider.
- Hybrid and multicloud solutions: These solutions are meant to be installed both on-premises and in the cloud, allowing customers to build hybrid or multicloud storage infrastructures. Integration with any single cloud provider can be more limited than with a SaaS offering, and deployment and management can be more complex. On the other hand, these solutions are more flexible, and the user usually has more control over the entire stack with regard to resource allocation and tuning. They can be deployed as a virtual appliance (like a traditional NAS filer, but in the cloud) or as a software component installed on a Linux virtual machine (VM), that is, as a file system.
Table 1. Vendor Positioning
| Small-to-Medium Enterprise | Large Enterprise | Specialized | SaaS | Hybrid & Multicloud |
|---|---|---|---|---|

- Exceptional: Outstanding focus and execution
- Capable: Good but with room for improvement
- Limited: Lacking in execution and use cases
- Not applicable or absent
3. Key Criteria Comparison
Building on the findings from the GigaOm report “Key Criteria for Evaluating File-Based Cloud Storage Solutions,” Table 2 summarizes how each vendor included in this research performs in the areas we consider differentiating and critical in this sector. Table 3 follows this summary with insight into each product’s evaluation metrics—the top-line characteristics that define the impact each will have on the organization.
The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the market landscape, and gauge the potential impact on the business.
Table 2. Key Criteria Comparison
| Global Namespace | Hybrid & Multicloud | Integration with Object Storage | Data Management | Analytics | Advanced Security | Edge Deployments |
|---|---|---|---|---|---|---|

- Exceptional: Outstanding focus and execution
- Capable: Good but with room for improvement
- Limited: Lacking in execution and use cases
- Not applicable or absent
Table 3. Evaluation Metrics Comparison
| Architecture & Scalability | Flexibility | Efficiency | Performance | Manageability & Ease of Use | Security |
|---|---|---|---|---|---|

- Exceptional: Outstanding focus and execution
- Capable: Good but with room for improvement
- Limited: Lacking in execution and use cases
- Not applicable or absent
By combining the information provided in the tables above, the reader can develop a clear understanding of the technical solutions available in the market.
4. GigaOm Radar
This report synthesizes the analysis of key criteria and their impact on evaluation metrics to inform the GigaOm Radar graphic in Figure 1. The resulting chart is a forward-looking perspective on all the vendors in this report based on their products’ technical capabilities and feature sets.
The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—balancing Maturity versus Innovation and Feature Play versus Platform Play—while providing an arrow that projects each solution’s evolution over the coming 12 to 18 months.
Figure 1. GigaOm Radar for Cloud File Systems
As you can see in the Radar chart in Figure 1, vendors are mostly Fast Movers or Outperformers. Most vendors are primarily investing in under-the-hood performance and stability improvements while also focusing on their core differentiators. These trends are highlighted by the distribution of the vendors across the Innovation half of the Radar and the arrow lengths.
In the lower-right quadrant are solutions that address the GigaOm key criteria with an innovative approach. Among these, NetApp continues to deliver an enterprise-grade cloud file system based on NetApp ONTAP, which offers seamless interoperability across clouds and on-premises deployments, now also on Google Cloud Platform (GCP). With NetApp BlueXP, a next-generation unified management solution that includes storage and data services, organizations can orchestrate multicloud operations and enable data services via a single console, regardless of where their data is stored.
The WEKA Data Platform remains a solution of choice for high-performance workloads such as AI/ML, HPC, and high-frequency trading (HFT). The company is working on a comprehensive roadmap to further expand capabilities and use cases.
Qumulo offers an enterprise-grade solution with a broad set of services covering data replication, data mobility, data integration, and analytics. Services such as Qumulo SHIFT are now in production, and the company has a good roadmap, though some of the planned improvements are taking time to materialize.
Nasuni’s solution has been enriched with advanced ransomware protection (detection and remediation) capabilities and has been re-architected to provide a modular interface that allows pluggable data services to better interact with its core file system. While data management capabilities are missing, a recent acquisition should address this gap in 2023. Taking a similar approach, Panzura has redesigned its solution to separate the core file system from data services. The solution offers great ransomware protection and includes data management capabilities. It’s worth noting that both Nasuni and Panzura are primarily aimed at distributed cloud file storage use cases. Though they can also address cloud file system requirements, their respective architectures may pose challenges in meeting the performance, throughput, and latency requirements of demanding workloads typically running on cloud file systems.
Finally, Hammerspace provides one of the best global namespace implementations and focuses primarily on the performance and stability of its platform, with improvements in custom metadata handling and tagging, as well as a significant increase in the number of deployment sites and a notable technology partnership with Snowflake.
In the lower-left quadrant are the Feature-Play solutions, those that only partially cover some of the GigaOm key criteria. These include four hyperscalers—Amazon Web Services (AWS), Microsoft Azure, GCP, and Oracle Cloud Infrastructure (OCI)—as well as one software-defined storage solution, ObjectiveFS. The positioning of the hyperscalers in this area reflects that each has a more or less comprehensive portfolio of cloud file system services, though those services are intended for specific use cases and thus don’t always include the full spectrum of key criteria.
AWS offers the most comprehensive portfolio, with services oriented toward cloud-native file storage (Amazon EFS), compatibility with existing file systems (Amazon FSx for NetApp ONTAP, Windows File Server, Lustre, and OpenZFS, the latest addition to the FSx family), and access to data no matter where it resides, including hybrid cloud workloads, with the new Amazon File Cache. Each of these services is targeted at specific use cases and personas within enterprises, and organizations will need a good understanding of Amazon’s approach to make the most informed choices. As a first-party solution, FSx for NetApp ONTAP offers the highest level of enterprise-grade capabilities and multicloud interoperability, while EFS provides fully elastic file storage for cloud-native workloads. It’s worth noting that all of the FSx file systems can be monitored within a single console using Amazon CloudWatch.
Microsoft offers several cloud file system solutions—Azure NetApp Files, Azure Files, and Azure Blob, each with multiple performance and cost tiers—that deliver great flexibility in terms of consumption and deployment options. Among these, the most mature enterprise-grade offering is Azure NetApp Files, a Microsoft and NetApp collaboration with almost global availability. Microsoft’s other offerings have more limited namespace sizes, but the company is working on increasing those limits.
OCI offers File Storage, which was built around strong data durability, high availability, and massive scalability. The solution currently stands out for its robustness and its focus on data management, while other areas need further development to catch up with the competition. Oracle also offers compelling options for HPC-oriented organizations with its OCI HPC File System stacks, and it provides an OCI ZFS image as well.
Finally, GCP has expanded its cloud file system solution, which, in addition to Filestore, now offers NetApp Cloud Volumes for Google Cloud as a SaaS option. Filestore has been improved in the last year, but the solution could be expanded further. On the other hand, the availability of a NetApp solution on Google Cloud provides a significant credibility jump by bringing enterprise-grade cloud file storage features directly to GCP customers.
ObjectiveFS, the only non-hyperscaler in this area of the Radar, has primarily shown under-the-hood stability and performance improvements. Designed with a focus on demanding workloads, and with support for on-premises, hybrid, and cloud-based deployments, the solution will suit organizations seeking strong performance, though it lacks data management capabilities.
The third group of vendors sits in the Maturity half of the Radar. IBM Spectrum Scale, the cloud file system with the greatest longevity, continues to demonstrate its relevance with steady improvements, including snapshot immutability, ransomware protection, and containerized S3 access services for high-performance workloads, with concurrent file and object access. DDN maintains its strong focus on AI and HPC workloads with an updated Lustre-based EXAScaler EXA6 appliance, which delivers scalability, performance, and multitenancy—key capabilities for these types of workloads. The company also announced new reference architectures developed with NVIDIA.
Finally, Zadara enters the mature Platform-Play area, moving from its position last year in the Innovative half of the Radar. Zadara delivers its solution as a service primarily via managed service providers (MSPs), with some deployments directly on customer premises. Its comprehensive platform integrates storage, compute, and networking with great analytics and excellent data mobility and data protection options. In the cloud file system space, improvements in the past year have primarily focused on stability and performance, but the company has a promising roadmap in terms of monitoring and data management capabilities.
Inside the GigaOm Radar
The GigaOm Radar weighs each vendor’s execution, roadmap, and ability to innovate to plot solutions along two axes, each set as opposing pairs. On the Y axis, Maturity recognizes solution stability, strength of ecosystem, and a conservative stance, while Innovation highlights technical innovation and a more aggressive approach. On the X axis, Feature Play connotes a narrow focus on niche or cutting-edge functionality, while Platform Play displays a broader platform focus and commitment to a comprehensive feature set.
The closer to center a solution sits, the better its execution and value, with top performers occupying the inner Leaders circle. The centermost circle is almost always empty, reserved for highly mature and consolidated markets that lack space for further innovation.
The GigaOm Radar offers a forward-looking assessment, plotting the current and projected position of each solution over a 12- to 18-month window. Arrows indicate travel based on strategy and pace of innovation, with vendors designated as Forward Movers, Fast Movers, or Outperformers based on their rate of progression.
Note that the Radar excludes vendor market share as a metric. The focus is on forward-looking analysis that emphasizes the value of innovation and differentiation over incumbent market position.
5. Vendor Insights
Amazon offers a robust portfolio of file-based services including Amazon FSx, Amazon EFS, and Amazon File Cache. Amazon FSx provides a fully compatible cloud-based extension of existing and broadly popular file systems, enabling workload portability and hybrid cloud capabilities. Amazon EFS delivers storage for cloud-native workloads that follow elastic, bottomless, and serverless principles. Finally, Amazon File Cache enables data sharing across cloud resources and locations.
The Amazon FSx family provides four popular file system offerings: Windows File Server, NetApp ONTAP, Lustre, and OpenZFS. These services can be managed and monitored through the Amazon FSx console, which provides unified monitoring capabilities across all four FSx offerings.
Amazon FSx for Windows File Server provides fully managed, native Windows file-sharing services using the server message block (SMB) protocol. It supports user quotas, user-initiated file restores, and Windows access control lists. The service integrates with Windows-based Active Directory (AD) or AWS Microsoft-managed AD and leverages distributed file system (DFS) to provide single namespace capabilities.
FSx for NetApp ONTAP provides a proven, enterprise-grade cloud file storage experience for the AWS platform that is natively integrated into the AWS consumption model. It offers superior technical capabilities thanks to NetApp’s presence on all major cloud platforms. FSx for ONTAP supports network file system (NFS), SMB, and Internet Small Computer System Interface (iSCSI), and can be deployed in a single availability zone (AZ) or across multiple AZs through an active-standby model that supports synchronous replication. If an AZ becomes unavailable, failover and failback operations are automated and transparent. The solution follows NetApp ONTAP principles and scales to multiple petabytes in a single namespace.
Amazon FSx for Lustre implements a POSIX-compliant file system that natively integrates with Linux workloads and is accessible by Amazon EC2 instances or on-premises workloads. The solution is linked with AWS S3 buckets, whose objects are then transparently presented as files. Applications can manipulate those objects as files while FSx automatically ensures changes are committed in the object back end.
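As a rough illustration of this S3 linkage, the sketch below creates a scratch FSx for Lustre file system associated with a bucket via the AWS CLI; the bucket, subnet, and security group identifiers are placeholders, and production deployments will need values and options appropriate to their environment.

```shell
# Sketch: create a scratch FSx for Lustre file system linked to an S3 bucket.
# All resource identifiers below are placeholders.
aws fsx create-file-system \
  --file-system-type LUSTRE \
  --storage-capacity 1200 \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --lustre-configuration \
    "DeploymentType=SCRATCH_2,ImportPath=s3://example-dataset-bucket,ExportPath=s3://example-dataset-bucket/results"

# Once available, clients mount it like any Lustre target; objects in the
# linked bucket then appear as files under the mount point.
sudo mount -t lustre <file-system-dns-name>@tcp:/<mount-name> /mnt/fsx
```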
Recently added, Amazon FSx for OpenZFS brings OpenZFS native capabilities to the cloud, including snapshots, cloning, and compression. The offering simplifies migration of OpenZFS-based data-intensive Linux workloads to AWS.
Amazon EFS is a massively parallel file storage solution based on the NFS 4.1 protocol that is consumable as a fully managed service. The service is designed around cloud-native principles to deliver an elastic, bottomless, and serverless experience. It provides shared access to thousands of workloads (EC2 instances, ECS, EKS, Fargate, and Lambda), and it scales from gigabytes to petabytes of data without the need to provision storage, making it particularly suited to latency-sensitive workloads with high throughput requirements.
The solution can scale up to petabytes, and file systems can be hosted within a single AZ or across multiple AZs when applications require multizone resiliency. EFS offers two storage classes: EFS Standard and EFS Standard-IA (infrequent access). Data is available through a single storage namespace and is moved transparently between tiers based on file usage patterns, an automatic tiering feature that reduces storage costs for infrequently accessed files. Centralized data protection capabilities are available with AWS Backup, and data mobility is supported with AWS DataSync.
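Because EFS is exposed over standard NFS 4.1, a client can mount it with ordinary NFS tooling; the sketch below uses a placeholder file system ID and region, with mount options along the lines of those AWS documents for EFS.

```shell
# Sketch: mount an EFS file system over NFS 4.1 from an EC2 instance.
# The file system ID and region are placeholders.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/ /mnt/efs

# With the amazon-efs-utils package installed, the shorter form also works
# and adds support for encryption of data in transit:
# sudo mount -t efs fs-0123456789abcdef0:/ /mnt/efs
```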
Finally, Amazon File Cache, introduced at the end of September 2022, aims to provide a high-speed cache for datasets stored anywhere, whether in the cloud or on-premises. The solution provides a single, unified view across multiple datasets, whether data is stored on NFS file systems or S3 buckets, effectively acting as a global namespace, and provides high-performance access to cached data with high throughput and low latency.
Analytics and monitoring are handled through several endpoints. The AWS console provides a general overview, with specialized consoles such as the Amazon EFS or Amazon FSx consoles presenting metrics about current usage, the number of mount targets, and lifecycle state. Logging and auditing can be performed through the Amazon CloudWatch console.
Some of the services provide snapshot immutability (which can be used for ransomware mitigation and recovery), but configuration, recovery, and orchestration must be performed manually.
Strengths: Amazon offers an extensive set of cloud file storage solutions that can address the needs of a broad spectrum of personas and use cases, providing great flexibility and compatibility with popular file system options through its FSx service while also delivering a cloud-native experience with EFS and hybrid cloud options with Amazon File Cache.
Challenges: Amazon’s extensive portfolio requires organizations to properly understand the company’s offerings and the alignment of services to specific use cases. The ecosystem is rich but also complex, with analytics and management capabilities spread across several consoles. The platform can deliver outstanding value, but it requires a thorough understanding of its full potential.
DDN’s EXAScaler delivers a parallel file system that provides performance, scalability, reliability, and simplicity. With it, DDN offers a data platform that enables and accelerates a wide range of data-intensive workflows at scale.
DDN EXAScaler was created with the fast and scalable open parallel file system Lustre, which is the most popular file system for scale-out computing and has been proven and hardened in the most demanding HPC environments. Lustre and EXAScaler continue to be developed by a very active, dedicated, and talented team, most of whom now work at DDN.
The DDN EXAScaler appliances combine the parallel file system software with a fast hyperconverged data storage platform in a package that’s easy to deploy and is managed and backed by the leaders in data at scale. Built with AI and HPC workloads in mind, DDN excels in graphics processing unit (GPU) integration, with the first GPUDirect integration. The EXAScaler client is deployed into the GPU node, enabling remote direct memory access (RDMA) as well as monitoring of application access patterns from the GPU client all the way to the disk, providing outstanding workload visibility. DDN is also the only certified and supported storage for NVIDIA DGX SuperPOD, a certification that allows DDN customers to run the solution as a hosted AI cloud. In September 2022, DDN also revealed new reference architectures to support NVIDIA DGX POD and DGX SuperPOD.
DDN EXAScaler’s fast parallel architecture enables scalability and performance, supporting low-latency workloads and high-bandwidth applications such as GPU-based workloads, AI frameworks, and Kubernetes-based applications. Moreover, the DDN EXAScaler solution can grow with data at scale and its intelligent management tools manage data across tiers.
Data security is built in, with secure multitenancy, encryption, end-to-end data protection, and replication services baked into the product and providing a well-balanced solution to the customer. In addition, Lustre’s capabilities around changelog data and audit logs are built into the EXAScaler product, providing better insights for customers into their data. Unfortunately, ransomware protection is not yet completely incorporated into the solution.
Besides the physical EXA6 appliance, a cloud-based solution branded EXAScaler Cloud runs natively on AWS, Azure, and GCP, and can be obtained easily from each cloud provider’s marketplace. Features such as cloud sync enable multicloud and hybrid data management capabilities within EXAScaler for archive, data protection, and bursting of cloud workloads.
Also worth a mention is DDN DataFlow, a data management platform that is tightly integrated with EXAScaler. Although it’s a separate product, a large majority of DDN users rely on DataFlow for platform migration, archiving, data protection use cases, data movement across clouds, repatriation, and so forth.
Strengths: DDN EXAScaler is built on top of the Lustre parallel file system and offers a scalable and performant solution that gives its customers a secure and flexible system with multitenancy, encryption, replication, and more. The solution particularly shines thanks to its outstanding GPU integration capabilities, an area where DDN is recognized as a leader.
Challenges: Ransomware protection capabilities are missing.
Previously limited to Google Filestore, cloud file system offerings on GCP now also include NetApp Cloud Volumes for Google Cloud.
Based on NetApp Cloud Volumes ONTAP, NetApp Cloud Volumes for Google Cloud is a fully managed, cloud-native data storage service that brings NetApp enterprise-grade capabilities directly to GCP. The solution is fully compatible with first-party NetApp services at other hyperscalers as well as any on-premises or cloud-based ONTAP 9.11 deployments, allowing seamless data operations in a true multicloud fashion. NetApp Cloud Volumes on GCP can also be managed through NetApp BlueXP, which is comprehensively covered in the NetApp solution review.
Google Filestore is a fully managed NAS solution for the Google Compute Engine and GKE-powered Kubernetes instances. The solution, which supports the NFSv3 protocol, focuses on high-performance workloads, scales up to hundreds of terabytes, and is available on four service tiers (Basic HDD, Basic SSD, High Scale SSD, and Enterprise), each with different capacity, throughput, and input/output operations per second (IOPS) characteristics. Worth noting, the High Scale SSD and Enterprise tiers also allow customers to scale down capacity if no longer required.
The solution is native to the Google Cloud environment and is therefore not available on-premises or on other cloud platforms. It doesn’t provide a global namespace; instead, customers get one namespace of up to 100 TB per share, depending on each tier’s provisionable capacity limit.
Filestore has an incremental backup capability (available on Basic HDD and SSD tiers) that provides the ability to create backups within or across regions. Backups are globally addressable, allowing restores in any GCP region. There are currently no data recovery capabilities on the High Scale SSD tier (neither backups nor snapshots), while the Enterprise tier supports snapshots and availability at the regional level. Unfortunately, the Enterprise tier can only scale up to 10 TiB per share.
Google recommends organizations leverage ecosystem partners for enterprise-grade data protection capabilities. Data mobility capabilities primarily rely on command-line tools such as remote sync (rsync) or secure copy (scp), and these tools can also be used to copy data to Cloud Storage buckets, Google’s object storage solution. For larger capacities, customers can use Google Cloud Transfer Appliance, a hardened appliance laden with security measures and certifications. Google also offers a Storage Transfer Service that helps customers perform easier data transfers or data synchronization activities, but capabilities appear to be limited compared to data migration and replication tools available in the market.
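In practice, these command-line workflows look something like the sketch below; the mount point, bucket name, and backup host are placeholders, and `gsutil rsync` is one way to move share contents into a Cloud Storage bucket.

```shell
# Sketch: copy data from a mounted Filestore share to a Cloud Storage bucket.
# The mount point and bucket name are placeholders.
# One-way sync from the share into the bucket (-m parallelizes transfers):
gsutil -m rsync -r /mnt/filestore gs://example-backup-bucket/filestore

# Plain rsync can replicate a share to another host over SSH:
rsync -avz /mnt/filestore/ backup-host:/srv/filestore-copy/
```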
Filestore includes a set of REST application programming interfaces (APIs) that can be used for data management activities. Data analytics provides basic metrics and the ability to configure alerts.
The solution implements industry-standard security features, but there are no capabilities for auditing user activities (except manually parsing logs) or for protecting against ransomware. Organizations can, however, create Google Cloud storage buckets with the Bucket Lock functionality and use data mobility tools to copy data to the object store.
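The Bucket Lock approach described above can be sketched with `gsutil`; the bucket name and retention period below are placeholders.

```shell
# Sketch: apply a retention policy (Bucket Lock) to a Cloud Storage bucket
# so file data copied into it cannot be deleted or overwritten before the
# retention period expires. Bucket name and period are placeholders.
gsutil retention set 30d gs://example-immutable-bucket

# Locking the policy makes it permanent for the life of the bucket:
gsutil retention lock gs://example-immutable-bucket
```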
Note that several vendors included in this report allow their cloud file systems to run on GCP. This provides greater flexibility to organizations leveraging GCP as their public cloud platform, though none of these solutions are currently provided as a first-party service (which would normally be operated and billed to the customer by Google).
Strengths: Google’s cloud file storage capabilities are improving, partly thanks to the availability of NetApp Cloud Volumes. Filestore is an interesting solution for organizations that rely heavily on GCP. It provides a native experience with high throughput and sustained performance for latency-sensitive workloads.
Challenges: Although several improvements were made over the last year, Google Filestore still lacks maturity in several areas and provides limited scalability.
Hammerspace’s parallel global file system helps overcome the siloed nature of hybrid cloud file storage by providing a single file system regardless of a site’s geographic location or whether the underlying storage, from any vendor, resides on-premises or in the cloud, and by separating the control plane (metadata) from the data plane (where data actually resides). It is compliant with several versions of the NFS and SMB protocols and includes RDMA support for NFSv4.2.
The solution lets customers automate data operations through objective-based policies, providing the ability to use, access, store, protect, and move data around the world through a single global namespace, without users needing to know where resources are physically located. The product is based on the intelligent use of metadata across file system standards and includes telemetry data (such as IOPS, throughput, and latency) as well as user-defined and analytics-harvested metadata, allowing users or integrated applications to rapidly view, filter, and search the metadata in place instead of relying on file names. Hammerspace now also supports user-enriched metadata through the Hammerspace Metadata Plugin, a Windows-based application that allows users to create custom metadata tags directly within their Windows graphical user interface (GUI). Custom metadata is interpreted by Hammerspace and can be used not only for classification but also to create data placement, disaster recovery, or data protection policies.
Hammerspace can be deployed on-premises or to the cloud, with support for AWS, Azure, GCP, Seagate Lyve, and several other cloud platforms. It implements share-level snapshots as well as comprehensive replication capabilities, allowing files to be replicated automatically across different sites through the Hammerspace Policy Engine. Manual replication activities are available on-demand as well. These capabilities allow organizations to implement multisite, active-active disaster recovery with automated failover and failback. Scalability has been improved, with twice the number of sites supported in multisite deployments and up to 100 object buckets supported.
Integration with object storage is also a core capability of Hammerspace because data can be replicated or saved to the cloud as well as automatically tiered on object storage, thus reducing the on-premises data footprint and leveraging cloud economics to keep storage spend under control.
One of Hammerspace’s key features is its “Autonomic Data Management” ML engine. This runs a continuous market economy simulation that, when combined with telemetry data from a customer’s environment, helps make real-time, cross-cloud data placement decisions based on performance and cost. Although Hammerspace categorizes this feature as data management, in the context of the GigaOm report “Key Criteria for Evaluating Cloud-Based File Storage Solutions,” this capability is more related to the key criteria for hybrid and multicloud and integration with object storage.
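As a rough illustration of that idea, the toy model below weighs storage cost against observed latency to pick a placement tier; the tier names, prices, and weights are invented, and the real engine’s market-economy simulation is far more sophisticated:

```python
# Toy placement decision combining cost and observed latency -- a heavily
# simplified stand-in for the market-economy simulation described above.
tiers = {
    "on_prem_nvme": {"cost_per_gb": 0.10, "latency_ms": 0.2},
    "cloud_ssd":    {"cost_per_gb": 0.08, "latency_ms": 1.5},
    "cloud_object": {"cost_per_gb": 0.02, "latency_ms": 40.0},
}

def place(file_iops, cost_weight=1.0, perf_weight=1.0):
    """Pick the tier with the lowest weighted cost/latency score.

    Hot files (high IOPS) weight latency more heavily, so they gravitate
    toward fast storage; cold files gravitate toward cheap storage.
    """
    def score(t):
        return (cost_weight * t["cost_per_gb"]
                + perf_weight * file_iops * t["latency_ms"] / 1000)
    return min(tiers, key=lambda name: score(tiers[name]))

assert place(file_iops=0) == "cloud_object"     # cold data lands on cheap storage
assert place(file_iops=5000) == "on_prem_nvme"  # hot data lands on fast storage
```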
Ransomware protection is offered by Hammerspace through immutable file shares with global snapshot capabilities as well as an undelete function and file versioning, allowing users to revert to a file version not affected by ransomware-related data corruption. Auditing has also been improved, with a now-global audit trail capability.
It’s also worth noting the solution’s availability in the Azure Marketplace as well as its integration with Snowflake, which allows Snowflake analytics to run directly on Hammerspace without having to move data to the Snowflake cloud. Hammerspace is also partnering with Seagate on the release of a Corvault-based appliance capable of running Hammerspace at the edge.
Strengths: Hammerspace’s parallel global file system offers a very balanced set of replication, hybrid, and multicloud capabilities, all driven by the power of metadata.
Challenges: Ransomware detection capabilities are missing.
IBM Spectrum Scale
IBM Spectrum Scale offers a scalable and flexible software-defined storage solution that can be used for high-performance cloud file storage use cases. Built on the robust and proven IBM General Parallel File System (GPFS), the product supports several building blocks on the back end: IBM nonvolatile memory express (NVMe) flash nodes, Red Hat OpenShift nodes, capacity nodes, object storage, and multivendor NFS nodes.
The solution offers several file interfaces, such as SMB, NFS, POSIX-compliant, and HDFS (Hadoop), as well as an S3-compatible object interface, making it a versatile choice for environments with multiple types of workloads. Data placement is taken care of by the IBM Spectrum Scale clients, which spread the load across storage nodes in a cluster. The company recently introduced containerized S3 access services for high-performance, cloud-native workloads and now also supports concurrent file and object access.
The solution offers a single, manageable namespace and migration policies that enable transparent data movement across storage pools without impacting the user experience.
IBM Spectrum Scale supports remote sites and offers various data caching options as well as snapshot support and multisite replication capabilities. The solution includes policy-driven storage management features that allow organizations to automate data placement on the various building blocks based on the characteristics of the data and the cost of the underlying storage. It includes a feature called Transparent Cloud Tiering that allows users to tier files to cloud object storage with an efficient replication mechanism.
The solution includes a management interface that provides monitoring capabilities for tracking data usage profiles and patterns. Comprehensive data management capabilities are provided through an additional service, IBM Watson Data Discovery.
The latest release of IBM Spectrum Scale includes file audit logging capabilities to track user access across all protocols and platforms, a key security requirement for modern cloud file systems. The latest release also includes a snapshot retention mechanism that prevents snapshot deletion at the global and fileset level, effectively bringing immutability, and thus basic ransomware protection capabilities, to the platform. The expiration time flag requires file systems to be upgraded to version 5.1.5 or later to function. In addition, Spectrum Scale also works with IBM Safeguarded Copy technology, a solution that uses Spectrum Scale immutable snapshots to orchestrate data protection and recovery against ransomware attacks.
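The retention behavior can be pictured with a minimal sketch, assuming a simple expiry check on deletion; the class and field names are ours, not IBM’s:

```python
# Minimal sketch of an expiration-time retention check: deletion is refused
# until the snapshot's expiry has passed (a conceptual model of the behavior
# described above, not IBM Spectrum Scale's actual implementation).
from datetime import datetime

class Snapshot:
    def __init__(self, name, expires_at):
        self.name = name
        self.expires_at = expires_at  # the expiration time flag
        self.deleted = False

    def delete(self, now):
        if now < self.expires_at:
            raise PermissionError(
                f"{self.name} is retained until {self.expires_at}")
        self.deleted = True

snap = Snapshot("daily-2023-01-01", expires_at=datetime(2033, 1, 1))
try:
    snap.delete(now=datetime(2023, 6, 1))
except PermissionError:
    pass  # deletion refused while the retention period is active
```

Because deletion is refused until expiry, snapshots within the retention window are effectively immutable, which is what gives the platform its basic ransomware protection.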
The solution continues to be popular within the HPC community, and IBM also positions Spectrum Scale as an optimized solution for AI use cases. Finally, IBM Spectrum Fusion, a containerized version of Spectrum Scale (and also consumable in an HCI deployment model), enables edge use cases.
Strengths: IBM Spectrum Scale continues to see active development and a steady pace of releases with noteworthy improvements, such as ransomware protection capabilities. Built upon a solid foundation, the solution (released 24 years ago) is still very relevant. It offers multiple enterprise-grade capabilities and will cater to organizations looking to support diverse storage needs in a unified high-performance platform. The product has excellent multiple-platform capabilities that extend beyond x86 architectures.
Challenges: One area for improvement would be bundling the excellent IBM Watson Data Discovery service with IBM Spectrum Scale; currently, it is an add-on solution that incurs an extra charge. Advanced analytics capabilities are also lacking and still need to be developed.
Microsoft Azure
Microsoft offers a number of SaaS-based cloud file storage solutions through its Azure Storage portfolio, which aims to address different use cases and customer requirements. Three solutions are available to customers: Azure Blob, Azure Files, and Azure NetApp Files. In addition, Microsoft offers Azure File Sync, a file synchronization solution that allows data synchronization to Azure by installing a software client on existing servers, without the need for additional hardware. Azure File Sync can also be installed on Azure itself or even in VMs residing in other clouds.
Azure Blob provides file-based access (REST, NFSv3.0, and HDFS via the ABFS driver for big data analytics) to an object storage back end with a focus on large, read-heavy sequential access workloads, such as large-scale analytics data, backing up and archiving, media rendering, and genomic sequencing. This solution offers the lowest storage cost among Microsoft’s cloud file storage solutions and includes several performance tiers.
The second solution, Azure Files, uses the same hardware as Azure Blob but implements full POSIX file system support with the NFSv4.1 protocol (as well as REST API and SMB access). The solution is oriented toward random access workloads, ideally with in-place data updates, and offers four performance tiers: cool, hot, transaction optimized, and premium (SSD-based), each addressing specific requirements. Azure Files and Azure File Sync (covered above) integrate with Windows File Server. Azure Files can also be used to store user profile data for organizations using the Azure Virtual Desktop service.
Third in the portfolio, Azure NetApp Files is a first-party solution jointly developed by Microsoft and NetApp, using ONTAP running on NetApp bare metal systems, fully integrated in the Microsoft Azure cloud. This solution offers all the benefits customers expect from NetApp, among which are enterprise-grade features and full feature parity with on-premises deployments and with other public cloud offerings based on NetApp ONTAP. Like Azure Blob, it also comes with several performance tiers. Azure NetApp Files is available in nearly all geographical regions supported by Azure—41 regions across the globe at the time of writing this report.
Global namespaces are supported with Azure File Sync through the use of DFS Namespaces, but there is no global namespace capability available to federate the various solutions and tiers offered across the Azure cloud file storage portfolio.
Besides Azure File Sync, Azure offers a variety of data replication and redundancy options. Redundancy can be set up locally or at the availability zone level. Geo-redundancy is possible either within one zone or across multiple zones. Currently, only shares under 5 TB are supported with Azure Files for geo-redundancy, but Azure NetApp Files offers greater flexibility. Backup and restore across multiple regions is also available with Azure Backup, but users need to take into account possible additional costs when restoring across regions.
Putting aside Azure NetApp Files, which relies on ONTAP, both Azure Blob and Azure Files are based on an object storage back end. Automated tiering capabilities are partially present across the Azure Blob and Azure Files offerings: Azure Blob offers lifecycle management and a policy-based automated tiering solution, while the on-premises Azure File Sync solution enables offloading of on-premises files to the cloud. For its part, Azure NetApp Files offers very comprehensive data movement, migration, and tiering options through its own BlueXP unified management interface, which also extends deep into data services, data management, ransomware protection capabilities, and data-at-rest encryption, including logging.
The storage portfolio offers rich integration capabilities through APIs for data management purposes. Observability and analytics are handled via the Azure Monitor single-pane-of-glass management interface, which also incorporates Azure Monitor Storage Insights. Storage Insights allows the organization to view macro-level information around storage usage at scale, but it also lets users drill down into particular storage accounts for in-depth metrics (such as latency or transactions per second) or to diagnose issues. A global view of capacity usage and detailed logs is also offered.
Azure Files services provide incremental read-only backups as a way to protect against ransomware. Up to 200 snapshots per share are supported: retention of these is up to 10 years when snapshots are taken using Azure Backup, or forever for snapshots taken with the Azure Files native API. The capability comes with a soft delete feature that acts as a recycle bin and allows the recovery of accidentally deleted shares within a certain timeframe (file item recovery can be performed using snapshots).
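A minimal model of the soft delete behavior, assuming a share-level recycle bin with a retention counter; the class, method names, and retention value are illustrative, not Azure’s actual API or defaults:

```python
# Soft-delete sketch: a deleted share is kept recoverable for a retention
# window instead of being destroyed immediately (conceptual model only).
class ShareService:
    RETENTION_DAYS = 14  # invented value; the real window is configurable

    def __init__(self):
        self.shares = {}       # visible, live shares
        self.recycle_bin = {}  # name -> (data, days until permanent purge)

    def delete(self, name):
        # The share disappears from view but lands in the recycle bin.
        self.recycle_bin[name] = (self.shares.pop(name), self.RETENTION_DAYS)

    def undelete(self, name):
        data, days_left = self.recycle_bin.pop(name)
        if days_left <= 0:
            raise KeyError(f"{name}: retention window elapsed")
        self.shares[name] = data  # restored, including its contents

svc = ShareService()
svc.shares["hr-docs"] = b"..."
svc.delete("hr-docs")    # share disappears from view...
svc.undelete("hr-docs")  # ...but can be restored within the window
```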
The on-premises Azure File Sync solution can be used for edge deployments, although in this case, edge would refer primarily to remote office/branch office use cases. It is also common to deploy it in various clouds to act as a local cache for distributed cloud environments. For example, primary data could reside in the US while Azure File Sync would be used in Europe.
Strengths: Microsoft offers a broad portfolio with multiple options, protocols, use cases, and performance tiers that allow organizations to consume cloud file storage in a cost-efficient manner. It also offers enterprise-grade multicloud capabilities with its first-party Azure NetApp Files solution.
Challenges: There are no global namespace management capabilities to abstract the underlying file share complexity for the end user. There are also limitations based on the different share types, although Microsoft is working on increasing maximum volume sizes. The various offerings can appear very complex and therefore intimidating for smaller organizations.
Nasuni
Nasuni offers a SaaS solution for enterprise file services, with an object-based global file system as its main engine and many familiar file interfaces, including SMB and NFS. It is integrated with all major cloud providers and works with on-premises S3-compatible object stores.
Nasuni recently changed its platform to extend its non-core capabilities and take a modular approach to data services. The solution now consists of one core platform with add-on services across multiple areas, including ransomware protection and hybrid work, with data management and content intelligence services planned. Many Nasuni customers implement the solution to replace traditional NAS systems and Windows File Servers, and its characteristics enable users to replace several additional infrastructure components as well, such as backup, disaster recovery, data replication services, and archiving platforms.
Nasuni offers a global file system called UniFS, which provides a layer that separates files from storage resources, managing one master copy of data in public or private cloud object storage while distributing data access. The global file system manages all metadata—such as versioning, access control, audit records, and locking—and provides access to files via standard protocols such as SMB and NFS. Files in active use are cached using Nasuni’s Edge Appliances, so users benefit from high-performance access through existing drive mappings and share points. All files, including files in use across multiple local caches, have their master copies stored in cloud object storage so they are globally accessible from any access point.
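The caching pattern, with master copies in object storage and edge appliances serving repeat reads locally, can be sketched as a read-through cache; this is a conceptual model of the pattern, not Nasuni’s implementation:

```python
# Read-through cache sketch: the object store holds the authoritative master
# copy; an edge cache serves repeat reads locally (illustrative only).
class EdgeCache:
    def __init__(self, object_store):
        self.object_store = object_store  # authoritative master copies
        self.cache = {}                   # locally cached hot files
        self.hits = self.misses = 0

    def read(self, path):
        if path in self.cache:
            self.hits += 1
            return self.cache[path]       # served at local speed
        self.misses += 1
        data = self.object_store[path]    # fetch the master copy
        self.cache[path] = data           # keep a local copy for next time
        return data

store = {"/finance/q3.xlsx": b"..."}
edge = EdgeCache(store)
edge.read("/finance/q3.xlsx")  # miss: fetched from object storage
edge.read("/finance/q3.xlsx")  # hit: served from the local cache
```

Because every cache is backed by the same master copy, any access point can serve any file; only the first read at a site pays the round trip to the cloud.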
The Nasuni Management Console delivers centralized management of the global edge appliances, volumes, snapshots, recoveries, protocols, shares, and more. The web-based interface can be used for point-and-click configuration, but Nasuni also offers a REST API method for automated monitoring, provisioning, and reporting across any number of sites. In addition, the Nasuni Health Monitor reports to the Nasuni Management Console on the health of the central processing unit (CPU), directory services, disk, file system, memory, network, services, NFS, SMB, and so on. Nasuni also integrates with tools like Grafana and Splunk for further analytics. Data management capabilities are currently absent, but Nasuni’s purchase of data management company Storage Made Easy in June 2022 hints at notable improvements in this area in the coming months.
Nasuni provides ransomware protection in its core platform through Continuous File Versioning and its Rapid Ransomware Recovery feature. To further shorten recovery times, the company recently introduced Nasuni Ransomware Protection as an add-on paid solution that augments immutable snapshots with proactive detection and automated mitigation capabilities. The service analyzes malicious extensions, ransom notes, and suspicious incoming files based on signature definitions that are pushed to Nasuni Edge Appliances, automatically stops attacks, and gives administrators a map of the most recent, clean snapshot to restore from. A future iteration of the solution (on the roadmap) will implement AI/ML-based analysis on edge appliances.
Nasuni Edge Appliances are lightweight VMs or hardware appliances that cache frequently accessed files, providing high-performance SMB or NFS access from Windows, macOS, and Linux clients. They can be deployed on-premises or in the cloud to replace legacy file servers and NAS devices. They encrypt and deduplicate files, then snapshot them at frequent intervals to the cloud, where they are written to object storage in read-only format.
The Nasuni Access Anywhere add-on service provides local synchronization capabilities, secure and convenient file sharing (including sharing outside of the organization), and full integration with Microsoft Teams. Finally, the edge appliances also provide search and file acceleration services.
Strengths: Nasuni offers an efficient distributed file system solution that is secure and scalable and provides excellent protection against ransomware. In addition, the edge appliances give customers fast, secure access to their frequently used data.
Challenges: The solution focuses primarily on distributed data and data availability, whereas cloud file systems are tuned primarily to deliver high performance, high throughput, and low latency to performance-oriented workloads. Data management capabilities are currently missing but are on Nasuni’s roadmap.
NetApp
NetApp continues to deliver a seamless experience across on-premises and public cloud environments with BlueXP, a unified, SaaS-delivered multicloud control plane that brings together multiple storage and data services.
Among the services offered in NetApp BlueXP, customers can find not only Cloud Volumes ONTAP (CVO), based on NetApp’s ONTAP technology, but also first-party services on hyperscalers such as AWS (Amazon FSx for NetApp ONTAP), Azure (Azure NetApp Files), and Google Cloud (NetApp Cloud Volumes Service). BlueXP also supports a host of other data services, such as observability, governance, data mobility, tiering, backup and recovery, edge caching, and operational health monitoring. It also supports ONTAP 9.10 and later deployments in all public clouds in addition to on-premises deployments.
Cloud Volumes implements a global namespace that abstracts multiple deployments and locations regardless of distance. Several intelligent caching mechanisms combined with global file-locking capabilities enable a seamless, low-latency experience that makes data accessible at local access speeds from local cache instances.
Based on ONTAP, Cloud Volumes has been architected to support hybrid deployments natively, whether on-premises or in the cloud. Tiering, replication, and data mobility capabilities are outstanding and enable a seamless, fully hybrid experience that lets organizations decide where primary data resides, where infrequently accessed data gets tiered to, and where data copies and backups used for disaster recovery should be replicated to. It’s worth noting that all of these operations can be handled directly from the BlueXP management interface without having to access each public cloud console, drastically reducing the time spent on usually tedious operations.
Integration with object storage is a key part of the solution, and policy-based data placement allows automated, transparent data tiering on-premises with NetApp StorageGRID, or in the cloud with AWS S3, Azure Blob Storage, or Google Cloud Storage, with the ability to recall requested files from the object tier. Object storage integration also extends to backup and DR use cases. With Cloud Backup, backup data can be written to object stores using block-level, incremental-forever technology.
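The incremental-forever idea can be sketched as follows: after an initial full copy, only blocks whose content hash has changed are uploaded. The block size and hashing choices here are ours, for illustration only, not NetApp’s format:

```python
# Incremental-forever sketch: only changed blocks are shipped to the object
# store after the initial full copy (conceptual model only).
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow

def blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def backup(data, previous_hashes, object_store):
    """Upload only new or changed blocks; return the new hash index."""
    new_hashes = []
    for i, blk in enumerate(blocks(data)):
        h = hashlib.sha256(blk).hexdigest()
        new_hashes.append(h)
        if i >= len(previous_hashes) or previous_hashes[i] != h:
            object_store[h] = blk  # changed or new block: ship it
    return new_hashes

store = {}
v1 = backup(b"AAAABBBB", [], store)  # initial full copy: two blocks uploaded
v2 = backup(b"AAAACCCC", v1, store)  # incremental: only one block uploaded
```

Since unchanged blocks are never re-uploaded, every backup after the first costs only as much as the data that actually changed.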
Data management capabilities are enabled by consistent APIs that allow data copies to be created as needed. The platform also offers strong data analytics features through Cloud Manager, which has integrated dashboards fed by NetApp’s Cloud Insights service, and particularly through Cloud Data Sense, one of Cloud Manager’s accessible services; these analytics run against all scanned datastores, which are not required to be Cloud Volumes or ONTAP. Cloud Data Sense provides insights into data owners, location, access frequency, and data privileges, as well as potential access vulnerabilities, with manual or automated policy-based actions. Organizations can generate compliance and audit reports such as DSARs, and HIPAA and GDPR regulatory reports can also be run in real time on all Cloud Volumes data stores.
The BlueXP platform provides advanced security measures against ransomware and suspicious user or file activities when combined with the native security features of ONTAP storage. A new feature is the Ransomware Protection dashboard, available in BlueXP, which monitors security and user behavior to help identify risks and threats and offers guidance on improving an organization’s security posture and remediating attacks.
From a remediation perspective, immutable point-in-time NetApp snapshot copies provide the ability to revert to a healthy state. Organizations can also enable Cloud WORM, an additional write-once, read-many capability, when they create new Cloud Volumes ONTAP instances. This feature is powered by NetApp SnapLock and provides long-term snapshot retention that can be used not only for ransomware protection but also for regulatory and compliance purposes. To further enhance security for separation of duties, Cloud Volumes ONTAP includes multiple-admin verification (MAV), which provides the ability to require additional administrators to approve potentially destructive commands like volume and snapshot deletion.
The solution supports flexible deployment models that also take into consideration edge use cases. From Cloud Manager, customers can enable the Global File Cache service for branch locations, remote sites, or regional hyperscalers’ points of presence to enable local-speed, low-latency access to centralized shares through a single global namespace with full global file locking capabilities.
NetApp moved away from trial versions of CVO to a freemium model that enables unrestricted full use up to a certain capacity level, although the model does not apply to all services. Organizations can test the services on the freemium tier at no cost indefinitely, then upgrade as the need arises. The company has also implemented a digital wallet, which turns license entitlements into a fungible currency, allowing organizations to exchange unused entitlements for other licenses they may need or to float those entitlements to a different platform (for example, from AWS to Azure).
Strengths: NetApp’s cloud file system portfolio shines with a complete enterprise-grade feature set, flexible deployment models, and ubiquitous service availability across public clouds. NetApp BlueXP offers next-level management and orchestration capabilities complemented by a host of SaaS-based data services, which significantly enhances the business value of NetApp’s cloud file system offerings and sets the bar high for the industry to follow.
Challenges: Although not necessarily a challenge, NetApp’s offering and ecosystem are very rich and comprehensive. Without proper guidance, some organizations might feel intimidated.
ObjectiveFS
ObjectiveFS is a cloud file storage platform that supports on-premises, hybrid, and cloud-based deployments. Its POSIX file system can be accessed as one or many directories by clients and uses an object store on the back end; data is written directly to the object store without any intermediate servers. ObjectiveFS runs locally on servers through client-side software, providing local-disk-speed performance. The solution scales simply and without disruption by adding ObjectiveFS nodes to an existing environment, up to thousands of servers and petabytes of storage.
Notable updates in 2022 revolve around architecture optimizations and performance improvements, such as support for the Intel AVX-512 instruction set and for the ARM Neon architecture.
ObjectiveFS offers a global namespace in which all updates are synchronized through the object store back end. The solution supports cloud-based and on-premises S3-compatible object stores, such as IBM Public Cloud, Oracle Cloud, MinIO, AWS S3, Azure, and GCP, allowing customers to select either the Azure native API or the S3-compatible API. Support for AWS Outposts as well as S3 Glacier Instant Retrieval was added in version 7 of ObjectiveFS.
ObjectiveFS uses its own log-structured implementation to write data to the object store back end by bundling many small writes together into a single object. The same technique can then be used for read operations by accessing only the relevant portion of the object. The solution also uses a method called compaction, which bundles metadata and data into a single object for faster access. Storage-class-aware support ensures that policies can be used to implement intelligent data tiering and move data across tiers based on usage. To ensure performance requirements are met, ObjectiveFS offers several levels of caching that can be used together.
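A minimal sketch of the log-structured technique follows: many small writes are bundled into one object, and reads fetch only the relevant byte range via an index. This illustrates the general approach, not ObjectiveFS’s actual on-disk format:

```python
# Log-structured write sketch: small writes are appended to a buffer and
# flushed to the object store as a single object; an index records where
# each logical file sits inside that object (illustrative only).
class LogSegment:
    def __init__(self):
        self.buffer = bytearray()
        self.index = {}  # key -> (offset, length) within the bundled object

    def append(self, key, data):
        self.index[key] = (len(self.buffer), len(data))
        self.buffer += data

    def flush(self, object_store, name):
        object_store[name] = bytes(self.buffer)  # one PUT for many writes
        return self.index

def read(object_store, name, index, key):
    offset, length = index[key]
    # A real client would issue an HTTP range request; here we just slice.
    return object_store[name][offset:offset + length]

store = {}
seg = LogSegment()
seg.append("file_a", b"hello")
seg.append("file_b", b"world!")
idx = seg.flush(store, "segment-0001")
```

Bundling turns many small, expensive PUT operations into one, which is why the technique pairs well with object storage pricing and latency characteristics.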
Users can deploy the solution across multiple locations (multiregion and multicloud), with flexible deployment choices that allow storage and compute to run in different locations and across different clouds.
The solution currently offers no data management capabilities and relies on third-party integrations to analyze, reprocess, or augment data. Built-in analytics provide latency heatmaps as well as performance-tuning information, and the solution also logs access to data, including access to data types, access requests, and cache hit rate.
ObjectiveFS provides comprehensive security features, such as in-flight and at-rest data encryption. The solution supports multitenancy, with each tenant’s data encrypted using separate encryption keys so that it is accessible only to the tenant that owns it. It also protects against ransomware through its log-structured implementation and built-in immutable snapshots.
One of ObjectiveFS’s most interesting features is the inclusion of a workload adaptive heuristics mechanism that supports hundreds of millions of files and tunes the file system to ensure consistent performance is delivered regardless of the I/O activity profile (read versus write, sequential versus random) or the size of the files, handling many small files or large terabyte-sized files at the same performance levels.
Strengths: ObjectiveFS provides a highly scalable and robust solution at consistent performance levels regardless of the data type. It delivers flexible deployment options and comes with excellent security and multitenancy features.
Challenges: Although well optimized for performance and regularly improved in this area, the solution currently lacks any data management capabilities.
Oracle Cloud Infrastructure
Oracle provides cloud file system options via three offerings: OCI File Storage, Oracle HPC File System stacks, and Oracle ZFS.
File Storage is the cloud file system solution developed by Oracle on its OCI platform. Delivered as a service, the solution provides an automatically scalable, fully managed elastic file system that supports the NFSv3 protocol and is available in all regions. Up to 100 file systems can be created in each availability domain, and each of those file systems can grow up to 8 exabytes. Optimized for parallelized workloads, the solution also focuses on high availability and data durability, with five-way replication across different fault domains. By default, all file systems are encrypted using Oracle-managed encryption keys, but customers can choose to encrypt file systems with their own keys and manage those keys with the OCI Vault service. The solution implements an interesting mechanism named “eventual overwrite” for data deletion: each file is created with its own encryption key, and when a file is deleted, the key is destroyed and the file becomes inaccessible. The same mechanism is used for deletion of an entire file system. Periodically, inaccessible files and file systems are purged to free space and eradicate residual data.
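The crypto-shredding idea behind eventual overwrite can be sketched as follows; the class is hypothetical, and the XOR cipher is a deliberately weak stand-in for real encryption, used only to keep the example self-contained:

```python
# "Eventual overwrite" sketch: each file gets its own key; destroying the key
# renders the ciphertext unreadable even before the data is physically purged.
# The XOR cipher is a toy stand-in for real encryption -- illustration only.
import os

class CryptoShredFS:
    def __init__(self):
        self.ciphertext = {}  # path -> encrypted bytes (linger until purge)
        self.keys = {}        # path -> per-file encryption key

    @staticmethod
    def _xor(data, key):
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    def write(self, path, data):
        key = os.urandom(16)            # a fresh key for every file
        self.keys[path] = key
        self.ciphertext[path] = self._xor(data, key)

    def read(self, path):
        if path not in self.keys:
            raise FileNotFoundError(path)  # key destroyed: unrecoverable
        return self._xor(self.ciphertext[path], self.keys[path])

    def delete(self, path):
        del self.keys[path]  # destroy the key; ciphertext is purged later

fs = CryptoShredFS()
fs.write("/data/report.csv", b"sensitive")
fs.delete("/data/report.csv")  # the file is now inaccessible: its key is gone
```

The appeal of the scheme is that logical deletion is instantaneous and secure, while the expensive physical overwrite can happen lazily in the background.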
Oracle File Storage supports snapshots as well as clones. The clone feature makes a file system instantaneously available for read and write access while inheriting snapshots from the original source; copies are immediately usable for test and development, significantly reducing the time needed to create copies of a production environment for validation purposes. No backup feature currently exists, although third-party tools can be used to copy data across OCI domains, regions, OCI Object Storage, or on-premises storage.
Data management capabilities primarily reside in the use of REST APIs. These can be combined with the clone feature to automate fast copy provisioning operations that execute workloads on copies of the primary data sets. A management console provides an overview of existing file systems and provides usage and metering information, both at the file system and mount target levels. Administrators also get a view of system health with performance metrics and can configure alarms and notifications in the general monitoring interface for OCI.
Currently, ransomware protection and user activity monitoring are not available on Oracle File Storage. Through Oracle’s Dedicated Region offering, all OCI services can be deployed on the customer premises. Connectivity to Oracle Cloud is made possible through a FastConnect private connection and the OCI-Azure Interconnect. Oracle also offers a no-cost storage gateway for basic file interactions on-premises, with back-end connectivity to OCI object storage.
OCI HPC File System stacks are dedicated to high-performance computing workloads in organizations that require the use of traditional HPC parallel file systems such as BeeGFS, IBM Spectrum Scale, GlusterFS, or Lustre. An open-source license is used when the file system supports it; otherwise, the customer has to cover licensing costs through a bring-your-own-license (BYOL) model. OCI allows these HPC File System stacks to be deployed via a wizard and includes Terraform automation support. Although the feature set depends on the file system used, OCI can provide up to 500 GB/s of bandwidth. The offering was recently enriched with support for three new high-performance file systems: BeeOND (BeeGFS on demand over RDMA), NFS File Server with High Availability, and Quobyte. The first two are available through Oracle Cloud Marketplace Stacks (a web-based GUI) and Terraform-based templates, while Quobyte is available only via Terraform-based templates.
Organizations also can opt for the Oracle ZFS image option, a marketplace image that can be configured as bare metal or a VM and supports ZFS, now also available in a highly available format (ZFS-HA). Each image can scale to 1024 TB, providing support for NFS and SMB with AD integration. The solution fully supports replication, snapshots, clones, and cloud snapshots, with several DR options. Oracle also provides sizing options to select the optimal image based on the expected number of clients. This service operates under a BYOL model to which the organization adds the cost of compute and block storage.
All three offerings include encryption and a key management system that also supports multitenant key management.
Strengths: OCI offers an attractive palette of cloud file services, starting with Oracle File Storage, which will be particularly attractive to organizations building on top of the OCI platform. HPC-related file system stacks are a great alternative to DIY deployments, making those popular file systems easily deployable to better serve cloud-based HPC workloads.
Challenges: The offerings are currently limited in several areas where improvements would be welcome, notably around advanced data protection features and monitoring capabilities.
Panzura
Panzura offers a cloud file system, CloudFS. The solution works across sites (public and private clouds) and provides a single data plane with local file operation performance, automated file locking, and immediate global data consistency. Recently, the Panzura solution was redesigned around a modular architecture that will gradually allow more data services to integrate seamlessly with the core Panzura platform.
The solution implements a global namespace and tackles data integrity requirements through a global file-locking mechanism that provides real-time data consistency regardless of where a file is accessed from around the world. It also provides efficient snapshot management with version control and allows administrators to configure retention policies as needed. Besides the high-performance characteristics of the solution, backup and disaster recovery capabilities are offered as well.
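The global file-locking mechanism can be pictured as a toy lock manager in which a writer at one site blocks writers elsewhere until release; this models the general pattern, not Panzura’s actual protocol:

```python
# Global file-lock sketch: a write lock must be acquired before a file is
# modified from any site, which is what keeps every location consistent
# (a toy model of the pattern, not Panzura's implementation).
class GlobalLockManager:
    def __init__(self):
        self.holders = {}  # path -> site currently holding the write lock

    def acquire(self, path, site):
        holder = self.holders.get(path)
        if holder is not None and holder != site:
            return False   # another site is writing; caller must wait/retry
        self.holders[path] = site
        return True

    def release(self, path, site):
        if self.holders.get(path) == site:
            del self.holders[path]

locks = GlobalLockManager()
assert locks.acquire("/cad/part.dwg", site="london")
assert not locks.acquire("/cad/part.dwg", site="tokyo")  # blocked
locks.release("/cad/part.dwg", site="london")
assert locks.acquire("/cad/part.dwg", site="tokyo")      # now succeeds
```

Serializing writers per file is what makes it safe to present the same file for editing at every site at once: readers always see the last committed version, and two sites can never commit conflicting changes.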
Panzura relies on S3 object stores and supports a broad range of object storage solutions, whether in the public cloud or on-premises. A feature called Cloud Mirroring enables multiple back-end capabilities by writing data to a second cloud storage provider to ensure data is always available, even if a failure occurs at one of the cloud storage providers. Tiering and archiving also are implemented in Panzura.
Analytics capabilities are offered through Panzura Data Services, a set of advanced features that provide global search, user auditing, one-click file restoration, and monitoring functions aimed at core metrics and storage consumption, showing, for example, frequency of access, active users, and the health of the environment. For data management, Panzura provides various API services that allow users to connect their data management tools to Panzura. Panzura Data Services also allows the detection of infrequently accessed data so that subsequent action can be taken.
Security capabilities include user auditing (through Panzura Data Services) as well as ransomware protection. The latter combines immutable data (a WORM S3 back end) with read-only snapshots taken every 60 seconds at the global filer level; data is regularly moved to the immutable object store, and after a ransomware attack it can be recovered seamlessly through the same mechanism an organization would use to restore data under normal circumstances (backup). These capabilities are complemented by Panzura Protect, which currently detects ransomware attacks and delivers proactive alerts. In the future, Panzura Protect will also support end-user anomaly detection to flag suspicious activity.
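The recovery mechanism relies on restoring the newest clean snapshot taken before the attack. A minimal sketch of that logic, with a hypothetical snapshot store rather than Panzura's actual data structures:

```python
# Illustrative model of 60-second read-only snapshots backing
# ransomware recovery (hypothetical code, not Panzura's). Restore
# picks the newest snapshot taken strictly before the attack time.
import bisect

class SnapshotStore:
    def __init__(self):
        self._snaps = []  # (timestamp, frozen file-system state)

    def take_snapshot(self, ts, state):
        # Snapshots are read-only: store an independent copy.
        self._snaps.append((ts, dict(state)))

    def restore_before(self, attack_ts):
        # Binary search for the newest snapshot older than the attack.
        times = [t for t, _ in self._snaps]
        i = bisect.bisect_left(times, attack_ts)
        return self._snaps[i - 1][1] if i else None

store = SnapshotStore()
store.take_snapshot(60,  {"report.docx": "clean v1"})
store.take_snapshot(120, {"report.docx": "clean v2"})
store.take_snapshot(180, {"report.docx": "ENCRYPTED"})  # post-attack
print(store.restore_before(150))  # → {'report.docx': 'clean v2'}
```

Because the snapshots themselves are read-only and stored on WORM media, an attacker who encrypts the live file system cannot also corrupt the recovery points.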
The solution also includes a Secure Erase feature that removes all versions of a deleted file and subsequently overwrites the deleted data with zeros, a feature available even with cloud-based object storage.
One of the new capabilities of the solution is Panzura Edge Access, which extends Panzura’s CloudFS directly to users’ local machines.
Strengths: Panzura provides a cloud file system that offers local-access performance levels with global availability, data consistency, tiered storage, and multiple back-end capabilities. Panzura Data Services delivers advanced analytics and data management capabilities that help organizations better understand and manage their data footprint.
Challenges: The Panzura solution has been architected primarily as a distributed cloud file storage solution and is also a good fit for general-purpose enterprise workloads. While it technically meets the requirements for a cloud file system and provides local-grade performance, it cannot meet demanding high-performance workload requirements for which high throughput, high IOPS, and ultra-low latency are essential.
Qumulo has developed a software-based, vendor-agnostic cloud file system that can be deployed on-premises, in the cloud, or even delivered through hardware vendor partnerships. The solution provides a comprehensive set of enterprise-grade data services branded Qumulo Core. These handle core storage operations (scalability, performance) as well as data replication and mobility, security, ransomware protection, data integration, and analytics.
Qumulo supports hybrid and cloud-based deployments. Cloud services are delivered through Cloud Q, a set of solutions designed specifically for the cloud that leverage Qumulo Core services. Organizations can either deploy Cloud Q through their preferred public cloud marketplace (the solution supports AWS, Azure, and GCP) or choose to deploy Qumulo as a fully managed SaaS offering on Microsoft Azure. AWS Outposts is supported as well, backed by a comprehensive partnership with AWS. Qumulo is also expanding its delivery models through storage as a service (STaaS) partnerships with HPE GreenLake and others.
The solution scales linearly, from both a performance and a capacity perspective, providing a single namespace with limitless capacity that supports billions of large and small files and provides the ability to use nearly 100% of usable storage through efficient erasure coding techniques. It also supports automatic data rebalancing when nodes or instances are added. The namespace enables real-time queries and aggregation of metadata, greatly reducing search times.
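The capacity-efficiency claim follows from generic erasure-coding arithmetic. The sketch below uses the standard k-data/m-parity math, with the 16+2 stripe width chosen purely as an illustration (the report does not state Qumulo's actual stripe parameters):

```python
# Back-of-the-envelope usable-capacity math for erasure coding
# (generic k data + m parity shards; stripe widths are assumptions,
# not Qumulo's documented parameters).
# usable fraction = k / (k + m), versus 1/copies for replication.

def usable_fraction(data_shards, parity_shards):
    return data_shards / (data_shards + parity_shards)

# A wide 16+2 stripe keeps ~89% of raw capacity usable while still
# tolerating two simultaneous shard failures...
print(round(usable_fraction(16, 2), 3))  # → 0.889

# ...whereas 3-way replication keeps only ~33% of raw capacity.
print(round(1 / 3, 3))                   # → 0.333
```

Wider stripes push the usable fraction closer to 100%, which is how erasure-coded systems approach full use of raw capacity without sacrificing failure tolerance.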
Data protection, replication, and mobility use cases are well covered and include snapshots and snapshot-based replication to the cloud, continuous replication, and disaster recovery support with failover capabilities. Qumulo SHIFT is a built-in data service that moves data to AWS S3 object stores with built-in replication, including support for immutable snapshots. With SHIFT, Qumulo allows bidirectional data movements to and from S3 object storage, providing organizations with more flexibility and better cost control.
Qumulo includes a comprehensive set of REST APIs that can be used not only to perform proactive management but also to automate file system operations. The solution comes with a powerful data analytics engine that provides real-time operational analytics (across all files, directories, metrics, users, and workloads), capacity awareness, and predictive capacity trends, with the ability to “time travel” through performance data. With Qumulo Command Center, organizations can also easily manage their Qumulo deployments at scale through a single management interface.
Advanced security features include read-only snapshots that can be replicated to the cloud, as well as audit logging to review user activity. WORM (immutable) snapshots and snapshot-locking capabilities (protection against deletion) should be introduced in early 2023. Integrations with third-party single sign-on providers such as Okta and Duo should also be available soon.
Several key capabilities or core components of cloud file systems are currently on Qumulo’s roadmap. Among others, the company is working on implementing data compression, extending its global namespace with geographically distributed deployments, and adding native S3 protocol support. Multitenancy capabilities should be available by the end of Q4 2022.
Strengths: Qumulo offers a comprehensive cloud file storage solution that is simple to manage and implement. Its rich and complete data services set, combined with a broad choice of deployment models, makes it one of the most compelling cloud file storage offerings currently available.
Challenges: Although the solution is very rich, some important features are currently still on the roadmap, data reduction improvements among them.
The WEKA Data Platform is a cloud file system architecture that delivers all-flash-array performance in a software-defined storage solution capable of running on-premises, in the cloud, or both. The platform is built on the WEKA File System (WekaFS) and is deployed as a set of containers; it offers multiple deployment options on-premises (bare metal, containerized, virtual) as well as in the cloud, with support for AWS, GCP, and OCI. All deployments can be managed through a single management console regardless of their location and offer a single data platform with mixed workload capabilities and multiprotocol support (SMB, NFS, S3, POSIX, GPU Direct, and Kubernetes CSI).
The solution implements a global namespace that spans performance tiers with automatic scalability in the cloud, presenting users with a unified namespace that abstracts the underlying complexity and enables transparent background data movements between tiers.
The global namespace natively supports and expands into S3 object storage bidirectionally, thanks to dynamic data tiering, which automatically pushes data to the object tier when capacity runs low on the NVMe flash tier. Both tiers (flash-based and object) can scale independently. A feature called Snap-To-Object allows data and metadata to be committed to snapshots for backup, archive, and asynchronous mirroring. This feature can also be used for cloud-only use cases in AWS, GCP, and OCI to pause or restart a cluster, protect against single availability zone failure, or migrate file systems across regions. New in 2022, incremental and non-disruptive snapshot restores with a remount of clients are available.
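Capacity-triggered tiering of this kind is typically implemented with high/low watermarks. The following is a generic sketch of that pattern under assumed thresholds, not WEKA's actual algorithm or parameters:

```python
# Generic high/low-watermark tiering sketch (illustrative only; the
# 90%/70% thresholds and LRU eviction policy are assumptions, not
# WEKA's documented behavior). When flash usage crosses the high
# watermark, the coldest files move to the object tier until usage
# drops below the low watermark.

def tier_down(flash, object_tier, capacity, high=0.9, low=0.7):
    """flash maps name -> (size, last_access); moves cold files out."""
    used = sum(size for size, _ in flash.values())
    if used / capacity <= high:
        return  # below the high watermark: nothing to do
    # Evict least recently accessed files first.
    for name in sorted(flash, key=lambda n: flash[n][1]):
        size, _ = flash.pop(name)
        object_tier[name] = size
        used -= size
        if used / capacity <= low:
            break

flash = {"hot.dat": (40, 100), "warm.dat": (30, 50), "cold.dat": (25, 10)}
obj = {}
tier_down(flash, obj, capacity=100)   # 95% used -> evict coldest file
print(sorted(obj))                    # → ['cold.dat']
```

The two-watermark design avoids thrashing: eviction does not start until usage is high, and once started it frees enough headroom that it will not immediately trigger again.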
Data management capabilities primarily include the ability to create copies of the data through snapshots; for example, with development operations (DevOps) use cases where jobs or functions are executed against data copies instead of the primary data set. It is expected that, in the future, WEKA will improve some of its capabilities in this area, notably around metadata tagging, querying, and so on. In addition, some of these capabilities could be offloaded to third-party engines; for example, by running data analysis from ML-based engines that can then augment the metadata of datasets residing on the WEKA Data Platform.
API integrations (including serverless) are possible through WEKA APIs. On the analytics side, WEKA’s monitoring platform captures and provides telemetry data about the environment, with the ability to deep dive into certain metrics all the way down to file system calls. WEKA also supplies a proactive cloud-based monitoring service called WEKA Home that collects telemetry data (events and statistics) and provides proactive support in case of detected issues.
The solution supports advanced security capabilities with WEKA Snap-To-Object, which also allows data to be protected with snapshots sent to immutable object stores and thus safeguards data against ransomware attacks. The solution also supports log forwarding to inhibit tampering by a malicious actor or a rogue administrator. Additional security features include encryption in-flight between WEKA clients and the core cluster, plus other features to ensure unapproved clients can’t mount data stored in the WEKA platform.
The WEKA platform supports edge aggregation deployments, which enable smaller-footprint clusters to be deployed alongside embedded internet of things (IoT) devices. Finally, organizations can deploy the WEKA platform directly on AWS from the marketplace and also run a certified deployment on AWS Outposts, thanks to WEKA’s ISV Accelerate partnership with AWS. WEKA also supports a BYOL model for GCP and OCI.
Although versatile, the solution is particularly useful in demanding environments that require low latency, high performance, and cloud scalability, such as AI/ML, life sciences, financial trading, HPC, media rendering and visual effects, electronic design automation (EDA), and engineering DevOps.
Strengths: WEKA has architected a robust and seamlessly scalable high-performance storage solution with comprehensive deployment options, automated tiering, and a rich set of services via a single platform that eliminates the need to copy data through various dedicated storage tiers. Its single namespace encompassing file and object storage reduces infrastructure sprawl and complexity to the benefit of users and organizations alike.
Challenges: WEKA’s strong focus on performance and scalability eclipses the growing need for data analysis within organizations. The company has acknowledged this as an area for improvement and is working on enhancing its capabilities across most of the evaluated key criteria thanks to a dynamic roadmap.
Zadara Edge Cloud Services is an interesting solution aimed primarily at managed service providers (MSPs) and some larger enterprises. The solution is available globally through 300 cloud partners on six continents and consists of an elastic infrastructure layer comprising compute, networking, and storage capabilities, with cost based on usage. The storage offering, named zStorage, consists of one or more virtual private storage arrays (VPSAs) that can be deployed on solid state drive (SSD), hard disk drive (HDD), and hybrid media types. VPSAs are able to serve block, file, and object storage simultaneously. Various VPSAs can be created, each with its own engine type (which dictates performance) and its own set of drives, including spares.
Global namespaces are supported for file-based storage, with a capacity limit of up to 0.5 PB, after which a new namespace must be created. Although customers can, in theory, use a third-party object storage gateway to store files on the Zadara object store tier (and therefore circumvent this limitation), there is no native multiprotocol access capability.
The solution offers thinly provisioned snapshots as well as cloning capabilities, which can be local or remote. The snapshot-based asynchronous remote mirroring feature enables replication to a different pool within the same VPSA, to a different local or remote VPSA, or even to a different cloud provider. The replicated data is encrypted and compressed before being transferred to the destination. The solution also allows for many-to-many relationships, which enables cross-VPSA replication in active-active replication scenarios. Cloning capabilities are also available remotely and can be used for rapid migration of volumes between VPSAs because the data can be made available instantly (although a dependency on the source data remains until all of the data has been copied in the background).
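One detail worth noting about replication pipelines in general (Zadara's internal ordering is not public): compression must happen before encryption, because well-encrypted data is statistically random and no longer compresses. A quick demonstration of why the order matters:

```python
# Why replication pipelines compress *before* encrypting (generic
# illustration, not Zadara's pipeline): ciphertext looks random, and
# random bytes do not compress.
import os
import zlib

payload = b"log line repeated " * 1000        # highly redundant data

compressed = zlib.compress(payload)
print(len(payload), "->", len(compressed))    # large reduction

# Stand-in for ciphertext: random bytes of the same length (real
# AES output would behave the same way statistically).
ciphertext_like = os.urandom(len(payload))
print(len(zlib.compress(ciphertext_like)) >= len(payload))  # → True
```

Compress-then-encrypt therefore reduces both the bandwidth used by the replication link and the amount of data the encryption step must process.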
Native backup and restore capabilities leverage object storage integration with AWS S3, Google Cloud Storage, Zadara VPSA Object Storage, and other S3-compatible object stores. Object storage can be used by Zadara for audit and data retention purposes. Zadara supports AWS Direct Connect as well as Azure ExpressRoute, both of which allow a single volume to be made available to workloads residing in multiple public clouds, enabling the use of a single dataset across multiple locations or clouds. When deployed on flash, zStorage supports an auto-tiering capability that recognizes hot data and places it on the flash/high-performance tier, while less frequently accessed data is tiered either on lower-cost hard disks or S3-compatible object storage.
Zadara File Lifecycle Management services provide data management and analytics capabilities to the solution, including growth trends (overall and by file type), capacity utilization across several metrics, and usage statistics by owners and groups. Those reports allow organizations to identify unused data as well as orphaned data (data without an owner assigned to it).
Zadara natively supports access auditing for files accessed through NFS and SMB. Audit data is segregated and accessible only to administrators, and it can be uploaded to a remote S3 repository for long-term retention. Zadara's file snapshots are read-only, but there are currently no snapshot lock-retention capabilities. Accidental or malicious snapshot deletion can, however, be partially prevented through the use of strict role-based access controls. Although no native ransomware protection capabilities exist, Zadara partners with Veeam to provide such protection through Veeam's Scale-Out Backup Repository immutability features.
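The role-based mitigation mentioned above boils down to restricting the snapshot-delete permission to a narrow administrative role. A minimal sketch, with role names and permission strings that are purely hypothetical:

```python
# Minimal RBAC sketch (hypothetical roles and permissions, not
# Zadara's implementation): only "storage_admin" may delete
# snapshots, which partially mitigates accidental or malicious
# snapshot deletion.
ROLE_PERMISSIONS = {
    "storage_admin": {"snapshot.create", "snapshot.delete"},
    "operator":      {"snapshot.create"},
    "viewer":        set(),
}

def authorize(role, action):
    # Unknown roles get no permissions at all (deny by default).
    return action in ROLE_PERMISSIONS.get(role, set())

print(authorize("operator", "snapshot.delete"))       # → False
print(authorize("storage_admin", "snapshot.delete"))  # → True
```

This is only partial protection, as the report notes: a compromised administrator account can still delete snapshots, which is why snapshot lock-retention (deletion prevention even for admins) remains a gap.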
Zadara’s Federated Edge Program allows MSPs to rapidly deploy Zadara at the edge, enabling MSPs to provision a turnkey infrastructure closer to their customers while adhering to the Zadara Cloud operating model. Zadara provides the necessary hardware and software, and revenues are shared between Zadara and Federated Edge partners.
Finally, Zadara is working on several improvements. One of them should enhance its monitoring through the use of an ML-based engine that analyzes and parses alert patterns before informing administrators. Another planned improvement will bring cost analysis and cost optimization recommendations to its File Lifecycle Management feature.
Strengths: Zadara Edge Cloud Services delivers comprehensive file storage capabilities via a platform with rich compute, storage, and networking support. Remote cloning and mirroring capabilities provide a seamless experience complemented by object storage tiering and multicloud support. Analytics provides multidimensional information about trends, capacity, and user statistics. File auditing functionalities with long-term retention can be useful for legal purposes.
Challenges: The 0.5 PB capacity limit on namespaces can become an operational hurdle for organizations with many teams working on very large datasets, such as cloud-based AI, HPC, and big data workloads.
6. Analysts’ Take
The cloud file system market is interesting. It might seem obvious to many that, thanks to their dominance and massive market share, public cloud providers would offer the most comprehensive solutions. Nothing could be further from the truth.
Public cloud provider solutions range from simpler offerings (Oracle, Google) to a broad field of more complex cloud file system solutions (Azure, Amazon), with some overlaps and niche use cases. The primary concern with these is that, with a few notable exceptions (such as Tier 1 partnerships with vendors such as NetApp), these public cloud solutions typically need additional adjustments and improvements to meet enterprise requirements.
In the context of the GigaOm report "Key Criteria for Evaluating File-Based Cloud Storage," these solutions show gaps in key criteria coverage around management, monitoring, and advanced security implementation. To their credit, public cloud solutions generally offer seamless scalability and straightforward pay-as-you-go consumption options, as well as coverage for more use cases and the needs of different enterprise personas.
In contrast, storage vendors specializing in cloud file systems may have more narrowly focused solutions but with better enterprise-grade capabilities and more complete feature sets. Among these, many solutions can run on public clouds and offer cloud-like consumption models while delivering compelling value and the ability to operate seamlessly using a hybrid cloud model.
As organizations put more emphasis on shifting and redesigning file-based, performance-sensitive workloads to the cloud, demand for cloud file systems will continue to grow. Specialized storage vendors are currently in a better position to meet that demand, thanks to more complete feature sets oriented toward enterprise requirements, while public cloud providers lag behind, except those that partner with specialized storage vendors and offer a dedicated solution.
Cloud file system solutions are also maturing in terms of capabilities, leading to a gradual decrease in innovation (except among a few vendors). This year, a majority of vendors are focusing on either performance and stability improvements or architectural changes. The latter can be important to unlock the future potential of a solution and enable further roadmap developments. However, concentrating only on under-the-hood changes with long-term roadmaps can lead to a loss of competitive advantage unless the focus is very narrowly on solving one specific problem.
7. About Max Mortillaro
Max Mortillaro is an independent industry analyst with a focus on storage, multi-cloud & hybrid cloud, data management, and data protection.
Max has over 20 years of experience in the IT industry, having worked for organizations across various verticals, including the French Ministry of Foreign Affairs, HSBC, Dimension Data, and Novartis, to name the most prominent. Max remains a technology practitioner at heart and currently provides technological advice and management support, driving the qualification and release to production of new IT infrastructure initiatives in the heavily regulated pharmaceutical sector.
Besides publishing content and research on the TECHunplugged.io blog, Gestalt IT, Amazic World, and other outlets, Max also regularly participates in podcasts and discussion panels. He is a long-time Tech Field Day alumnus, a former VMUG leader, and an active member of the IT infrastructure community. He has also run his own technology blog, kamshin.com, continuously since 2008, where his passion for content creation started.
Max is an advocate for online security, privacy, encryption, and digital rights. When not working on projects or creating content, Max loves to spend time with his wife and two sons, either busy cooking delicious meals or trekking/mountain biking.
8. About Arjan Timmerman
Arjan Timmerman is an independent industry analyst and consultant focused on helping enterprises on their road to the cloud (multicloud, hybrid, and on-premises), as well as on data management, storage, data protection, networking, and security. Arjan has over 23 years of experience in the IT industry, having worked for organizations across various verticals such as the Shared Service Center for the Dutch Government, ASML, NXP, Euroclear, and the European Patent Office, to name just a few.
Drawing on his engineering background, Arjan provides both technical and business architectural insight and management advice, creating high-level and low-level architecture guidance and documentation. As a blogger and analyst at the TECHunplugged.io blog, Gestalt IT, Amazic World, and other outlets, Arjan also participates from time to time in podcasts, discussion panels, webinars, and videos. A long-time Tech Field Day alumnus who has taken part since Storage Field Day 1, he is a former NLVMUG leader and an active member of multiple communities, such as Tech Field Day and vExpert.
Arjan is a tech geek and, more importantly, he loves to spend time with his wife, Willy, his daughters, Rhodé and Loïs, and his son, Thomas, sharing precious memories on this amazing planet.
9. About GigaOm
GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.
GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.
GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.