This GigaOm Research Reprint Expires: Mar 15, 2023

GigaOm Radar for Unstructured Data Management: Business-Focused Solutions v3.0

1. Summary

Exponential data growth is no longer news, with unstructured data already accounting for 80% to 90% of the total data stored in enterprise storage systems. Human-generated data is now joined by machine-generated data that is growing even more quickly and needs infrastructures with different characteristics.

Managing storage capacity efficiently has become easier and less expensive, thanks to scale-out storage systems for files and objects. At the same time, the cloud expands the number of options available in terms of performance, capacity, and cold data archiving. The proliferation of data silos presents an issue, though, and it’s a trend that is accelerating alarmingly because of new multicloud IT strategies and edge computing.

Moreover, in this multicloud scenario, new demanding regulations like GDPR, CCPA, PIPL, and others require a different approach. Data protection and management processes are crucial for compliance with ever-changing business requirements, laws, and organization policies.

Furthermore, data sovereignty regulations impose restrictions on physical location of data and data flows, requiring organizations to adequately segment access to resources by location, and identify and geo-fence impacted datasets. Solutions that support these regulatory frameworks and are capable of handling data privacy requests—such as Data Subject Access Requests (DSARs), identifying and classifying personally identifiable information (PII), or even taking further action with right to be forgotten (RtbF) and right of erasure (RoE) requests—can radically simplify compliance operations.

We’re approaching a point where storing data safely over the long term may no longer benefit an organization and can quickly become a liability.

On the other hand, with the right processes and tools, businesses can do more with their data than ever before, mining it for hidden insights and capturing incredible value in the process, transforming it from a liability into an asset. Examples of this transformation are now common across all industries, with enterprises of all sizes reusing old data for new purposes, thanks to technologies and computing power that weren’t available only a few years ago.

With the right unstructured data management solutions, it’s possible to:

  • Understand what data is stored in the storage systems, no matter how complex and dispersed.
  • Build a strategy to intervene on costs while increasing the return on investment (ROI) for data storage.

Depending on the approach chosen by the user, there are several potential benefits to be derived from building and developing a data management strategy for unstructured data, including better security and compliance, improved services for end-users, cost reduction, and data reusability. The right data management strategy enables organizations to mitigate risk and make the most of opportunities.

2. Market Categories and Deployment Types

To better understand the market and vendor positioning (Table 1), we assess how well solutions for unstructured data management are positioned to serve specific market segments. This radar covers business-focused solutions and also provides insights as to whether evaluated solutions can meet infrastructure-focused solution requirements. Infrastructure-focused solutions will be covered in a sister radar; however, some solutions overlap and may appear in both radars, although with different placements and evaluations.

  • Infrastructure focus: Solutions designed to target data management at the infrastructure level and metadata, including automatic tiering and basic information lifecycle management, data copy management, analytics, index, and search.
  • Business focus: Solutions designed to solve business-related problems, including compliance, security, data governance, big data analytics, e-discovery, and so on.

In addition, we recognize two possible deployment models for solutions in this report: user-managed and software as a service (SaaS).

  • User-managed solution: Usually installed and run on-premises, these products often work well in hybrid cloud environments too.
  • SaaS solution: Based on a cloud backend and usually provided as a service, these products work in a manner distinct from those in the on-premises category. This type of solution is typically better optimized for hybrid, multicloud, and mobile/edge use cases.

Table 1. Vendor Positioning

                  Market Segment                          Deployment Model
                  Infrastructure Focus   Business Focus   User-Managed   SaaS
Aparavi
Cohesity
CTERA
Data Dynamics
Druva
Hitachi Vantara
IBM
Nasuni
NetApp
Varonis
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

3. Key Criteria Comparison

Building on the findings from the GigaOm report, “Key Criteria for Evaluating Unstructured Data Management Solutions,” Table 2 summarizes how each vendor included in this research performs in the areas that we consider differentiating and critical in this sector. Table 3 follows this summary with insight into each product’s evaluation metrics—the top-line characteristics that define the impact each will have on the organization. The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the market landscape, and gauge the potential impact on the business.

Table 2. Key Criteria Comparison

                  Metadata    Global Content   Big Data    Compliance    Marketplace   AI/ML
                  Analytics   & Search         Analytics   & Security
Aparavi           3           2                2           2             1             0
Cohesity          2           3                3           2             3             2
CTERA             2           3                1           3             1             0
Data Dynamics     3           3                3           2             1             2
Druva             3           3                1           3             3             2
Hitachi Vantara   3           2                3           3             2             0
IBM               3           3                3           2             2             0
Nasuni            2           2                2           2             2             0
NetApp            3           3                2           3             2             2
Varonis           3           2                2           3             3             3

3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

Table 3. Evaluation Metrics Comparison

                  Architecture   Scalability   Flexibility   Performance    Manageability    Ecosystem
                                                             & Efficiency   & Ease of Use
Aparavi           3              3             2             2              3                1
Cohesity          3              3             2             3              3                3
CTERA             3              3             2             3              3                2
Data Dynamics     3              3             3             2              3                2
Druva             3              3             2             3              3                3
Hitachi Vantara   3              3             3             3              2                3
IBM               2              3             3             3              3                3
Nasuni            3              3             2             3              3                2
NetApp            3              3             3             3              3                2
Varonis           2              3             3             3              2                3

3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

By combining the information provided in the tables above, the reader can develop a clear understanding of the technical solutions available in the market.

4. GigaOm Radar

This report synthesizes the analysis of key criteria and their impact on evaluation metrics to inform the GigaOm Radar graphic in Figure 1. The resulting chart is a forward-looking perspective on all the vendors in this report, based on their products’ technical capabilities and feature sets.

The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—Maturity versus Innovation, and Feature Play versus Platform Play—while providing an arrow that projects each solution’s evolution over the coming 12 to 18 months.

Figure 1. GigaOm Radar for Business-Focused Unstructured Data Management Solutions

As you can see in the Radar chart in Figure 1, three groups can be identified. In the first one, which focuses on innovative platform-based approaches, Varonis leads the pack with a compelling solution that combines unstructured data management capabilities with a strong security and risk-based approach that is complemented by artificial intelligence and machine learning (AI/ML). The solution combines data classification with the ability to detect exposure and suspicious access to sensitive data, regulatory compliance handling, and advanced security capabilities, including AI-assisted permission adjustments. It also offers a rich ecosystem consisting of third-party integrations, SaaS application support, and direct integration with some storage solutions.

NetApp offers a formidable set of business capabilities with unmatched regulatory compliance management (including handling of DSARs, advanced PII identification, and data classification), comprehensive data source support, a contextual AI engine that automatically categorizes data based on context understanding, and support for NetApp’s rich on-premises and cloud-based ecosystem. Cohesity offers a formidable platform with comprehensive, end-to-end unstructured data management capabilities under a single umbrella, offering a one-stop shop to organizations seeking the most complete coverage, including privacy compliance.

Data Dynamics excels in multiple areas, with a comprehensive and unified solution that combines broad vendor support, petabyte-scale data management, and policy-based data copy and migration scenarios with strong data analytics and classification capabilities. Druva has a very interesting approach characterized by providing data compliance, search, and analytics capabilities on top of its SaaS-based data protection platform. It includes a broad set of features, such as AI/ML-based anomaly and ransomware activity detection, as well as a growing set of integrations with industry-acclaimed security and analytics platforms.

The second group consists of mature and proven solutions offered by Hitachi Vantara and IBM. Although both are established players with years of experience, the arrow direction clearly indicates a steady transition toward innovative approaches. IBM Spectrum Discover shines by providing a complete feature set from a business standpoint and a modern, easy-to-use management interface, combined with a simple deployment model and very fast time to value. Hitachi Vantara, meanwhile, proposes an outstanding and rich platform that strongly focuses on large enterprise use cases and supports the creation of complex, end-to-end data processing workflows with multiple sequential actions on the datasets.

In the third group, the three remaining vendors offer interesting positioning and outcomes. Nasuni offers a SaaS distributed file storage solution with strong built-in security and auditing capabilities and an advanced analytics connector that allows large organizations to leverage Nasuni data and perform comprehensive in-house data intelligence extraction. CTERA, another SaaS distributed file storage solution, also offers good capabilities but, most importantly, is catching up quickly, with major improvements planned in 2022. While it currently relies on Varonis for some security-focused features, CTERA will soon offer built-in capabilities, including AI/ML-based anomaly detection. Finally, Aparavi proposes the most complete data classification platform, with over 150 built-in classification policies. The solution also includes other features that are being actively developed and improved.

Inside the GigaOm Radar

The GigaOm Radar weighs each vendor’s execution, roadmap, and ability to innovate to plot solutions along two axes, each set as opposing pairs. On the Y axis, Maturity recognizes solution stability, strength of ecosystem, and a conservative stance, while Innovation highlights technical innovation and a more aggressive approach. On the X axis, Feature Play connotes a narrow focus on niche or cutting-edge functionality, while Platform Play displays a broader platform focus and commitment to a comprehensive feature set.

The closer to center a solution sits, the better its execution and value, with top performers occupying the inner Leaders circle. The centermost circle is almost always empty, reserved for highly mature and consolidated markets that lack space for further innovation.

The GigaOm Radar offers a forward-looking assessment, plotting the current and projected position of each solution over a 12- to 18-month window. Arrows indicate travel based on strategy and pace of innovation, with vendors designated as Forward Movers, Fast Movers, or Outperformers based on their rate of progression.

Note that the Radar excludes vendor market share as a metric. The focus is on forward-looking analysis that emphasizes the value of innovation and differentiation over incumbent market position.

5. Vendor Insights

Aparavi

Aparavi offers a SaaS solution for unstructured data management. The product provides a sophisticated data collection mechanism and works across standard NFS and SMB file storage repositories as well as cloud drives (Dropbox, OneDrive, etc.), and specific applications such as Microsoft 365 (MS Teams, e-mail, for example) to capture information and drive a complete view of the infrastructure.

The solution supports extended metadata management and indexing, with over 150 pre-built policies to classify global data types. All metadata fields are collected and subsequently searchable across all locations and storage types, along with the ability to create exportable reports. The product comes with a customizable and easy-to-use SaaS-based user interface; however, the data remains on a customer-controlled data aggregator, usually located on premises.

Data can be searched, then copied or moved to different targets, on premises or in the cloud, in two ways: manually, or automatically through smart policies. These processes can be used, for example, to identify inactive data and move it to colder storage, or to enforce data sovereignty laws. Besides Aparavi’s dashboard, these actions can also be executed programmatically through the solution’s REST API.
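
The kind of policy-driven move described above could be scripted against a REST API roughly as follows. This is a minimal sketch: the endpoint path, field names, and authentication scheme are illustrative assumptions, not Aparavi's documented API.

```python
import json

def build_move_policy(source: str, target: str, max_age_days: int) -> dict:
    """Compose a hypothetical policy that relocates inactive data.

    Field names are assumptions for illustration only.
    """
    return {
        "action": "move",
        "source": source,
        "target": target,
        # Files untouched for longer than this are considered inactive.
        "filter": {"last_accessed_older_than_days": max_age_days},
    }

policy = build_move_policy("smb://fileserver/projects", "s3://cold-archive", 365)
body = json.dumps(policy)

# Submitting it would resemble the following (not executed here):
# requests.post("https://api.example.com/v1/policies", data=body,
#               headers={"Authorization": "Bearer <token>"})
print(body)
```

The same payload could equally drive a data sovereignty rule by filtering on location metadata instead of access age.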

Aparavi provides a solid foundation for compliance and data privacy with its broad set of built-in data classification policies. However, the burden of processing and cleaning the data remains on the administrators. From a security standpoint, the solution is multi-tenant and supports role-based access control (RBAC).

Besides the available REST API, Aparavi is developing a set of data connectors to provide better interoperability between its product and third-party solutions.

Strengths: Very easy to use, it’s a multi-tenant solution aimed at finding content across distributed data stores in the data center, at the endpoint, and on the cloud. Aparavi offers a complete set of templates for most common types of information that automate and simplify classification while improving the entire process.

Challenges: The solution is being actively developed and has designated multiple areas for improvement, including management of data privacy and compliance queries and adding connectors to third-party tools.

Cohesity

Cohesity offers an end-to-end solution designed to tackle data and apps challenges in modern enterprises. It is available both as a software-defined, scale-out solution deployed on physical or virtual servers and as a service from major cloud providers, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

Users can consolidate disparate workloads, including backup, archiving, file shares, object stores, test and dev, and analytics onto a single software-defined platform. This approach simplifies the storage, protection, and management of massive volumes of data. On top of its efficient web-scale distributed file system and integrated data protection, Cohesity offers a growing list of capabilities to address both infrastructure-focused and business-oriented applications.

The Helios management interface provides a unified view of objects and files stored across locations and offers a set of actionable insights such as backup and restore, cloning for test and dev use cases, reporting, and more. Helios also supports the deployment of applications such as Insight and Data Classification. These go far beyond standard metadata search to support real content and context-based search and discovery, all within the unified Helios management interface. When combined with another native application branded Spotlight, organizations can use Insight to analyze user activity and search for unstructured data. Metadata tagging already happens on backup snapshots for data lifecycle management and, combined with Data Classification, should be extended in a future release to cover more use cases.

Cohesity natively supports data management operations, including test and dev clones. In addition, the solution also supports native cloud integration for tiering, archiving, and replication to public cloud storage services such as Google Cloud Storage Nearline, Microsoft Azure, Amazon S3, and Glacier. Organizations can then leverage cloud-based analytics services on those datasets.

The platform includes comprehensive data security and compliance functions, including anti-malware and ransomware, vulnerability assessments, data masking, data expunge, classification, and analytics. These are available either as built-in capabilities, through containerized apps, or via API-based integrations. It also integrates with SIEM and SOAR platforms.

Cohesity enables organizations to analyze content for a growing number of use cases, taking advantage of the previously mentioned native apps or pre-configured, easy-to-use third-party apps in Cohesity’s marketplace. The marketplace already offers solutions for global search, e-discovery, log analysis, ransomware, vulnerability scanning, threat detection and mitigation, advanced reporting, and compliance.

Finally, AI/ML capabilities in Helios support two features: first, ransomware detection and alerts enabled by identifying encryption-based ransomware attacks; and second, enhanced capacity and workload modeling, offering improved predictions of the impact of workload changes on capacity utilization, either globally or by specific areas such as geo, site, or cluster.
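
The core idea behind encryption-based ransomware detection can be sketched generically: ciphertext has near-maximal byte entropy, so a sudden jump in the entropy of changed files is a useful alarm signal. The snippet below illustrates the technique only; it is not Cohesity's implementation, and the threshold is an illustrative assumption.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte (0.0 for empty input, maximum 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy approaches that of random bytes."""
    return shannon_entropy(data) >= threshold

plain = b"The quick brown fox jumps over the lazy dog. " * 100
random_like = bytes(range(256)) * 20  # stand-in for ciphertext
print(looks_encrypted(plain), looks_encrypted(random_like))  # False True
```

A production detector would combine this signal with rename patterns, change rates, and known-extension lists to reduce false positives on legitimately compressed or encrypted files.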

Strengths: Cohesity offers a complete end-to-end solution for data protection, consolidation, and management, with a centralized user interface, great overall efficiency, and total cost of ownership (TCO). The marketplace is showing strong potential, expanding the possibilities of this platform even further.

Challenges: The solution, designed for large and distributed enterprise deployments, has a good ROI, but the initial investment may be too steep for small organizations.

CTERA

CTERA proposes a cloud-based SaaS distributed file storage solution incorporating unstructured data management and analytics capabilities. These are delivered through CTERA Insight, a data visualization service that analyzes file assets by type, size, and usage trends; and presents the information through a well-organized, customizable user interface. Users can drill down to understand which tenants and locations see data growth patterns and pinpoint the related groups, individuals, and data types.

Besides data insights, this interface also provides real-time usage, health, and monitoring capabilities, encompassing central components and edge appliances. CTERA also implements a comprehensive RBAC system that supports folder and user-based tagging to grant dynamic data access, including geographic or department-based access.

The solution allows enterprises to design their global file system in compliance with data sovereignty regulations through CTERA Zones. With Zones, the global file system can be segmented into multiple data units to prevent data leakage between zones: users cannot access any share in the global namespace that does not belong to one of their defined Zones, although a share can belong to multiple Zones. Administrators can define Zones based on the content required by each department and associate a department edge filer with each Zone, ensuring that users have access only to relevant data while access to sensitive data remains restricted across the organization. The solution can also be deployed across multiple cloud providers, performing transparent, policy-based data movement between clouds for data locality or financial reasons without impacting front-end access to the data.
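
The access rule behind zone segmentation is simple to state: a user can resolve a share only if it belongs to at least one of their zones. The following is a generic sketch of that rule; the data structures and names are illustrative assumptions, not CTERA's implementation.

```python
# Each zone lists the shares it contains; a share may belong to several zones.
ZONES = {
    "emea-finance": {"/finance/reports", "/finance/ledgers"},
    "us-engineering": {"/eng/specs", "/eng/builds"},
    "global-hr": {"/hr/policies", "/eng/specs"},  # /eng/specs is in two zones
}

# Users are assigned to one or more zones.
USER_ZONES = {
    "alice": {"emea-finance"},
    "bob": {"us-engineering", "global-hr"},
}

def can_access(user: str, share: str) -> bool:
    """True if the share is visible in at least one of the user's zones."""
    return any(share in ZONES[z] for z in USER_ZONES.get(user, ()))

print(can_access("alice", "/finance/reports"))  # within alice's zone: True
print(can_access("alice", "/eng/specs"))        # outside her zones: False
```

In practice such a mapping would be enforced at the namespace layer, so out-of-zone shares are never even listed to the user.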

CTERA provides its own security layer with audit trails, authentication mechanisms (including two-factor authentication), ransomware protection through immutable snapshots, antivirus scanning, and granular versioning. In addition, CTERA offers integrations with Varonis to deliver capabilities in multiple areas, including data classification (regulated, sensitive, and critical data), security analytics, deep data context and audit trails, security recommendations, and more.

CTERA’s SDK enables API integrations that allow microservices to perform data management-related tasks; S3 connectors will be added in 2022 as well.

In 2022, CTERA will add an AI/ML-based anomaly detection engine to Insight, providing anomaly detection capabilities including ransomware alerts and the ability to terminate user accounts tied to a potential ransomware attack.

Strengths: CTERA combines proprietary data insights, advanced compliance, and security features, some of which are delivered by Varonis. The solution has its own security layer, which will be comprehensively improved in 2022. This allows CTERA to deliver comprehensive unstructured data management capabilities with a promising near-term roadmap.

Challenges: Big data analytics capabilities are currently lacking and represent a potential improvement area.

Data Dynamics

Data Dynamics offers a complete unstructured data management solution built around three products: StorageX (data location optimization and enterprise data migration), Insight AnalytiX (privacy risk classification), and ControlX (data exposure risks remediation). StorageX allows organizations to manage unstructured data at petabyte scale across storage systems and locations including cloud-based storage, with features such as data discovery, classification, and augmentation; it supports a broad set of data movement options and policy-based management capabilities.

StorageX analyzes data across storage systems and performs automated metadata tagging and metadata augmentation based on various criteria: tags can be added automatically based on criteria such as file type, file content, or file name and folder expressions, but alternatively, administrators can define and apply custom policies.

StorageX is complemented by Insight AnalytiX, a privacy risk classification solution that recognizes files containing PII across more than 200 known file types. Privacy Risk Classifier currently recognizes 49 different types of PII; the solution combines pattern recognition technology, keyword recognition, and artificial intelligence. It works in coordination with StorageX, fetching dataset information by building advanced multi-level logical expressions from combinations of logical operators, then streams and analyzes the data to identify both PII and potentially risky content. Once analysis is complete, the solution offers templates to view the analyzed data and allows users to download reports in various formats. The report is powered by deep analytics (both descriptive and diagnostic) to help enterprises get a clear understanding of the risk that exists and an easy means of quantifying it. Both StorageX and Insight AnalytiX support RBAC, boast an intuitive user interface, and support full text-search functionality.
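
Pattern-based PII recognition of the kind described above is commonly built on regular expressions combined with keyword and ML signals. The sketch below shows the pattern-matching layer only, with deliberately simplified example patterns; it is not Insight AnalytiX's rule set.

```python
import re

# Simplified example patterns; real classifiers use far stricter rules
# plus checksum validation (e.g., Luhn for card numbers).
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def classify(text: str) -> set:
    """Return the set of PII categories detected in the text."""
    return {name for name, rx in PII_PATTERNS.items() if rx.search(text)}

sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(sorted(classify(sample)))
```

Streaming file content through such a classifier, rather than copying it, is what lets these tools scan large datasets in place.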

ControlX, integrated with Insight AnalytiX, gives enterprises the ability to proactively mitigate risk and provide scalable security remediation, with the ability to quarantine at-risk datasets and re-permission files intelligently, as well as create an immutable audit trail backed by blockchain. ControlX’s file control operations can be integrated into an enterprise’s existing environment—service management, data management, and governance workflow automation—via RESTful APIs.

The solution is policy-based and supports multiple data copy and data movement scenarios. Datasets can be used to create data lakes for big data analytics applications; age and last accessed criteria can also be used as the basis for data tiering policies, which can automate data placement into cheaper storage tiers.
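
An age- or last-accessed-based tiering policy like the one described can be reduced to a threshold lookup: files idle beyond given limits are routed to progressively cheaper tiers. The tier names and thresholds below are illustrative assumptions, not Data Dynamics defaults.

```python
from datetime import datetime, timedelta, timezone

# (minimum idle days, target tier), checked from coldest tier down.
TIERS = [
    (730, "archive"),  # untouched for 2+ years
    (180, "cool"),     # untouched for 6+ months
    (0, "hot"),        # everything else
]

def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick the cheapest tier whose idle-time threshold the file meets."""
    idle_days = (now - last_accessed).days
    for min_idle, tier in TIERS:
        if idle_days >= min_idle:
            return tier
    return "hot"

now = datetime(2022, 1, 1, tzinfo=timezone.utc)
print(choose_tier(now - timedelta(days=3), now))     # hot
print(choose_tier(now - timedelta(days=365), now))   # cool
print(choose_tier(now - timedelta(days=1000), now))  # archive
```

A policy engine applies this per file against harvested metadata, then batches the resulting moves so front-end access paths are updated atomically.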

Strengths: Data Dynamics offers a robust, policy-based, unstructured data management platform that embeds outstanding metadata augmentation capabilities, broad storage solution coverage, outstanding data movement/tiering options, and a solid data analytics and privacy risk classification solution. ControlX provides a quarantine option for moving files to a specific location and isolating them. The air gap provided by Quarantine helps prevent ransomware attacks on critical files while providing immediate protection.

Challenges: Insight AnalytiX offers interesting capabilities for PII detection. Data Dynamics has an opportunity to extend its feature set further and include actionable insights on the data (such as handling of privacy law access and deletion requests).

Druva

Druva Cloud Platform provides centralized protection and management across end-user data sources and is offered as SaaS. By unifying distributed data across endpoints, data center workloads, AWS workloads, and cloud applications, organizations have a single place to manage backup and recovery, archiving, security threats, legal hold, and compliance monitoring. This unification minimizes data risks and ensures continuity for employee productivity.

Druva provides advanced metadata analytics based on unstructured data by analyzing data pipelines consisting of hundreds of millions of events per hour and more than 400,000 queries per hour. Data is collected from backup events and then run through big data analytics pipelines to make it queryable. Currently, Druva provides dashboards showing summary-level information, federated search capabilities (including e-discovery and legal hold queries), and storage insights and recommendations.

The solution offers an easy-to-use and feature-rich management console that provides useful metrics and statistics. Druva implements Federated Search, a powerful search engine that provides administrative, security, legal, and forensic teams with enhanced capabilities to conduct global metadata searches across workloads, including Microsoft 365, Salesforce, Google Workspace, and endpoint devices. Various attributes can be used to search, including email-related information.

Druva doesn’t currently offer big data analytics capabilities (in the sense of allowing data copy actions for the creation of data lakes); however, the company internally uses big data analytics with ETL pipelines to build datasets for its AI/ML solutions and for monitoring its own cloud services.

Druva’s SaaS platform offers a broad set of compliance and security features. As previously mentioned, the solution supports compliance queries related to e-discovery and legal hold. In addition, Druva monitors unusual data activity to detect potential ransomware attacks and implements an accelerated ransomware recovery feature that performs quarantine and orchestrated recovery and allows the recovery of curated snapshots. Security-related features include RBAC, strong user authentication, multi-factor authentication, and multiple security certifications. It can provide access insights on data usage, inform of potential anomalies, and integrate with a rich ecosystem of security, monitoring, and logging solutions.

While Druva has no marketplace of its own, the solution provides a full REST API, which enables integration with industry-acclaimed third-party solutions in multiple areas, such as authentication and ITSM (Okta, Splunk, ServiceNow, ADFS, GitHub), e-discovery (Disco, AccessData, OpenText, Exterro), and security (Palo Alto Networks, FireEye, Splunk).

Druva considers AI and ML essential capabilities for improving its solutions and differentiating against competitors. Currently, AI/ML are used to enhance customer experience with unusual behavior detection and IOC scans, provide content-based recommendations such as file-level storage insight and advanced privacy services, and to enhance the underlying metadata. Some of the product capabilities enhanced by AI/ML include ransomware anomaly detection, storage consumption forecasting, and data privacy and compliance features.

Strengths: Druva integrates data governance and management tools in a modern SaaS-based data protection solution. It’s easy to deploy and manage at scale, with a simple licensing model, good TCO, and quick ROI. Druva is also accelerating its data privacy and compliance capabilities, with more improvements expected soon.

Challenges: While Druva sees consistent improvements to its platform, the dependency on SaaS data protection may be an adoption barrier for organizations looking for a standalone UDM solution.

Hitachi Vantara

Hitachi Vantara has a comprehensive data management strategy for IoT, big data, and unstructured data. When it comes to unstructured data management, Hitachi Vantara offers a broad solution portfolio, including Hitachi Ops Center Protector aimed at data protection and copy management, Hitachi Content Platform (HCP) object store, and Hitachi Content Intelligence (HCI).

The latter offers the necessary features for optimizing and augmenting data and metadata, making it more accessible for further processing through tools like Pentaho (data analytics suite) and Lumada Data Catalog. One of the key features of HCI is the ability to define policies and actions based on standard and custom object metadata: policies can be related to a variety of actions such as data placement (protection, replication, cost-based tiering, and delivery to processing location), data transformation (anonymization, format conversion, data processing), security, and data classification.

HCI supports the creation of simple or complex end-to-end workflows that work on-premises or in the cloud: a new object or file can be augmented automatically with application-supplied metadata, scanned for various criteria (for example, identifying PII), and subsequently augmented with classification and compliance-related metadata. It also offers multiple capabilities related to compliance and governance: besides the ability to detect PII, HCI can be used for retention management and legal hold purposes; it supports geo-fencing, GDPR, HIPAA, and other regulatory frameworks. These are supported by data disposal workflows, including a built-in system to process RtbF requests, the ability to automatically delete data after retention periods have elapsed, and custom audit logging of disposition activities.

Strengths: This solution framework can be optimized for several use cases, including indexing and search, data governance and compliance, auditing, e-discovery, ransomware, and detection of other security threats. Hitachi Ops Center Protector can be used with a wide variety of sources, including non-Hitachi storage systems, while HCP and Pentaho are designed for high scalability and can be deployed in hybrid cloud environments.

Challenges: Hitachi’s ecosystem is designed for large organizations and can be an expensive and complicated option for smaller ones.

IBM Spectrum Discover

IBM Spectrum Discover is IBM’s unstructured data management solution. The solution is available as a virtual appliance and provides data insights for petabyte-scale unstructured storage. It connects with IBM Cloud Object Storage and IBM Spectrum Scale to analyze, consolidate, and index file and object metadata and also supports backup and archive (IBM Spectrum Archive, IBM Spectrum Protect), IBM Elastic Storage System, and non-IBM storage systems and data sources including Dell EMC Isilon, AWS S3, NetApp, Windows SMB, and Red Hat Ceph storage.

The architecture is based on Apache Kafka, which ingests metadata records from source storage systems into the IBM Spectrum Discover cluster; scans can be scheduled or performed on demand, and changes to a file or object also trigger a metadata update. The solution supports metadata augmentation through policies, manual tagging, or user-definable keywords. Metadata tags can be used as criteria for data classification and searches, as well as for content inspection.
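The policy-driven augmentation described above can be sketched as follows. The record fields and policy format here are hypothetical, and a plain Python list stands in for the Kafka metadata stream:

```python
# Hypothetical sketch of policy-based metadata augmentation. In the real
# architecture, metadata records arrive via Apache Kafka; a list stands in
# for the ingest topic here.

POLICIES = [
    # (condition, tag key, tag value)
    (lambda r: r["path"].endswith(".dcm"), "department", "radiology"),
    (lambda r: r["size"] > 10**9,          "tier-hint",  "archive"),
]

def apply_policies(record: dict) -> dict:
    """Evaluate each policy against a metadata record and add matching tags."""
    tags = dict(record.get("tags", {}))
    for condition, key, value in POLICIES:
        if condition(record):
            tags[key] = value
    return {**record, "tags": tags}

# Simulated metadata records (stand-in for the Kafka ingest stream)
events = [
    {"path": "/scans/img01.dcm", "size": 2 * 10**9},
    {"path": "/tmp/notes.txt",   "size": 512},
]
indexed = [apply_policies(e) for e in events]
print(indexed[0]["tags"])  # {'department': 'radiology', 'tier-hint': 'archive'}
```

The resulting tags then become searchable criteria for classification and content inspection.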

The solution provides fast, efficient search capabilities based on predefined criteria or custom queries written in an SQL-like syntax. Searches and policy management are conducted through the Spectrum Discover dashboard, a management interface that supports RBAC and enables users to perform various activities, including report generation.

Spectrum Discover can be used for data optimization: it supports removal of duplicate and trivial data as well as tiering and archiving. When used with IBM Spectrum Protect, it can recommend datasets that are suitable for archival storage. In addition, its data discovery capabilities can identify datasets to feed big data analytics and AI/ML use cases. For example, Spectrum Discover can orchestrate ML/DL and MapReduce processes (with IBM Platform Symphony).
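As a simple illustration, duplicate detection is commonly implemented by grouping files whose contents hash identically. This is one generic way to do it, not necessarily Spectrum Discover's actual mechanism:

```python
import hashlib

# Illustrative duplicate detection by content hash; hypothetical file map,
# not Spectrum Discover's actual implementation.

def find_duplicates(files: dict) -> list:
    """Group file paths whose contents hash identically."""
    by_hash = {}
    for path, content in files.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        by_hash.setdefault(digest, []).append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]

files = {
    "/a/report.txt": "Q3 results",
    "/b/copy.txt":   "Q3 results",
    "/c/other.txt":  "unrelated",
}
print(find_duplicates(files))  # [['/a/report.txt', '/b/copy.txt']]
```

In practice, such a scan would run against file metadata and sampled content at scale rather than reading whole files into memory.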

IBM Spectrum Discover Application Catalog allows users to discover, deploy, and manage community-supported third-party agents without writing any code. The platform includes an API to support customer-developed and third-party software, and offers single-click integration with IBM Watson Knowledge Catalog, which can be used to enhance regulatory compliance activities, management of data lakes, and self-service consumption of high-quality data.

Strengths: IBM Spectrum Discover is a versatile platform that offers fast data ingest and search, policy-based data augmentation, classification, and inspection. It also includes policy-based tiering and archiving, as well as support for big data analytics and AI/ML processes.

Challenges: The solution heavily focuses on business capabilities; some of the security and monitoring features, such as anomaly detection (including ransomware detection) and ransomware protection, are absent.

Nasuni

Nasuni offers a SaaS solution for enterprise file services, with an object-based global file system as its main engine and many familiar file interfaces, including SMB and NFS. It is integrated with all major cloud providers and works with on-premises S3-compatible object stores.

Nasuni offers an intuitive and easy-to-use management console for all core management operations across the global file system and embeds RBAC with multiple data access policies and granular permission setup, even for migrated data. For organizations seeking in-depth insights such as enterprise search, data privacy, and compliance, the Nasuni Analytics Connector component can make Nasuni data available to cloud-based analytics services on Azure and AWS (such as Amazon Macie) by copying it. The connector also allows data residing on Nasuni to be queried with standard SQL by cloud-based systems such as Amazon Athena and Azure Data Lake, without loading the data into a database beforehand.

Security capabilities include file system auditing and logging for operations on files. Auditing events such as create, delete, rename, and security changes assist customers in identifying and recovering from ransomware attacks. As an alternative to its own auditing options, Nasuni also supports auditing by tools such as Varonis and StealthBits.

Nasuni Continuous File Versioning allows a nearly infinite number of immutable snapshots, so customers can recover to any point in time within minutes, quickly mitigating a ransomware attack. The solution also supports compliance with data sovereignty regulations by deploying Nasuni Edge Appliances in data centers or locations that meet those requirements.
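The point-in-time recovery model can be illustrated with a toy version history. The class below is a hypothetical sketch; the real system stores immutable snapshots in object storage rather than an in-memory list:

```python
# Toy model of continuous file versioning with point-in-time recovery,
# in the spirit of the capability described above (illustrative only).

class VersionedFile:
    def __init__(self):
        self._versions = []  # append-only list of (timestamp, content)

    def write(self, ts: int, content: str) -> None:
        self._versions.append((ts, content))

    def recover(self, ts: int) -> str:
        """Return the most recent version written at or before `ts`."""
        candidates = [content for t, content in self._versions if t <= ts]
        if not candidates:
            raise KeyError("no version exists at or before that time")
        return candidates[-1]

f = VersionedFile()
f.write(100, "clean data")
f.write(200, "ENCRYPTED-BY-RANSOMWARE")  # simulated ransomware overwrite
print(f.recover(150))  # clean data -- roll back to before the attack
```

Because versions are append-only and immutable, a ransomware overwrite becomes just another version that can be rolled past, rather than a destructive change.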

Besides its REST API, Nasuni has begun looking at strategic partnerships for marketplace integrations with third-party applications. Currently, Nasuni integrates with Workato, and a stronger focus on partnerships is expected during 2022. Although Nasuni doesn’t embed AI/ML technology, the company leverages its Nasuni Analytics Connector to provide customers with a direct path to take advantage of cloud-based AI/ML analytics solutions.

Strengths: Nasuni provides a compelling cloud-based global SaaS file system with strong security and audit capabilities. The Nasuni Analytics Connector allows organizations to easily leverage best-in-breed cloud analytics solutions from AWS and Azure to run their own advanced analytics on unstructured data without requiring extensive AI or Machine Learning expertise.

Challenges: Rather than offering built-in advanced analytics, Nasuni primarily relies on its Analytics Connector to offload analytics to cloud providers. This can be a limitation for organizations that require their analytics platform to remain entirely on-premises for specific reasons (such as compliance or regulatory requirements).

NetApp

NetApp offers Cloud Data Sense, a comprehensive, predominantly business-oriented unstructured data management solution that also covers infrastructure-based needs. It performs several types of analysis on storage systems (NetApp and non-NetApp) and their content (including files, objects, and databases), providing insightful dashboards, reports, and guidance for several roles in the organization.

Based on Elasticsearch, it centrally manages all storage repositories and can scale to hundreds of petabytes. The solution is almost completely platform agnostic and can be set up either on cloud hyperscalers or on-premises. Data can reside on a single server or a cluster of servers, either in the cloud (customer-operated servers) or on-premises, putting organizations fully in control of their data.

Metadata analytics features include full data mapping, data insights and control over redundant and stale data, the ability to perform advanced data investigation through comprehensive search options, and the possibility of mapping PII across storage systems. Similarly, the solution can be used to search for sensitive data through specific patterns (for example, SSNs). Organizations can generate legal-ready compliance reports in minutes, with automatically classified data, and can generate reports for privacy risk assessments as well as reports meeting the requirements of HIPAA, DSS, and DSARs.
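The pattern-based search described above, such as locating SSNs across storage systems, can be sketched as follows. The repository layout and names are hypothetical; Cloud Data Sense runs this kind of scan against real NetApp and non-NetApp sources:

```python
import re

# Illustrative sketch of mapping sensitive data (SSN pattern) across
# repositories; the data and repository names are hypothetical.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

repositories = {
    "nfs-vol1":  ["order 42 shipped", "SSN: 078-05-1120 on file"],
    "s3-bucket": ["quarterly report", "meeting notes"],
}

def map_pii(repos: dict) -> dict:
    """Return, per repository, how many documents match the SSN pattern."""
    return {
        name: sum(1 for doc in docs if SSN.search(doc))
        for name, docs in repos.items()
    }

print(map_pii(repositories))  # {'nfs-vol1': 1, 's3-bucket': 0}
```

A per-repository PII map of this shape is the raw material for the privacy risk assessments and compliance reports mentioned above.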

The solution supports DSARs (usually related, but not limited to GDPR and CCPA regulations) to locate human data profiles and related PII. Those capabilities are accessible through a comprehensive, yet intuitive, user interface that’s integrated with NetApp Cloud Manager.

Big data analytics is supported by the solution's data source consolidation capabilities. Users can create queries to find specific datasets across storage systems, then copy those files to a designated target location, effectively creating a new data subset. The solution supports copy and migration scenarios using NetApp's capabilities (FlexClone on NetApp storage, Sync on any kind of storage) and multiple action types: copy, delete, move, tag, label, and assign to a user.

Cloud Data Sense also addresses compliance and security by providing data encryption with Cloud Volumes ONTAP. The solution provides ransomware protection and support for GDPR, CCPA, and other privacy regulations. In addition, alerts can be created that inform administrators automatically whenever sensitive data is created (for example, when files contain credit card information) or to identify dark data sources (such as large email address lists), helping to achieve better compliance within organizations.

From a marketplace standpoint, Cloud Data Sense is tightly integrated with the NetApp portfolio and supports a broad range of NetApp solutions and services, whether on premises or in the cloud, such as Cloud Volumes Platform, Cloud Volumes ONTAP, Cloud Insight, Cloud Backup, and Cloud Tiering. Also supported are Azure NetApp Files, CVS for Google Cloud, and Amazon FSx for NetApp ONTAP.

Cloud Data Sense leverages AI and ML for automated data classification, data categorization, and contextual, deep data analysis.

Strengths: Cloud Data Sense provides formidable capabilities and comprehensive data source support. The ability to serve DSARs, identify PII, and support compliance regulations are significant advantages of the solution.

Challenges: Although Cloud Data Sense offers outstanding capabilities, marketplace integrations with third-party products (in the marketplace definition of the GigaOm Key Criteria) could be improved.

Varonis

Varonis offers various products under the Data Security Platform (DSP) umbrella, a security-oriented unstructured data management solution that supports multiple data sources, including on-premises storage, cloud storage, and SaaS applications. The solution collects data through distributed collectors, which scan systems through APIs or agents, and forwards it to Varonis DSP servers that enrich and normalize the data.

DSP creates a data inventory by scanning, classifying, and indexing file contents and properties, as well as data related to users and groups. It identifies at-risk, sensitive data out of the box, and at petabyte scale, with built-in support for PCI, GDPR, HIPAA, CCPA, and other regulations. Organizations can also use Varonis DatAnswers to accelerate the processing of DSARs, with simple search queries that analyze file contents and report PII findings.

The solution presents an inventory of the collected data in various contexts such as data sensitivity, permissions mapping, user activity, data classification, active/inactive datasets, and more. Data classification can then be used to perform a variety of actions such as permission changes, data migrations, and so on.

Collected data can be exported as human-readable audit trails, and it is also fed continuously to Varonis's real-time alerting engine, which can detect suspicious activity and anomalous events (including ransomware attacks) and inform administrators accordingly. Varonis DSP uses machine learning to analyze access patterns for every user in the organization, mapping and monitoring the type of data each user accesses over time. It then recommends permission adjustments, for example, when a user has not accessed sensitive data for a period of time. The solution can also simulate permission changes and their impact, then commit or roll back those changes.
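The anomaly-detection idea can be illustrated with a deliberately simple baseline model. This is a toy sketch with hypothetical data and threshold; the vendor's actual models are ML-based and far more sophisticated:

```python
from statistics import mean, stdev

# Toy baseline-deviation check for access-pattern anomalies; the threshold
# and access counts are hypothetical, for illustration only.

def is_anomalous(history: list, today: int, sigmas: float = 3.0) -> bool:
    """Flag today's file-access count if it deviates from the user's
    historical baseline by more than `sigmas` standard deviations."""
    mu, sd = mean(history), stdev(history)
    return abs(today - mu) > sigmas * max(sd, 1.0)  # floor sd to damp tiny baselines

normal_days = [20, 25, 22, 18, 24, 21, 23]
print(is_anomalous(normal_days, 24))   # False
print(is_anomalous(normal_days, 900))  # True -- possible ransomware mass access
```

A sudden spike in file reads or modifications per user is one of the classic signals used to catch ransomware encrypting data in bulk.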

Varonis offers a rich ecosystem consisting of third-party integration support (SIEMs, access management solutions) and cloud-based SaaS application support (through DatAdvantage Cloud), but also direct integration with storage and data management solutions, some of which are featured in this radar. The solution also offers an API for custom integrations.

Strengths: Varonis offers a solid unstructured data management solution with a broad set of capabilities and an outstanding focus on security, data classification (including regulatory aspects), and compliance with privacy laws.

Challenges: The solution currently consists of a suite of multiple products, which can be purchased independently. While this is enticing to organizations seeking to resolve only specific use cases, it can be confusing in terms of licensing.

6. Analyst’s Take

With the increasing size of storage systems dedicated to unstructured data, a growing number of enterprises are looking at management solutions to minimize costs and increase control over critical security and compliance functions.

In the business segment, Varonis is leading, with NetApp and Cohesity close behind. Varonis offers the most comprehensive feature set with a strong focus on security and a risk-based approach to data management. This is a notable advantage in an increasingly complex environment where organizations have limited personnel to handle two very complex and time-demanding issues (security and regulatory risk).

Cohesity already had an enviable position and managed to close the gap with additional features around governance and compliance. NetApp has remarkable classification and data discovery capabilities using AI and ML to detect different patterns and categories. The company also is innovating at a very rapid pace.

Hitachi Vantara and IBM shouldn't be overlooked; both are established players in the large enterprise space that offer compelling outcomes: IBM with fast time to value, simplicity, and a broad feature set; Hitachi Vantara with the ability to create and manage complex data management workflows.

In the group of challengers, there are several interesting solutions that could push into the leaders group soon. Most of these already have strong capabilities but are missing functionality related to the key criteria we established in the “Key Criteria for Unstructured Data Management” report.

One of the trends we see in unstructured data management is that some of the evaluated solutions come from data protection vendors. This is an interesting development because data protection is often the "final target" where all of an organization's data is collected, opening the door to many data analysis and classification opportunities. This data is also relevant from a privacy and regulatory perspective, notably for legal retention requests.

Another interesting observation can be made with respect to distributed cloud file storage solutions. While these do not have the breadth of scope of a data protection solution in terms of the universality of data collected, they still manage a significant data share and offer the immediacy provided by live production data. Data growth trends and operations can be analyzed in real time, and the same type of analysis can be done to identify anomalies and potential ransomware attacks.

Anomaly detection algorithms mostly rely on AI/ML, but AI and ML can also be used to perform diverse activities such as deep content analysis, providing improved context for data classification, and even helping to identify sensitive data sets and/or personally identifiable information.

7. About Max Mortillaro

Max Mortillaro

Max Mortillaro is an independent industry analyst with a focus on storage, multi-cloud & hybrid cloud, data management, and data protection.

Max has over 20 years of experience in the IT industry, having worked for organizations across various verticals, such as the French Ministry of Foreign Affairs, HSBC, Dimension Data, and Novartis, to cite the most prominent. Max remains a technology practitioner at heart and currently provides technological advice and management support, driving the qualification and release to production of new IT infrastructure initiatives in the heavily regulated pharmaceutical sector.

Besides publishing content and research on the TECHunplugged.io blog, Gestalt IT, Amazic World, and other outlets, Max regularly participates in podcasts and discussion panels. He is a long-time Tech Field Day alumnus, a former VMUG leader, and an active member of the IT infrastructure community. He has also run his own technology blog, kamshin.com, continuously since 2008, where his passion for content creation started.

Max is an advocate for online security, privacy, encryption, and digital rights. When not working on projects or creating content, Max loves to spend time with his wife and two sons, either busy cooking delicious meals or trekking/mountain biking.

8. About Enrico Signoretti

Enrico Signoretti

Enrico Signoretti has more than 25 years in technical product strategy and management roles. He has advised mid-market and large enterprises across numerous industries, and worked with a range of software companies from small ISVs to global providers.

Enrico is an internationally renowned expert on data storage, as well as a visionary, author, blogger, and speaker on the topic. He has tracked the evolution of the storage industry for years as a GigaOm research analyst, an independent analyst, and a contributor to The Register.

9. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

10. Copyright

© Knowingly, Inc. 2022 "GigaOm Radar for Unstructured Data Management: Business-Focused Solutions" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.