GigaOm Solution Profile: CloudFabrix

An Exploration Based on Key Criteria for Evaluating AIOps

1. Summary

The CloudFabrix AIOps platform automates the management and operations of IT systems and service assurance for both on-premises and cloud-based infrastructure. The platform can analyze events and alerts from within or between complex deployments, such as multi-cloud and hybrid cloud.

Perhaps the most interesting aspect of the CloudFabrix AIOps solution is its use of distributed robotic data agents (RDAs) to acquire data from within the enterprise and edge devices. With RDAs, CloudFabrix simplifies and automates the processes of data acquisition, verification, enrichment, transportation, and remediation (when possible).

CloudFabrix’s AIOps Studio allows the creation of “pipelines” through which streaming data and metadata are consumed by bots at each stage, producing a complete data pipeline from source to destination. Pipelines are created in a CloudFabrix markup language and IDE using a low-code paradigm. The use of OpenTelemetry allows that data to be consumed at multiple destinations.
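As a rough illustration of the staged pipeline idea described above, the following Python sketch chains two stages over a stream of events. It is not CloudFabrix's markup language or API; the stage names (verify, enrich), the event fields, and run_pipeline are all hypothetical and chosen only to show how each stage consumes the output of the previous one.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]
Bot = Callable[[Iterator[Event]], Iterator[Event]]


def verify(events: Iterator[Event]) -> Iterator[Event]:
    """Hypothetical verification stage: drop events missing required fields."""
    for event in events:
        if "source" in event and "message" in event:
            yield event


def enrich(events: Iterator[Event]) -> Iterator[Event]:
    """Hypothetical enrichment stage: add a default severity when absent."""
    for event in events:
        yield {**event, "severity": event.get("severity", "unknown")}


def run_pipeline(source: Iterable[Event], bots: List[Bot]) -> List[Event]:
    """Feed the source through each stage in order, source to destination."""
    stream = iter(source)
    for bot in bots:
        stream = bot(stream)
    return list(stream)


raw_events = [
    {"source": "router-1", "message": "link down", "severity": "critical"},
    {"message": "event with no source"},  # fails verification
    {"source": "db-2", "message": "slow query"},
]

result = run_pipeline(raw_events, [verify, enrich])
```

Because each stage is a generator, events flow through one at a time rather than being buffered between stages, which is the property that matters for streaming data.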

Without a model, events are anomalies requiring human intervention. Data models and workflows must be created in AIOps Studio before the AIOps solution can create significant insights. Creating a model can take days to weeks, and the model must be tuned manually.

Support staff can enter the root-cause information in the incident system using a data model. CloudFabrix can then use the model data to remediate the incident should it reoccur, thus speeding up the resolution time.

CloudFabrix AIOps works well with other operations management tools and provides out-of-the-box integration with security systems and DevOps tool chains. For example, CloudFabrix can create an incident in ServiceNow, then instruct a staffing tool like PagerDuty or xMatters to identify an available on-call person and inform them of the incident. CloudFabrix continues to monitor the incident, closing out the ticket once resolution takes place.

CloudFabrix can create an Incident Room based on defined support groups, the discovered infrastructure, and business services. It also can ingest existing CMDB data. All affected stakeholder groups can review the proposed root cause. In addition, teams can create their own Situation Rooms for conditions within their world. The CloudFabrix AI learns from these situations and uses them for future analysis.

It is worth considering the ramifications of using a platform such as CloudFabrix. For example, within a network data stream, CloudFabrix correlates, deduplicates, and analyzes the data. If the data does not contain location information, the AI may also derive location data from server or network device names; however, there are often multiple naming conventions within large enterprises, some of which conflict with each other. CloudFabrix allows the enrichment of data streams with topology and context information from its AIA (Asset Intelligence Analytics) module.
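The naming-convention problem can be made concrete with a small Python sketch. It assumes two hypothetical conventions (a three-letter site code as a prefix, or as a suffix) and hypothetical hostnames; it does not reflect how CloudFabrix itself derives location. The point is that a hostname matching both conventions yields two candidate sites, so the result is ambiguous without extra context such as topology data.

```python
import re

# Two hypothetical naming conventions that might coexist in one enterprise.
CONVENTIONS = [
    re.compile(r"^(?P<site>[a-z]{3})-"),     # site prefix, e.g. "nyc-core-sw01"
    re.compile(r"-(?P<site>[a-z]{3})\d*$"),  # site suffix, e.g. "sw01-lon"
]


def derive_sites(hostname):
    """Return every site code that any convention extracts from the name."""
    sites = set()
    for pattern in CONVENTIONS:
        match = pattern.search(hostname.lower())
        if match:
            sites.add(match.group("site"))
    return sites


def deduplicate(events):
    """Keep the first occurrence of each (host, message) pair."""
    seen = set()
    unique = []
    for event in events:
        key = (event["host"], event["message"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


events = [
    {"host": "nyc-core-sw01", "message": "link down"},
    {"host": "nyc-core-sw01", "message": "link down"},  # duplicate alert
    {"host": "nyc-edge-lon", "message": "high latency"},
]

unique_events = deduplicate(events)
ambiguous = [e["host"] for e in unique_events if len(derive_sites(e["host"])) > 1]
```

Here "nyc-edge-lon" matches both conventions, so it lands in the ambiguous list rather than being assigned a location by guesswork.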

2. Key Criteria Analysis

This report runs through the key criteria and evaluation metrics set out in the GigaOm Key Criteria Report for AIOps.

Key Criteria

Criterion                  CloudFabrix
Automation                 2
Learning Systems           3
Dashboards and Reports     3
Data Consumption           3
Cross-Cloud Monitoring     2
Ops Systems Integration    2
DevOps Integration         2

3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
2 Limited: Lacking in execution and use cases
2 Not applicable or absent

Automation Capabilities: CloudFabrix supports the ability to onboard new applications and create useful analysis with minimal human intervention as well as the extensibility to automate remediation for well-known processes. This criterion includes the following considerations:

  • Proactive/self-healing operations: CloudFabrix can solve problems automatically without human intervention, either through external or internal orchestration tooling or by leveraging an automated ticketing system.
  • API: CloudFabrix provides an API that gives external applications access to the platform, supporting inbound data from security SOAR systems and outbound data to cloud or on-premises management systems.
  • ITSM and Configuration Management Database (CMDB) updates: CloudFabrix workflows can update ITSM and CMDB after changes.

Learning systems: The CloudFabrix AI engine can learn from the data being consumed and change its behavior with additional input. The CloudFabrix AI must be taught, but it also supports unsupervised learning. Events that are not part of an AI model are considered anomalies and reported as such; they must be added to the model manually.

Dashboards and reports: CloudFabrix has dashboards that are customizable. Teams can create dashboards limited to their services and configured items, but can see the entire enterprise when necessary.

Data consumption: CloudFabrix consumes input data and correlates it with causation. It can execute pre-approved changes or notify humans. This functionality includes:

  • End-user monitoring: CloudFabrix can consume any form of APM data, including RUM and Synthetic monitors.
  • System monitoring: CloudFabrix can ingest any data from hardware, OS, storage, and network system data feeds. It can consume SNMP, network device flows like Cisco’s NetFlow, and new feeds like WMI or outputs of OEM management tools like those from HP and Dell.
  • Application monitoring: CloudFabrix can see infrastructure resources and also has visibility directly into applications, assuming the ingest data exists. IT may also need to include API gateways and service mesh or caching technologies.
  • System connectivity: CloudFabrix has the ability to connect to a wide variety of systems, such as storage, compute, applications, and networks, and ensure those connections exist.

Cross-cloud monitoring: CloudFabrix can consume data regardless of source, so it has the ability to analyze data across cloud providers. CloudFabrix RDAs can ingest data or poll cloud vendor APIs to gather near-real-time metrics. Assuming metrics can be compared accurately, the tool can correlate metrics from one cloud vendor to corresponding metrics from a different vendor. Model building for cross-cloud may be complicated by the inability to compare and correlate metrics across vendors.
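To illustrate why comparing metrics across vendors is not automatic, the Python sketch below maps two vendors' CPU metrics onto one common name and unit before correlation. The vendor names, metric names, and scale factors are hypothetical and do not come from any real cloud API or from CloudFabrix; a metric with no mapping is returned as None rather than compared.

```python
# Hypothetical mapping: (vendor, vendor metric name) -> (common name, multiplier).
# vendor_a reports CPU as a percentage, vendor_b as a fraction between 0 and 1.
METRIC_MAP = {
    ("vendor_a", "cpu_percent"): ("cpu_utilization_pct", 1.0),
    ("vendor_b", "cpu_fraction"): ("cpu_utilization_pct", 100.0),
}


def normalize(sample):
    """Convert a vendor sample to the common name and unit, or None if unmapped."""
    key = (sample["vendor"], sample["metric"])
    if key not in METRIC_MAP:
        return None  # no known equivalent; cannot be compared across vendors
    name, scale = METRIC_MAP[key]
    return {
        "vendor": sample["vendor"],
        "metric": name,
        "value": sample["value"] * scale,
    }


samples = [
    {"vendor": "vendor_a", "metric": "cpu_percent", "value": 50.0},
    {"vendor": "vendor_b", "metric": "cpu_fraction", "value": 0.5},
    {"vendor": "vendor_b", "metric": "disk_queue_depth", "value": 3.0},
]

normalized = [normalize(s) for s in samples]
comparable = [n for n in normalized if n is not None]
```

The unmapped third sample is the case the paragraph above warns about: until someone defines an equivalent metric, it cannot take part in a cross-cloud model.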

Integration: CloudFabrix can share data and services with other infrastructure-oriented software tools for functions such as security and monitoring, including:

  • Cost and usage monitoring: Data sent to CloudFabrix can be used to monitor usage and cost information, alongside the analytical systems found in other cost governance or traditional enterprise accounting tools.
  • Leveraging agents: CloudFabrix consumes feeds from tools that require their own agents on endpoint devices, such as Oracle OEM or other AIOps tools.
  • Configuration management: CloudFabrix lets the user see the current infrastructure state, such as with a CMDB. CloudFabrix can consume business service definitions from the CMDB, but these must be manually integrated.
  • IT service management: CloudFabrix can view or create incident tickets.

DevOps integration: CloudFabrix can integrate, and thus share, information between development-facing and operations-facing tool sets. This includes the ability to see the outcomes of the continuous deployment process of a DevOps tool chain and correlate them with ITSM change requests validated by the CMDB. Thus, CloudFabrix is always monitoring what is, not what was.

3. Evaluation Metrics Analysis

Evaluation Metrics

Metric                     CloudFabrix
Flexibility                2
Manageability              2
Ease of Implementation     2
Usability                  2

3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
2 Limited: Lacking in execution and use cases
2 Not applicable or absent

Flexibility: CloudFabrix RDAs can ingest telemetry data from any source, giving the platform flexibility in large enterprise environments. The cloud portion of CloudFabrix adds no responsibilities for disaster recovery and business continuity (DR/BC). RDAs, however, run on-site, so DR/BC remains an important aspect of the deployment.

Manageability: With a SaaS deployment, CloudFabrix can manage multiple clouds. It includes most operational patterns and best practices immediately. The on-premises solution may take longer, as there are likely more data feeds to ingest. Experience has shown that even in a complicated environment, CloudFabrix can be up, running, and providing a level of useful data in 30 days.

Ease of implementation: The SaaS version of CloudFabrix can be deployed within hours. At the time of this report, it does not have functional parity with the on-premises version. The on-premises implementation of CloudFabrix is more complicated, but with proper planning, the deployment time can be shortened.

Usability (use of dashboards and other analytics): The day-to-day use of CloudFabrix is simple and dependable.

ROI/TCO: We removed ROI/TCO from scoring because there was no uniform method to compare all the vendors. However, the ROI for a CloudFabrix AIOps deployment may take weeks to months to realize, arriving when the solution prevents a major outage that could shut down part or all of the company.

4. Bottom Line

The strength of CloudFabrix AIOps derives from the robotic data agents underlying the product and its ability to ingest any form of data from many sources. The supervised and unsupervised learning in the AI/ML allows the creation of customized models for various portions of the enterprise.

Ingesting data from multiple and very disparate sources can slow the implementation of CloudFabrix. However, the RDAs underpinning the AIOps solution can shorten the implementation. Given the role of AIOps as an enabler of IT operations in complex environments, CloudFabrix can transition from an expensive tactical adjunct to a comprehensive solution for the entire company.

The SaaS solution may be the better choice in cases where the infrastructure is less complicated, and organic growth has not created competing methods for monitoring the enterprise. The SaaS solution is a fit for public cloud-only enterprises.

Using Incident Rooms and support team dashboards creates organizational workflow and so returns value more quickly. CloudFabrix reduces the noise for operations teams, whether IT Operations, business units, or DevOps, allowing them to focus on improving the environment rather than watching their screens.

