This GigaOm Research Reprint Expires: Apr 26, 2023

GigaOm Radar for Incident and Task Management Solutions v1.0

1. Summary

Tracking and managing incidents allows organizations to quickly address problems in an IT environment and correct them with the appropriate response. Incident and task management platforms enable this capability. They notify subject matter experts or stakeholders of incidents affecting an environment. Before these platforms became prevalent, notification systems usually involved physical pagers, emails, and phone calls. Schedules would be created on paper and shared across organizations.

Every application relies on a complex ecosystem of tools, processes, and teams. An incident and task management solution not only helps to identify what might have gone wrong, but also recommends how the incident can be resolved. It provides clear lines of communication and responsibilities.

Incident and task management solutions may consume events and metrics to create comprehensive pictures of incidents and manage them. Event-based incident management tracks data generated in chronological order by system changes. In contrast, metric-based incident management ingests performance data for forecasting performance and determining the current state of systems.

An incident and task management solution can alert responsible persons through phone, text, email, or a mobile app. An on-call schedule is needed to ensure that resources or support groups (that is, people) are available to respond, so that incidents are not lost or ignored.
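To make the idea of an on-call schedule concrete, here is a minimal sketch of a round-robin rotation in Python. It is not drawn from any vendor in this report; the function name `on_call_at`, the responder names, and the weekly shift length are all illustrative assumptions.

```python
from datetime import datetime, timedelta


def on_call_at(rotation, start, shift_length, when):
    """Return who is on call at `when` for a simple round-robin rotation.

    rotation     -- ordered list of responder names (hypothetical)
    start        -- datetime at which the first shift begins
    shift_length -- timedelta covering one shift
    when         -- datetime to look up
    """
    if when < start:
        raise ValueError("schedule has not started yet")
    # Dividing one timedelta by another with // yields a whole number of shifts.
    shifts_elapsed = (when - start) // shift_length
    return rotation[shifts_elapsed % len(rotation)]


# Hypothetical three-person rotation with weekly handoffs on Monday at 09:00.
rotation = ["alice", "bob", "carol"]
start = datetime(2022, 1, 3, 9, 0)
week = timedelta(weeks=1)

print(on_call_at(rotation, start, week, datetime(2022, 1, 19, 12, 0)))  # carol
```

Commercial platforms layer overrides, time zones, and follow-the-sun handoffs on top of this basic lookup.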

Incident and task management solutions are critical to both technical and business operations and contribute to the operational awareness of the entire organization.

No product is 100% free from bugs or service interruptions. Still, with a proper incident and task management solution, incidents are addressed by the correct personnel, and any adjustments or corrective actions needed proceed promptly.

The GigaOm Key Criteria and Radar reports provide an overview of incident and task management platforms, identify capabilities and evaluation factors for selecting a solution, and detail vendors and products that excel. These reports give prospective buyers an overview of the top incident and task management solutions in the market and will help decision makers evaluate platforms and decide where to invest.

How to Read this Report

This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:
  • Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics (such as scalability, performance, and TCO) that drive purchase decisions.
  • GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.
  • Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.

2. Market Categories and Deployment Types

For a better understanding of the market and vendor positioning (Table 1), we assess how well solutions for incident and task management are positioned to serve the following specific market segments:

  • Startups: These companies often have limited funding for more extensive solutions. A long-term trial or a vendor program targeted at startups may be the best fit. If rapid expansion is possible, the ability to scale is a consideration.
  • Small-to-medium business (SMB): In this category we assess solutions on their ability to meet the needs of organizations ranging from small businesses to medium-sized companies. Also assessed are departmental use cases in large enterprises, where ease of use and deployment are more important than extensive management functionality, data mobility, and feature set.
  • Large enterprise: Here offerings are assessed on their ability to support large and business-critical projects. Optimal solutions in this category will have a strong focus on flexibility, performance, data services, and features that improve security and data protection. Scalability is another big differentiator, as is the ability to deploy the same service in different environments.

In addition, we recognize three deployment models for solutions in this report: software as a service (SaaS), on-premises, and hybrid.

  • SaaS solutions: These are available only in the cloud. Often designed, deployed, and managed by the service provider, they are available only from that specific provider. The big advantages of this type of solution are integration with other services offered by the cloud service provider (functions, for example) and its simplicity.
  • On-premises solutions: The entire solution resides within the enterprise environment. There are no requirements to connect to a SaaS platform. The solution can be deployed entirely on-premises to meet requirements for security and privacy not available within SaaS solutions.
  • Hybrid solutions: These solutions are meant to be installed both on-premises and in the cloud, allowing organizations to build hybrid or multicloud infrastructures.

Table 1. Vendor Positioning

Market Segment: Startup, SMB, Large Enterprise | Deployment Model: SaaS, On-Premises, Hybrid
Atlassian
PagerDuty
Splunk
xMatters
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
- Not applicable or absent

3. Key Criteria Comparison

Building on the findings from the GigaOm report, “Key Criteria for Evaluating Issue Management and Tracking,” Table 2 summarizes how each vendor included in this research performs in the areas that we consider differentiating and critical in this sector. Table 3 follows this summary with insight into each product’s evaluation factors—the top-line characteristics that define the impact each will have on the organization. The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the market landscape, and gauge the potential impact on the business.

Table 2. Key Criteria Comparison

Key Criteria

Vendor | Runbooks | User Management | Integrations | Maximum Escalation Level | Chatops | Audio & Video Conferencing | Workflow Management
Atlassian | 1 | 2 | 2 | 3 | 3 | 2 | 2
PagerDuty | 3 | 2 | 3 | 2 | 3 | 3 | 1
Splunk | 1 | 2 | 2 | 3 | 3 | 2 | 2
xMatters | 3 | 2 | 2 | 3 | 3 | 3 | 3

3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
- Not applicable or absent

Table 3. Evaluation Factors Comparison

Evaluation Metrics

Vendor | Licensing | Maintainability | Production vs. Non-Production | Training & Support | Stakeholder Support
Atlassian | 2 | 2 | 1 | 2 | 2
PagerDuty | 2 | 2 | 1 | 2 | 2
Splunk | 2 | 2 | 1 | 2 | 2
xMatters | 2 | 2 | 2 | 2 | 3

3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
- Not applicable or absent

By combining the information provided in the tables above, the reader can develop a clear understanding of the technical solutions available in the market.

4. GigaOm Radar

This report synthesizes the analysis of key criteria and their impact on evaluation factors to inform the GigaOm Radar graphic in Figure 1. The resulting chart is a forward-looking perspective on all the vendors in this report, based on their products’ technical capabilities and feature sets.

The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—Maturity versus Innovation, and Feature Play versus Platform Play—while providing an arrow that projects each solution’s evolution over the coming 12 to 18 months.

Figure 1. GigaOm Radar for Incident and Task Management Solutions

As you can see in the Radar chart in Figure 1, two vendors are on the Feature Play side, while the other two vendors are Platform Play.

Feature Play orientation indicates a solution that is defined more by its individual features and may be limited in the sense that it is focused on niche functionality. xMatters and PagerDuty have strong features but are not part of a platform of other products.

Platform Plays are defined by their broad and even-handed coverage of the solution space, as well as their robust integration and alignment with other apps and platform services, such as IT service management (ITSM). Splunk On-Call and Atlassian Opsgenie both fit into larger platforms of solutions for IT operations management or software development and collaboration tools, respectively.

All of these solutions are mature products, as much of this market has been absorbed into other tools and many products can no longer be purchased as standalone solutions. xMatters stands out as an Outperformer due to its advanced feature set and its rate of feature growth.

Inside the GigaOm Radar

The GigaOm Radar weighs each vendor’s execution, roadmap, and ability to innovate to plot solutions along two axes, each set as opposing pairs. On the Y axis, Maturity recognizes solution stability, strength of ecosystem, and a conservative stance, while Innovation highlights technical innovation and a more aggressive approach. On the X axis, Feature Play connotes a narrow focus on niche or cutting-edge functionality, while Platform Play displays a broader platform focus and commitment to a comprehensive feature set.

The closer to center a solution sits, the better its execution and value, with top performers occupying the inner Leaders circle. The centermost circle is almost always empty, reserved for highly mature and consolidated markets that lack space for further innovation.

The GigaOm Radar offers a forward-looking assessment, plotting the current and projected position of each solution over a 12- to 18-month window. Arrows indicate travel based on strategy and pace of innovation, with vendors designated as Forward Movers, Fast Movers, or Outperformers based on their rate of progression.

Note that the Radar excludes vendor market share as a metric. The focus is on forward-looking analysis that emphasizes the value of innovation and differentiation over incumbent market position.

5. Vendor Insights

Atlassian Opsgenie

Atlassian produces software that helps teams work together more efficiently and effectively. The company provides project planning and management software, collaboration tools, and IT help desk solutions, and operates in four models: subscriptions (term licenses and cloud agreements), maintenance (annual contracts that provide support and periodic updates and are generally attached to perpetual license sales), perpetual license (upfront sale for indefinite use of the software), and “other” (training, strategic consulting, and revenue from the Atlassian Marketplace app store).

Opsgenie is a modern, cloud-based incident management platform that ensures critical incidents are never missed, and actions are taken by the right people as quickly as possible. Opsgenie receives alerts from monitoring systems and custom applications and categorizes each alert based on importance and timing.

On-call schedules ensure the right people are notified through multiple communication channels, including voice calls, email, SMS, and push messages to mobile devices. If an alert is not acknowledged, Opsgenie automatically escalates it, ensuring the incident gets the needed attention. This capability contributes to a high score for the escalation-level key criterion.
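The escalation behavior described above can be sketched in a few lines of Python. This is a generic illustration and not Opsgenie's implementation or API; the `EscalationStep` class, the `who_to_notify` function, and the step timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EscalationStep:
    notify: str         # person or team to notify (hypothetical names below)
    after_minutes: int  # minutes after alert creation at which this step fires


def who_to_notify(policy, minutes_since_alert, acknowledged):
    """Return everyone who should have been notified by now.

    Once an alert is acknowledged, escalation stops and nobody else is paged.
    """
    if acknowledged:
        return []
    return [step.notify for step in policy
            if step.after_minutes <= minutes_since_alert]


# Assumed policy: page the primary at once, the secondary after 10 minutes,
# and the engineering manager after 30 minutes without acknowledgement.
policy = [
    EscalationStep("primary-on-call", 0),
    EscalationStep("secondary-on-call", 10),
    EscalationStep("engineering-manager", 30),
]

print(who_to_notify(policy, 15, acknowledged=False))
# ['primary-on-call', 'secondary-on-call']
```

The number of steps a policy supports is what the maximum escalation level key criterion measures.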

Opsgenie also received a high score for chatops, and integration with external chat solutions (such as Slack) is on par with other solutions.

Using Opsgenie, ops teams can:

  • Consolidate alerts from disparate monitoring tools.
  • Define escalation policies and on-call schedules with rotations to notify the right people and escalate as necessary.
  • Use multiple notification methods (email, SMS, push, phone call, group chat, and so on) to ensure alerts are seen by users.
  • Get access to alert details, respond to alerts, communicate with team members, and initiate investigative and corrective actions using Opsgenie apps for mobile devices.

Strengths: Opsgenie is part of the Atlassian suite of applications and provides a good platform-based solution for users of other Atlassian products.

Challenges: Runbooks require an additional product. Machine learning (ML) is not currently a part of Opsgenie.

PagerDuty

PagerDuty is a premier incident response platform that has been steadily building out its feature set to provide more automated, flexible, and proactive ways to orchestrate enterprise-wide response. Since its acquisition of Rundeck in 2020 (now rebranded as PagerDuty Process Automation), it has added automation of IT and business processes to its platform capabilities, facilitating automated root cause analysis, remediation, event-triggered processes, and even self-service IT requests.

PagerDuty has over 600 integration partners and can integrate with most monitoring tools, filtering out noisy alerts and focusing only on alerts about critical events detected. With bi-directional integration enabled between PagerDuty and an ITSM system, a resolved incident can be updated within the ITSM platform as well. PagerDuty sports a clean UI and is easy to navigate.

The solution has all the standard on-call management features, like flexible scheduling, alerting, and escalation policies, so organizations can ensure the right people are notified every time. The tool is augmented with powerful incident response features built on a service-based architecture to orchestrate a business-wide response that includes mobilizing the right cross-functional teams, activating targeted response plays, and managing stakeholder communications to align on business response.

To further streamline the incident response process, the PagerDuty solution includes Event Intelligence, which uses ML and data science techniques to reduce alert noise and help to create situational awareness for faster resolution. This includes pointing to probable origin points, surfacing change event correlation, showing related and past incidents, and highlighting service dependencies.
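Alert noise reduction can be illustrated with a much simpler, rule-based stand-in. The Python sketch below is not PagerDuty's Event Intelligence and uses no ML. It only groups alerts that share a service and check name and arrive within a fixed time window; the field names (`service`, `check`, `ts`) and the 300-second window are assumptions made for illustration.

```python
def group_alerts(alerts, window_seconds=300):
    """Group alerts with the same (service, check) that arrive within
    `window_seconds` of the first alert in their group.

    Each alert is a dict with hypothetical keys: 'service', 'check', 'ts'
    (timestamp in seconds). Returns a list of groups (lists of alerts).
    """
    groups = []
    open_groups = {}  # (service, check) -> index of the latest group for that key
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["service"], alert["check"])
        idx = open_groups.get(key)
        if idx is not None and alert["ts"] - groups[idx][0]["ts"] <= window_seconds:
            groups[idx].append(alert)       # same burst: fold into existing group
        else:
            open_groups[key] = len(groups)  # new burst: start a new group
            groups.append([alert])
    return groups


alerts = [
    {"service": "db", "check": "cpu", "ts": 0},
    {"service": "web", "check": "latency", "ts": 30},
    {"service": "db", "check": "cpu", "ts": 60},
    {"service": "db", "check": "cpu", "ts": 120},
    {"service": "db", "check": "cpu", "ts": 1000},
]

print([len(g) for g in group_alerts(alerts)])  # [3, 1, 1]
```

Here five raw alerts collapse into three notifications. ML-based approaches aim to find such groupings without hand-written keys or windows.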

Rich analytics help users track issues, monitor team health, and create reports that are useful for management when deciding how to allocate resources and direct their team’s focus.

Due to the large number of integrations available, PagerDuty stood out on the integrations key criterion. Native integrations are available out of the box for many ITSM and monitoring data sources. Custom integrations can be made with webhooks, which can be used to extend PagerDuty (PD) and build custom solutions.
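Custom integrations of this kind usually exchange JSON over HTTP. As a hedged sketch of the inbound direction, the Python below builds (but does not send) a trigger event using field names from PagerDuty's public Events API v2 as understood at the time of writing. The helper name `build_trigger_event` and the routing key, summary, and source values are placeholders, and the current API documentation should be checked before relying on any of this.

```python
import json

# Severity values assumed from the Events API v2 documentation.
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source,
                        severity="critical", dedup_key=None):
    """Build the JSON body for a 'trigger' event (hypothetical helper)."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    event = {
        "routing_key": routing_key,   # integration key of the target service
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }
    if dedup_key is not None:
        # Events sharing a dedup_key are treated as the same alert.
        event["dedup_key"] = dedup_key
    return event


event = build_trigger_event(
    routing_key="YOUR_INTEGRATION_KEY",  # placeholder, not a real key
    summary="Disk usage above 90% on db-01",
    source="db-01.example.com",
    severity="warning",
    dedup_key="db-01-disk",
)
print(json.dumps(event, indent=2))
```

In practice a monitoring tool would send this body in an HTTP POST to the Events API endpoint. That step is omitted here so the example runs without network access.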

PagerDuty also features third-party tooling that enables DevOps workflows (CI/CD pipelines). There is a Terraform provider to programmatically create rules, teams, users, and schedules.

The solution received a high score on the chatops key criterion, supporting mobile, email, and instant messaging platforms to send alerts. It has a free pricing option (for up to five users).

PagerDuty also supports bi-directional integration, allowing it to communicate with monitoring, ticketing, and IT tools such as Zendesk, Jira, ServiceNow, and Salesforce, and to update incident status and other information in those tools.

There are several use cases for the solution, including:

  • Technology: Paging an SME to address an issue as part of an outage or incident.
  • Emergency response: Sending a mass notification to emergency responders.
  • Operations: Mobilizing a response to address any inefficiencies.

Strengths: PagerDuty has excellent analytics for identifying patterns. This gives teams the ability to see what anomalies may be lurking in their environment. The tool provides powerful integrated runbook capabilities as a premium feature tied to its automated remediation capabilities.

Challenges: Runbooks and runbook automation are premium features. Workflow management needs improvement.

Splunk On-Call

Originally named VictorOps, the software was acquired by Splunk in 2018 and renamed Splunk On-Call in October 2020.

Splunk On-Call can integrate with a variety of Splunk and third-party monitoring solutions to trigger incidents and send alerts to SMEs. The acquisition by Splunk allows the tool to evolve beyond an incident response platform by combining the data-harvesting capabilities of Splunk and the incident management of Splunk On-Call.

The solution received a good score on the workflow management and integrations key criteria. There’s a customizable workflow to deliver alerts through many channels, such as app, mobile phone, email, and CI/CD applications. Splunk On-Call is able to integrate with CI/CD platforms such as Jenkins, GitHub, and GitLab.

A heavy emphasis is placed on DevOps workflows, and a strong list of integrations connects Splunk On-Call to ticketing software such as Jira and ServiceNow, which in turn enables seamless DevOps workflows.

This solution also has the same integrations as other platforms in its class. Some integrations allow Splunk On-Call to tap into any signal-producing platform, such as CloudWatch, Nagios, Zabbix, and Grafana.

Splunk On-Call received a high score on the chatops and audio/video conferencing key criteria. It supports mobile, email, and instant messaging platforms for sending alerts, and offers an in-app chat feature as well. Organizations can build dashboards within Splunk using metrics received from Splunk On-Call.

The platform features “war room” capabilities that allow users to communicate and plan actions to correct issues. It uses ML to assist in recommending subject matter experts for particular issues. Its audit trail capability allows operators to track issues and view the context that led to them. Users can annotate alerts to add details about an issue. This platform also features a post-incident review function.

Splunk On-Call can be the least expensive solution at $25 per user per month in the enterprise-level plan, and a free 14-day trial is available.

Strengths: Its “war room” capabilities allow incident commanders and responders to collaborate quickly and generate situational awareness around an issue. This built-in collaboration feature makes it easy to quickly assemble a team to work on an issue.

Challenges: Splunk On-Call does not have a native automated runbook feature, and relies on integrations to create a runbook. There is no non-production environment, but routing can be used to simulate some features without interrupting production.

xMatters

xMatters, now an Everbridge Company, was founded in 2000 and is headquartered in San Ramon, California.

xMatters can be used by IT, engineering, DevOps and digital services departments in a wide range of industries, including healthcare, retail, and financial services—wherever scheduling resources, support groups, and escalations are required to ensure uninterrupted services. It’s free to use for up to 10 users.

Like other incident response platforms, the solution can integrate with monitoring systems and alert SMEs to an incident. It can also be integrated within a CI/CD pipeline. It supports sending alerts via mobile, email, instant messaging, and Everbridge’s desktop alerting platform, SnapComms. Competitor platforms can integrate via API.

xMatters scored high on the escalation-level key criterion, as its escalation policies can be fine-tuned to create custom schedules more easily than those of its competitors. An audit trail can show an issue’s timelines and actions taken to correct the problem. Service dependency maps give users a view of what a particular technology stack interacts with.

The xMatters solution got the highest score available for the runbooks key criterion because it sports native runbook automation. This feature does not need to be installed and configured and can be used out of the box.

Numerous integrations make xMatters highly customizable for any application, such as those used for emergency response, manufacturing, and technology. The use of intelligent noise suppression and other features of its Intelligent Signaling system are especially important in larger organizations.

xMatters can integrate with tools commonly used in DevOps workflows, including GitHub, Jira, ServiceNow, and Jenkins.

With its acquisition by Everbridge, xMatters was able to further extend its reach to enhance operational awareness beyond just the technical to also support a business response. xMatters has purpose-built integrations and runbooks to address the technical issues and Everbridge has purpose-built technology to evaluate business risk and engage the proper business teams such as HR, marketing, and customer success. This means organizations can work across departments to address incidents.

Strengths: xMatters features native automated runbook execution out of the box. This feature is built into the platform and customers can use it to take action if an issue arises. It offers drag-and-drop capabilities within the UI for strong ease of use. Use cases for xMatters are expandable beyond the technology industry.

Challenges: The platform currently does not have any ML capability to identify issues. The Signal Intelligence features hold promise for future AI/ML abilities to provide early detection or an automated fix for an issue.

6. Analyst’s Take

The complexity of today’s IT environments requires resources from all across the company. When an incident occurs, those same resources have to be available to handle the problem. IT leaders need to be able to see schedules, assign resources, and manage the assorted tasks needed to remedy the incident. Incident and task management solutions can help. Organizations can choose to purchase incident response as a standalone solution or as part of a platform of other solutions.

The Feature Play solutions in this radar are xMatters and PagerDuty, two long-time rivals in this space. Both handle the basics of scheduling, notify resources well, and are more similar than they are different. Their costs are essentially equivalent, but there are differences that may sway potential buyers toward one or the other.

Runbooks can hold essential application or system information that subject matter experts need to remedy a problem, and they can also be used to auto-remediate some issues. Both vendors charge extra for them: PagerDuty requires the PagerDuty Process Automation add-on package, while xMatters includes runbooks only at a higher licensing tier. The total costs will be similar for those requiring runbooks.

PagerDuty has ML features to target noise suppression and facilitate event correlation. While xMatters does not feature ML functionality explicitly, it does provide noise suppression, event correlation, and other capabilities to assist with incident and task management.

xMatters’ strengths in ease of use and workflow management stand out, as does its ability to maintain both a production and non-production environment. The second environment is useful in training and in the creation of workflows.

Comparing the two platform solutions, Splunk On-Call and Atlassian Opsgenie, is more complicated. Splunk On-Call is part of the Splunk family of operations and observability solutions. Atlassian also has a large suite of tools, many built around open-source projects. Neither provides native runbook support, thus requiring a non-native solution (Splunk) or an additional software solution (Atlassian).

While pricing is similar to the Feature Play solutions, neither is normally looked at as a standalone solution. The other products Splunk and Atlassian provide strengthen their solutions, but these two offerings may not have as many features as those from vendors on the other side of the radar.

For enterprises with existing ITSM, monitoring, or observability solutions, xMatters, with its strong feature set, may be a good solution. The workflow management features are also appealing. PagerDuty may be a good fit for other organizations when PagerDuty Process Automation is included. Lab testing the two will provide the final critical details for buyers focused on ease-of-use and workflow management.

Users of Splunk are served well by Splunk On-Call, but xMatters and PagerDuty are easily integrated with almost all ITSM solutions.

7. About Ron Williams

Ron Williams

Ron Williams is an astute technology leader with more than 30 years’ experience providing innovative solutions for high-growth organizations. He is a highly analytical and accomplished professional who has directed the design and implementation of solutions across diverse sectors. Ron has a proven history of excellence propelling organizational success by establishing and executing strategic initiatives that optimize performance. He has demonstrated expertise in planning and implementing solutions for enterprises and business applications, developing key architectural components, performing risk analysis, and leading all phases of projects from initialization to completion. He has been recognized for promoting effective governance and positive change that improved operational efficiency, revenues, and cost savings. As an elite communicator and design architect, Ron has transformed strategic ideas into reality through close coordination with engineering teams, stakeholders, and C-level executives.

Ron has worked for the US Department of Defense (Star Wars initiative), NASA, Mary Kay Cosmetics, Texas Instruments, Sprint, TopGolf, and American Airlines, and participated in international consulting in Qatar, Brazil, and the U.K. He has led remote software and infrastructure teams in India, China, and Ghana.

Ron is a pioneer in enterprise architecture who improved response and resolution of enterprise-wide problems by deploying “smart” tools and platforms. In his current role as an analyst, Ron provides innovative technology and strategy solutions in both enterprise and SMB settings. He is currently using his expertise to analyze the IT processes of the future with particular interest in how machine learning and artificial intelligence can improve IT operations.

8. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

9. Copyright

© Knowingly, Inc. 2022. "GigaOm Radar for Incident and Task Management Solutions" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.