Incident response, the process of responding to system disruptions and slowdowns, is a critical aspect of IT operations. It’s also an activity that traditionally involves a lot of manual, time-consuming processes.
That’s the challenge Harness is targeting with its new incident response service, which enters early access today as a module on the company’s eponymous platform. Harness started in 2017 with an initial focus on continuous integration/continuous delivery (CI/CD) automation for DevOps and in the following years expanded into a multi-module software delivery platform. In the fall of 2024, Harness moved into agentic AI, initially to assist with software development.
Now the company is extending the same agentic AI foundation to incident response. The new service also draws on technology licensed from development workflow vendor Transposit. Tina Huang, co-founder of Transposit, joined Harness in September 2024 along with much of her team.
The goal of Harness Incident Response is to reduce the mean time to resolution (MTTR) of an incident.
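MTTR is commonly calculated as the average time between when an incident is detected and when it is resolved. The sketch below illustrates that arithmetic only; the `mean_time_to_resolution` function and the incident log are hypothetical and are not taken from Harness.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Average (resolved - detected) across incidents; assumes a non-empty list."""
    durations = [resolved - detected for detected, resolved in incidents]
    # sum() needs a timedelta start value; dividing a timedelta by an int is valid.
    return sum(durations, timedelta()) / len(durations)

# Hypothetical incident log: (detected, resolved) pairs.
incidents = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 10, 30)),   # 90 minutes
    (datetime(2025, 1, 8, 14, 0), datetime(2025, 1, 8, 14, 30)),  # 30 minutes
]
print(mean_time_to_resolution(incidents))  # 1:00:00
```

Cutting the time spent on either incident lowers the average, and that is the number an incident response tool aims to shrink.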
“If you think about what DevOps platforms have been until now, a lot of it has been about helping you structure those deployments,” Huang told VentureBeat. “I think a very natural next step is, ‘How can I help you with your deployments after they go into production?’”
How Harness Enables Autonomous Incident Response Using Agentic AI
At the core of Harness’ Incident Response module is the company’s AI agent architecture, first introduced in September 2024.
Jyoti Bansal, CEO and co-founder of Harness, told VentureBeat that the company’s AI agents are designed to provide autonomous assistance that goes beyond simply notifying engineers of incidents. Traditional incident response tools rely on playbooks: IT teams, often working with site reliability engineers (SREs), write manuals that lay out step-by-step processes for recovering from different types of service interruptions.
Instead of relying solely on predefined playbooks, AI agents can suggest actions, identify potential root causes, and even create new playbooks on the fly.
“The agent’s workflow suggests actions that should be taken,” Bansal said.
Huang explained that the AI agents perform several steps that are critical to helping organizations respond to incidents more quickly. Even before a playbook can be launched, some triage needs to happen, Bansal said. For example, general triage can identify which services are affected, as well as the upstream and downstream dependencies that the incident will also hit.
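Identifying upstream and downstream dependencies is, at its core, a graph traversal over a service map. The following is a minimal sketch assuming a hypothetical `DEPENDS_ON` map in which each service lists the services it calls; it is not Harness’s implementation, and a real system would build the map from service catalogs or tracing data.

```python
from collections import deque

# Hypothetical service map: each service lists the services it calls.
DEPENDS_ON = {
    "web-frontend": ["checkout-api", "search-api"],
    "checkout-api": ["payments", "inventory-db"],
    "search-api": ["search-index"],
    "payments": ["payments-db"],
    "inventory-db": [],
    "search-index": [],
    "payments-db": [],
}

def _reachable(graph, start):
    """Breadth-first search returning every node reachable from start (excluding start)."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # guard against cycles that loop back to the start
    return seen

def triage(failing_service, depends_on=DEPENDS_ON):
    # Reverse the edges so we can walk from a service to the services that call it.
    callers = {svc: [] for svc in depends_on}
    for svc, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(svc)
    return {
        "upstream": _reachable(depends_on, failing_service),  # what it relies on
        "downstream": _reachable(callers, failing_service),   # what relies on it
    }

print(triage("checkout-api"))
```

For `checkout-api`, the upstream set covers `payments`, `inventory-db` and `payments-db` (possible root causes), while the downstream set contains `web-frontend` (the user-facing blast radius).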
The Harness system has agents that are aware of and connected to multiple systems and can automatically collect information, including discussions in Slack channels. Other agents can then use that information to alert the right people and provide autonomous assistance.
While the system has a high degree of automation, Huang stressed that people are still in the loop. Instead of a person flagging a problem and then having to find out whether a playbook exists and, if so, how to run it, the system recommends a fix and the person just has to approve it.
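The recommend-then-approve pattern Huang describes can be sketched as an approval gate in front of an automated action. Everything here (`Recommendation`, `run_with_approval`, the rollback action) is hypothetical and illustrates only the general human-in-the-loop idea, not Harness’s product.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Recommendation:
    """A hypothetical remediation proposed by an agent."""
    description: str
    action: Callable[[], str]

def run_with_approval(rec: Recommendation, approve: Callable[[Recommendation], bool]) -> str:
    """Execute the recommended action only if a human approves it."""
    if approve(rec):
        return rec.action()
    return f"skipped: {rec.description}"

rec = Recommendation(
    description="Roll back checkout-api to the previous release",
    action=lambda: "rollback started",
)

# In practice the approver would be a person responding in chat or a UI;
# here stand-in functions approve or reject unconditionally.
print(run_with_approval(rec, approve=lambda r: True))   # rollback started
print(run_with_approval(rec, approve=lambda r: False))  # skipped: Roll back checkout-api to the previous release
```

Keeping the approval step as a separate callable means the same recommendation logic can be wired to different approval channels without changing how fixes are proposed.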
Incident response requires more than just technology
The Harness Incident Response module can run on its own, so organizations don’t need to use any other Harness modules.
But Bansal expects that the combined offering, which can integrate with many other workflows including DevOps and chaos engineering, will be beneficial. Chaos engineering is the practice of injecting unexpected variables and events into an application to see how it responds. Harness has offered a chaos engineering module as part of its platform since 2022.
Huang explained that organizations can use the incident response module together with the chaos engineering module to run “fire drills” that test different scenarios.
“Incidents happen infrequently and are often the unfortunate result of something you didn’t catch earlier,” Huang said. “We want to enable a very proactive approach to incident response.”
How businesses will benefit from agentic AI incident response
One of Harness’s customers using the incident response module is Tyler Technologies, which develops software for the public sector.
The company uses the Harness platform for continuous deployment, cloud cost management and feature flags. Adding incident response could help solve a key problem it faces, explained Jeff Green, CTO of Tyler Technologies.
“Our main challenge is really integrating all the operational data, metrics and processes and then tying them into one unified approach to managing incidents and automating our response to them,” he told VentureBeat. “Our portfolio includes more than 100 products built on different technologies using a wide range of DevOps tools and platforms.”
The incident response capability will complement the work Tyler Technologies already does with Harness, for example by correlating deployments or feature flag changes with incidents.
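A simple way to picture correlating deployments with incidents is a time-window check: which changes landed shortly before the incident began? The sketch below assumes a hypothetical list of deployment records and a 30-minute window; the same check would apply to feature flag changes. It is an illustration, not how Harness performs the correlation.

```python
from datetime import datetime, timedelta

def deployments_before(incident_start, deployments, window=timedelta(minutes=30)):
    """Return deployments that finished within `window` before the incident began."""
    return [
        d for d in deployments
        if incident_start - window <= d["finished_at"] <= incident_start
    ]

# Hypothetical deployment records.
deployments = [
    {"service": "checkout-api", "version": "v42", "finished_at": datetime(2025, 1, 6, 8, 50)},
    {"service": "search-api", "version": "v17", "finished_at": datetime(2025, 1, 6, 7, 0)},
]
suspects = deployments_before(datetime(2025, 1, 6, 9, 0), deployments)
print([d["version"] for d in suspects])  # ['v42']
```

The window size is a tuning choice: too narrow and a slow-burning regression is missed, too wide and every recent change looks suspicious.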
“We think the artificial intelligence capabilities embedded in the product will save a lot of time by helping us with root cause analysis, identifying ways to mitigate or resolve incidents, and preventing incidents,” Green said. “Much of this work today is done by humans pulling data from various sources, sifting through logs and application performance monitoring (APM) data, and looking for patterns, tasks for which AI is better suited.”
The ROI of agentic AI for incident response
InStride is another Harness customer evaluating the incident response module, according to Omar Alwattar, senior DevOps engineer at the company.
Alwattar told VentureBeat that his firm uses the Harness Continuous Delivery module. He noted that when it comes to incident response, his organization has two key challenges: proactive monitoring and root cause identification. Harness’s new incident response tool is interesting to his company, he said, because it will help identify problems more quickly and automatically suggest fixes.
“In terms of ROI, the most significant impact would be on reducing downtime, as this directly affects SLA compliance and customer satisfaction,” Alwattar said. “In addition, by automating aspects of incident response, our 11-person DevOps team can focus more on strategic projects and innovation than on constant troubleshooting.”