AI in DevOps and Incident Response

Four practitioners who run systems at scale weighed in on the AI incident response pitches of late 2023. Each came in with a different angle, and what came back was specific and cautionary.

The consensus across the four conversations was narrow. Generative AI is useful for summarization and toil. Hallucination disqualifies it from autonomous action in high-stakes incidents. Several of them reached, independently, for the same frame: working with these tools is like working with a very junior engineer. You still have to check everything.

By late 2023, every vendor had an AI pitch for incident response. Autonomous remediation. AI-powered root cause analysis. MTTR cut in half, automatically. I went to the people who actually run systems at scale to find out what was real. I interviewed Nora Jones (founder of Jeli, formerly Slack and Netflix), Jeremy Edberg (Amazon Alexa, formerly Netflix and Reddit), Mandi Walls (PagerDuty, formerly Chef), and Brent Chapman (Great Circle Associates, formerly Google and Slack). What came back was more specific, and more cautionary, than the vendor hype.

Key Themes

Where AI Actually Helps: Summarization

The consensus is narrow but real. AI is useful for incident summarization: drafting post-incident reports, generating status updates, and bringing latecomers up to speed during an active incident. These are tasks where a plausible first draft is valuable, and where a human will verify the draft before anyone relies on it.
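One way to picture the "plausible first draft, verified by a human" pattern is the minimal Python sketch below. It is illustrative only: `TimelineEvent`, `Draft`, `build_prompt`, and `draft_catch_up` are hypothetical names, and `summarize` stands in for whatever model call a team actually uses. None of it comes from the interviews.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TimelineEvent:
    """One entry from an incident channel (hypothetical shape)."""
    timestamp: str
    author: str
    text: str


@dataclass
class Draft:
    """AI-written text that stays labeled as unverified until a human signs off."""
    body: str
    verified_by: Optional[str] = None

    def render(self) -> str:
        if self.verified_by:
            label = f"Reviewed by {self.verified_by}"
        else:
            label = "UNVERIFIED AI DRAFT"
        return f"[{label}]\n{self.body}"


def build_prompt(events: List[TimelineEvent]) -> str:
    """Turn the timeline into a prompt that asks only for a catch-up summary."""
    lines = [f"{e.timestamp} {e.author}: {e.text}" for e in events]
    return (
        "Summarize this incident timeline for someone joining late. "
        "Use only facts stated below.\n" + "\n".join(lines)
    )


def draft_catch_up(
    events: List[TimelineEvent], summarize: Callable[[str], str]
) -> Draft:
    """Ask the (assumed) model for a summary and wrap it as an unverified draft."""
    return Draft(body=summarize(build_prompt(events)))
```

The design choice worth noting is that the draft carries its own review status, so a summary pasted into a status page or channel shows whether a responder has checked it.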

Nora Jones frames it precisely: “I think what we really want to do is use AI to get people more curious about what’s happening in their incidents.” The value is in the question it opens, not the answer it provides.

Why Hallucination Disqualifies Autonomous Remediation

The harder truth is that AI hallucinates, and in incident response, hallucination is dangerous. Brent Chapman’s observation cuts to the bone: “LLMs are sometimes wrong, but never uncertain.” A system that is confident and occasionally wrong is the worst possible profile for taking autonomous action in high-stakes, time-compressed situations.

Jeremy Edberg draws the line explicitly: “Right now, GenAI is something of an advisory tool. We’re not to the point where we trust it enough to take actions.” Keeping a human in the loop is not a temporary limitation waiting to be engineered away. It is the right architecture given current reliability. Every pitch for AI-powered root cause analysis (RCA) and autonomous remediation in 2023 ran into this constraint, and the serious practitioners all landed in the same place.
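The advisory-only stance can be sketched as an approval gate: the AI may propose a remediation, but nothing runs until a named human approves it. This is a minimal Python illustration under stated assumptions. `ProposedAction` and `ApprovalGate` are hypothetical names, and the design is not one any of the interviewees described.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ProposedAction:
    """A remediation suggested by an AI assistant (hypothetical shape)."""
    description: str
    command: Callable[[], str]        # the action itself, not yet run
    rationale: str                    # the model's stated reasoning, unverified
    approved_by: Optional[str] = None


@dataclass
class ApprovalGate:
    """Holds AI proposals until a named human approves them."""
    audit_log: List[str] = field(default_factory=list)

    def approve(self, action: ProposedAction, responder: str) -> None:
        """Record which human took responsibility for the action."""
        action.approved_by = responder
        self.audit_log.append(f"{responder} approved: {action.description}")

    def execute(self, action: ProposedAction) -> str:
        """Run the action only if a human has signed off on it."""
        if action.approved_by is None:
            raise PermissionError(
                f"'{action.description}' is advisory only until a human approves it"
            )
        result = action.command()
        self.audit_log.append(f"executed: {action.description} -> {result}")
        return result
```

In this sketch, calling `execute` on an unapproved proposal raises `PermissionError`, and the audit log records who approved each action, which keeps responsibility with a person and not with the tool.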

The Junior Developer Analogy

Multiple practitioners reach for the same frame independently: working with AI tools is like working with a very junior programmer. You still have to check everything. You still have to ask whether the output is right, useful, and complete. Alert correlation, automated runbooks, and AI-assisted postmortem generation all require the same oversight.

Edberg uses this frame to address career anxiety directly: “If you are good at logic and want to learn how to reason about computer systems, software engineering will still be a great place to be.” AI will accelerate development, particularly for engineers who already know how to think about systems. It will not replace that judgment.

The AI SRE Career Question

Mandi Walls draws a useful distinction between roles: SREs, who are deeply integrated with engineering practice, will see more direct benefit from AI coding and observability tools than operators in more traditional roles. The productivity gains land where the work is closest to the code. MTTR reduction through AI-assisted triage and automated runbooks accrues to engineers who understand what the AI is actually doing.

The overall career picture is a shift toward strategic work and away from the mechanical. Practitioners who understand systems reasoning will find AI accelerates their output. Those optimizing for task execution without deeper systems understanding face more pressure.

The Human Judgment Floor

What runs through every perspective is a shared conviction: AI does not yet have contextual or collaborative judgment. It cannot read the room during an incident, weigh the organizational history that shapes which escalation matters, or decide what to prioritize when the situation is genuinely novel. Autonomous incident investigation fails at exactly the moments when the incident is most unusual, which are precisely the moments when it would matter most.

Nora Jones is direct: “I don’t think generative AI is going to fix the incidents for you.” Someone still has to verify, lead, and decide. The tools make some of that work faster while keeping the responsibility in human hands.