Summaries are not root cause

Most "AI for DevOps" tools summarize what already happened. Diagnosis means naming the change that caused it, with evidence — and that is a higher bar.

By Intellira Engineering, Editorial team

The difference between a summary and a diagnosis

A lot of "AI SRE" tooling does one thing well: it reads your alerts and logs and writes a tidy paragraph. That is a summary, a restatement of the symptom in fluent English. It is genuinely useful for situational awareness, but it is not the same as a diagnosis.

A diagnosis answers a different question: which change caused this, and how do I know? It names the commit, the build, the sync, or the config that turned a healthy system into a failing one — and it shows the evidence. That second clause is the whole job. Google's own SRE practice frames the postmortem goal as making sure "all contributing root cause(s) are well understood, and, especially, that effective preventive actions are put in place" — not that the incident was narrated well (Google SRE Book, Postmortem Culture).

Why the distinction matters at 3am

When you are on call, a summary tells you what you already know from the alert: payments is crash-looping. What you need is the next sentence — because commit a1f9c2e raised the cache to 2GB while the memory limit stayed at 1Gi. That sentence (illustrative) is the difference between a fix and another hour of bisecting across five tools.

The reason "which change?" is the right first question is empirical, not rhetorical. DORA tracks a change fail rate because deployments are a recurring source of production failures. It defines the metric as "the ratio of deployments that require immediate intervention following a deployment. Likely resulting in a rollback of the changes or a 'hotfix' to quickly remediate any issues" (DORA: the four keys). Strictly, that ratio measures the share of deployments that fail, not the share of incidents that trace back to a deployment. Still, if most incidents follow a change, then the highest-value thing an investigation can produce is the identity of that change, not a better description of the smoke.
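As a rough sketch of the ratio DORA describes, assuming each deployment record already carries a flag for whether it needed immediate intervention (the `Deployment` type and its fields are hypothetical, not part of any DORA tooling):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    deploy_id: str             # hypothetical identifier
    needed_intervention: bool  # True if a rollback or hotfix followed the deploy


def change_fail_rate(deployments):
    """Share of deployments that required immediate intervention."""
    if not deployments:
        raise ValueError("no deployments in the window")
    failed = sum(1 for d in deployments if d.needed_intervention)
    return failed / len(deployments)


# Illustrative data: one of four deployments needed a rollback or hotfix.
deployments = [
    Deployment("d1", False),
    Deployment("d2", True),
    Deployment("d3", False),
    Deployment("d4", False),
]
print(f"{change_fail_rate(deployments):.0%}")  # prints 25%
```

The metric counts failing deployments, so it says how risky changes are; naming which change broke a given incident is a separate step.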

Summaries scale by reading more text. Diagnosis scales by correlating across systems that don't talk to each other — source, CI, GitOps, runtime — which is exactly the slow, manual work humans do during an incident. A language model that only sees the alert stream cannot name a Bitbucket commit or an ArgoCD sync it was never shown; correlation is a data-access problem first and a reasoning problem second.
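One way to picture the correlation step, assuming change events from each system have already been fetched and normalized into a common shape. The `ChangeEvent` type, the source labels, the one-hour lookback, and every ref except the illustrative commit a1f9c2e are hypothetical. The sketch only narrows candidates by service and time; it does not decide which change is the cause.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    source: str    # which system reported it, e.g. "git", "ci", "gitops"
    ref: str       # commit hash, build id, or sync id
    service: str
    at: datetime   # when the change landed


def candidate_changes(events, service, incident_start, lookback=timedelta(hours=1)):
    """Return changes to `service` inside the lookback window, newest first."""
    window_start = incident_start - lookback
    hits = [
        e for e in events
        if e.service == service and window_start <= e.at <= incident_start
    ]
    return sorted(hits, key=lambda e: e.at, reverse=True)


# Illustrative incident and change feed; all values are made up.
incident_start = datetime(2026, 1, 15, 3, 0)
events = [
    ChangeEvent("git", "a1f9c2e", "payments", datetime(2026, 1, 15, 2, 40)),
    ChangeEvent("ci", "build-1187", "payments", datetime(2026, 1, 15, 2, 42)),
    ChangeEvent("gitops", "sync-42", "payments", datetime(2026, 1, 15, 2, 45)),
    ChangeEvent("git", "77be01d", "checkout", datetime(2026, 1, 15, 2, 50)),
    ChangeEvent("git", "0c3d9aa", "payments", datetime(2026, 1, 15, 0, 10)),
]

for change in candidate_changes(events, "payments", incident_start):
    print(change.at.time(), change.source, change.ref)
```

The filter is trivial once the events sit in one list. The hard part is the data access that fills the list, which is the point of the paragraph above.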

What "evidence" has to mean

If an AI claims a root cause, it should be able to point at the log line, the event, the diff. An unsourced verdict is just a confident summary. We hold a simple bar: every claim cites the thing it rests on, and a human can audit the chain. That mirrors how Google's blameless postmortem works — it investigates the "systematic reasons why an individual or team had incomplete or incorrect information" and records contributing causes with their trigger, rather than stopping at the symptom (Google SRE Book, Postmortem Culture). A causal chain you cannot audit is not safe to act on, and it is not safe to put in a post-mortem.
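That bar can be expressed as a small data-structure check. This is a minimal sketch, not a description of any product's implementation: the `Evidence` and `Claim` types, the three evidence kinds (taken from the log line, event, and diff named above), and the locator and excerpt strings are all assumptions for illustration.

```python
from dataclasses import dataclass

EVIDENCE_KINDS = {"log_line", "event", "diff"}


@dataclass(frozen=True)
class Evidence:
    kind: str      # one of EVIDENCE_KINDS
    locator: str   # where a human can find the original
    excerpt: str   # the exact text the claim rests on


@dataclass(frozen=True)
class Claim:
    statement: str
    evidence: tuple = ()  # zero or more Evidence items


def is_auditable(claim):
    """True if the claim cites at least one item and every item is well-formed."""
    return bool(claim.evidence) and all(
        e.kind in EVIDENCE_KINDS and e.locator.strip() and e.excerpt.strip()
        for e in claim.evidence
    )


def unsourced(claims):
    """Claims that should be rejected or sent back for evidence."""
    return [c for c in claims if not is_auditable(c)]


# Illustrative claims; the locator and excerpt are invented for the example.
claims = [
    Claim(
        "Commit a1f9c2e raised the cache size above the memory limit.",
        (Evidence("diff", "commit a1f9c2e (hypothetical locator)", "cache size raised to 2GB"),),
    ),
    Claim("The pod restarted because of load."),
]

for claim in unsourced(claims):
    print("needs evidence:", claim.statement)
```

A check like this only verifies that evidence is attached and locatable. Whether the evidence actually supports the statement is still a judgment a human reviewer has to make.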

The same bar search engines now reward

This is not only an operational preference. The wider content ecosystem has moved in the same direction, which is worth naming because "AI wrote a summary" is now cheap and everywhere. Google's ranking guidance does not penalize content for being AI-assisted; its stated position is to "reward high-quality, original content" however it is produced, while treating automation used "to generate many pages without adding value for users" as a spam-policy violation (Google Search Central: guidance on AI-generated content).

What it rewards instead is experience and trust. In 2022 Google added the extra "E" — Experience — to its E-A-T quality framework, asking whether content "clearly demonstrate[s] first-hand expertise and a depth of knowledge (for example, expertise that comes from having actually used a product or service)"; of experience, expertise, authoritativeness, and trust, it states that "trust is most important" (Google Search Central: creating helpful, people-first content). A fluent summary with nothing under it fails that test for the same reason a confident, unsourced root cause fails an on-call engineer: there is no auditable proof behind the words.

The bar we hold

  • Name the change, not just the symptom.
  • Cite the evidence for every claim — the log line, the event, the diff.
  • Stay read-only: explain, never mutate.

It is a higher bar than summarization, and it is the only one worth building toward. "The pod restarted" is a sentence anyone can write. "This commit caused the restart, here is the proof, here is the fix" is the job.

By Intellira Engineering. AI-assisted draft, reviewed by the Intellira engineering team; claims cited inline; last verified 2026-06-02.
