Jenkins CI failures: a debugging field guide

Start here: read the end state before you read the log

A Jenkins run that "broke" can be in several different states, and the state tells you which log to open and which fix to reach for. Before scrolling a 4,000-line console, answer three questions in order: (1) what is the run's final result — FAILURE, UNSTABLE, ABORTED, or still queued; (2) if it ran, which stage went red; (3) is the problem that the build ran and failed, or that it never got an executor to run on. Those three answers route almost every Jenkins debugging session. Guessing — re-running the job, bumping the timeout, restarting the agent — wastes time because it skips the classification step that decides everything downstream.

This is the triage and methodology layer. Each common failure below ends with a link to a dedicated page that carries the full fix; this guide's job is to get you to the right one fast.

Why a systematic approach beats re-running the job

Jenkins build outcomes are not free-form. They are a fixed, ordered set of result constants, and the order is one-directional: a result can only get worse during a run, never better. The combine() logic that merges stage results "returns the worse one," and the result ordinal is documented as "Bigger numbers are worse" (Jenkins hudson.model.Result API). That single fact has a practical consequence: a green-then-red transition is meaningful, a red-then-green transition inside one run cannot happen, and a run marked UNSTABLE did not have a fatal error — it had a non-fatal one. Reading the end state correctly tells you whether you are chasing a crash or a quality gate before you read a single line of output.
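The ordering rule is small enough to model directly. The sketch below is a Python stand-in, not the Jenkins API: the enum values are assumed to match the ordinals in hudson.model.Result, and `combine` and `worsen` are illustrative helper names.

```python
from enum import IntEnum


class Result(IntEnum):
    """Python stand-in for the Jenkins result constants; bigger is worse."""
    SUCCESS = 0
    UNSTABLE = 1
    FAILURE = 2
    NOT_BUILT = 3
    ABORTED = 4


def combine(a: Result, b: Result) -> Result:
    """Return the worse of two results, following the documented combine() rule."""
    return max(a, b)


def worsen(current: Result, proposed: Result) -> Result:
    """Apply a new result to a run: it only sticks if it is worse."""
    return combine(current, proposed)


if __name__ == "__main__":
    run = Result.SUCCESS
    # A later green stage cannot pull the run back from yellow.
    for stage_result in (Result.SUCCESS, Result.UNSTABLE, Result.SUCCESS):
        run = worsen(run, stage_result)
    print(run.name)  # prints "UNSTABLE"
```

Because the model only ever takes the maximum, a run that has gone yellow or red stays at least that bad for the rest of the run.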

The other reason to triage first: the failing signal is almost never the whole console. A multi-stage pipeline records a result per stage, and an early failure can leave later stages as NOT_BUILT — "used in a multi-stage build ... where a problem in earlier stage prevented later stages from building" (hudson.model.Result). The interesting log lives in the first stage that changed color, not at the bottom of the console where the run finally gave up.

The triage framework

Step 1 — Read the final result

The five build results have precise meanings (Jenkins hudson.model.Result API):

SUCCESS — "The build had no errors."
UNSTABLE — "The build had some errors but they were not fatal. For example, some tests failed."
FAILURE — "The build had a fatal error."
ABORTED — "The build was manually aborted." (also the result when a build is cancelled or hits a timeout)
NOT_BUILT — "used in a multi-stage build ... where a problem in earlier stage prevented later stages from building."

What each state routes to:

FAILURE (red) — a step exited non-zero or the agent died mid-run. Go find the failing stage (Step 2).
UNSTABLE (yellow) — the run completed but a quality gate flagged it. By default the JUnit plugin "marks the build as unstable if at least one test fails" (JUnit plugin). Do not debug this as a crash — read the test report, not the shell output.
ABORTED — someone cancelled it, or a timeout fired. Distinguish a human cancel from a timeout: a timeout usually means the run was waiting (often for an executor — see Step 3), not that the code is broken.
Still queued / never started — there is no stage log to read because nothing ran. This is a scheduling problem, not a build problem (Step 3).
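The routing above can be written down as a lookup, which is a convenient shape for a runbook or a chat-bot hint. This is a minimal sketch: the function name `next_step`, the `started` flag, and the wording of each route are illustrative assumptions, not Jenkins behavior.

```python
from typing import Optional

# Where each final result sends you next, paraphrasing the routing list above.
ROUTES = {
    "SUCCESS": "nothing to debug",
    "UNSTABLE": "read the test report, not the shell output",
    "FAILURE": "find the first failing stage and open its log",
    "ABORTED": "check whether a person cancelled it or a timeout fired",
    "NOT_BUILT": "look at the earlier stage that stopped the run",
}


def next_step(result: Optional[str], started: bool) -> str:
    """Map a run's state to the next triage action."""
    if not started:
        # Nothing ran, so there is no stage log to read.
        return "read the queue reason; this is a scheduling problem"
    if result is None:
        # Started but no final result yet.
        return "still running; check whether a node step is waiting for an executor"
    return ROUTES.get(result, "unrecognised result; check the console")


if __name__ == "__main__":
    print(next_step("UNSTABLE", started=True))
    print(next_step(None, started=False))
```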

Step 2 — Find the failing stage, open THAT log

Declarative pipelines are built from stages — the docs recommend "at least one stage directive for each discrete part of the continuous delivery process, such as Build, Test, and Deploy" (Pipeline syntax). The Stage View and Blue Ocean render each stage as a cell; failures are "typically denoted by red in the web UI" (Pipeline syntax). Click the first red cell, not the last.

The order to read it:

Identify the first stage that is not green. Later stages that are NOT_BUILT are downstream noise; the first red stage is the cause.
Open that stage's log only. In Blue Ocean, expand the failing step; in the classic UI, use the Stage View cell's log link rather than the full console.
Read the exit code and the last command before it. A non-zero exit code from a shell step is the fatal error Jenkins is reporting; the line above it is what you actually need to fix.
Compare against the last green run. Because results only worsen within a run, the useful comparison is across runs: what changed between the last SUCCESS and this one — a commit, a dependency, an agent image.
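The first of these steps can be sketched as a scan over per-stage statuses. The record shape (`name`, `status`) and the status strings are assumptions loosely modeled on what stage-view style APIs report; adjust them to whatever your instance actually returns.

```python
from typing import Dict, List, Optional

GREEN = {"SUCCESS"}
# Stages that never ran are a consequence of the failure, not its cause.
SKIPPED = {"NOT_BUILT", "NOT_EXECUTED"}


def first_failing_stage(stages: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first stage, in execution order, that ran and is not green."""
    for stage in stages:
        status = stage["status"]
        if status in GREEN or status in SKIPPED:
            continue
        return stage
    return None


if __name__ == "__main__":
    run = [
        {"name": "Build", "status": "SUCCESS"},
        {"name": "Test", "status": "FAILED"},
        {"name": "Deploy", "status": "NOT_BUILT"},
    ]
    culprit = first_failing_stage(run)
    print(culprit["name"] if culprit else "all green")  # prints "Test"
```

The scan stops at the first hit on purpose: anything after it is a candidate for downstream noise.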

Step 3 — Separate "queued" from "failing"

This is the distinction that most often gets misdiagnosed. A build that never started has no failing log; it has a queue reason. Jenkins shows "a little black clock icon in the build queue" to indicate "that your job is sitting in the queue unnecessarily," and the tooltip on that icon states why the job cannot proceed right now (Executor Starvation).

Hover the queued item and read the tooltip first. The documented reasons a build waits (Executor Starvation):

The build needs a specific agent that is offline — check the agent status page at http://server/jenkins/computer/AGENTNAME.
The build needs a specific agent that is already fully busy building other things.
All agents carrying the required label are busy — at which point the documented answer is to add more agents.

A "running" pipeline can still be stuck waiting: a node or agent block has to allocate an executor, and whether that wait counts against a timeout depends on where the agent is declared. With a top-level agent, the allocation time is not included in the limit set by the timeout option; with a stage-level agent, it is (Pipeline syntax). Either way, a pipeline can report as running while its node step waits for a slot. If the symptom is waiting, this is a capacity or label problem, not a code problem, so do not read it as a build failure.

To check capacity directly, go to Manage Jenkins → Nodes and confirm which executors are online and busy (Executor Starvation).
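If you collect queue reasons in bulk (for example, the same text the tooltip shows), a keyword classifier is often enough to sort them into the documented buckets. This is a rough sketch: the bucket names are invented for illustration, and the matched phrases are assumptions about typical message wording, which can differ between Jenkins versions.

```python
def classify_queue_reason(why: str) -> str:
    """Sort a queue-reason string into a coarse bucket by keyword."""
    text = why.lower()
    if "offline" in text:
        return "agent-offline"
    if "no nodes with the label" in text or "no agents with the label" in text:
        return "label-unsatisfied"
    if "waiting for next available executor" in text:
        return "executors-busy"
    return "other"


if __name__ == "__main__":
    # Sample strings are illustrative, not captured from a real instance.
    samples = [
        "Waiting for next available executor",
        "build-agent-1 is offline",
        "There are no nodes with the label linux-arm",
    ]
    for reason in samples:
        print(classify_queue_reason(reason), "<-", reason)
```

The offline check runs first so that a message mentioning both an offline agent and a waiting executor is filed under the more specific cause.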

Recognising the common failures

Each of these is a short "how do I know I'm in this case" — the full fix lives on the linked page.

Builds stuck in the queue (executor starvation)

Recognise it by: the run shows the black clock icon, the tooltip says "Waiting for next available executor," nothing is running yet, and Manage Jenkins → Nodes shows executors busy or offline. There is no stage log because no stage started. Before you "add more agents," rule out the cheaper causes — a label that no online agent satisfies, a blocked or throttled build, or a flyweight task waiting — because true starvation and a label typo present identically in the queue.
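Since a label typo and true starvation look the same in the queue, it helps to check the label against the agent list explicitly. The sketch below assumes a simplified, hypothetical agent record (`name`, `labels`, `online`, `idle_executors`); it is not the shape of any Jenkins API response.

```python
from typing import Any, Dict, List


def diagnose_label(label: str, agents: List[Dict[str, Any]]) -> str:
    """Explain why a job requesting `label` might be waiting."""
    matching = [a for a in agents if label in a["labels"]]
    if not matching:
        return "no agent carries this label (check for a typo)"
    online = [a for a in matching if a["online"]]
    if not online:
        return "agents with this label exist but all are offline"
    if all(a["idle_executors"] == 0 for a in online):
        return "all matching executors are busy (true starvation)"
    return "an executor is free; look for a blocked or throttled build"


if __name__ == "__main__":
    fleet = [
        {"name": "a1", "labels": {"linux", "docker"}, "online": True, "idle_executors": 0},
        {"name": "a2", "labels": {"linux"}, "online": False, "idle_executors": 2},
    ]
    print(diagnose_label("linx", fleet))   # typo: no agent has this label
    print(diagnose_label("linux", fleet))  # only a1 is online, and it is busy
```

Only the third outcome justifies adding agents; the first two are cheaper to fix.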

Full diagnosis and fix: Jenkins builds stuck — executor starvation.

A build that ran and failed at a stage

Recognise it by: the run is FAILURE (red) or UNSTABLE (yellow), a specific stage is colored, and there is a stage log to open. What matters next is which of the two you have: a yellow run is a quality gate (commonly test failures via the JUnit plugin's default behavior), while a red run is a fatal non-zero exit or a lost agent. The fix path differs sharply between the two, and exit code 137 (the process was killed with SIGKILL, often by an out-of-memory killer) points somewhere different again.
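Exit codes above 128 usually follow the shell convention of 128 plus the signal number, which is how 137 maps to signal 9. The helper below is a small sketch of that decoding on a POSIX system; `describe_exit` is an illustrative name, and a program can also return such a code on purpose, so treat the result as a hint.

```python
import signal


def describe_exit(code: int) -> str:
    """Describe a shell exit code, decoding 128+N as signal N where possible."""
    if code == 0:
        return "success"
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            name = None  # not a known signal number on this platform
        if name:
            return f"terminated by {name} (exit code {code})"
    return f"command exited with code {code}"


if __name__ == "__main__":
    for code in (0, 1, 137, 143):
        print(code, "->", describe_exit(code))
```

A signal-terminated step means something outside the script ended it, so the next place to look is the agent's memory limits and host logs rather than the build script.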

Full diagnosis and fix: Jenkins build failing.

General prevention and operational principles

Triage by state, not by re-run. A re-run that "fixes" a failure usually means the failure was a queue or capacity flake rather than a code problem, and you have learned nothing about which. Classify first; the state tells you whether re-running can even help.
Use labels, not named agents. The executor-starvation guide's recurring advice is to use labels "so that builds can run on any machine that satisfies the system requirements," so one offline or busy agent does not block a job (Executor Starvation). Pinning a job to a single named agent converts a capacity blip into a hard stall.
Decide whether failed tests should fail the build or just flag it. The JUnit default is UNSTABLE on any test failure (JUnit plugin). That is a deliberate choice — yellow keeps the artifact while signalling a quality issue. If your gate must hard-stop on test failure, configure it explicitly rather than being surprised by yellow.
Make the failing stage obvious by design. Split work into the discrete stages the Pipeline docs recommend (Pipeline syntax). A pipeline that does everything in one giant stage forces you back to reading the whole console — the exact failure mode triage is meant to avoid.
Treat the queue as a first-class signal. A queue that keeps growing while builds sit waiting is a sign of capacity or label drift, and it is visible in Manage Jenkins → Nodes long before any build "fails." Watch it as you would any saturation signal, not only after a job complains.
Correlate across the change. Because a run's result can only worsen within itself, the diagnostic comparison is always against the last green run — the commit, dependency, or agent-image change between green and red is the lead. In a multi-tool stack (Jenkins plus the repo, the artifact registry, the deploy target) that correlation is the slow part of every investigation, and the part worth automating.
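The green-versus-red comparison is a good first thing to automate. As a minimal sketch, assume each run has been summarized into a flat record; the field names here (`commit`, `agent_image`, `lockfile_hash`) and all values are hypothetical, and how you collect them depends on your stack.

```python
from typing import Any, Dict, Tuple


def changed_fields(
    last_green: Dict[str, Any], current: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (green_value, current_value)} for every field that differs."""
    keys = sorted(set(last_green) | set(current))
    return {
        key: (last_green.get(key), current.get(key))
        for key in keys
        if last_green.get(key) != current.get(key)
    }


if __name__ == "__main__":
    green = {"commit": "abc123", "agent_image": "ci:41", "lockfile_hash": "f00d"}
    red = {"commit": "def456", "agent_image": "ci:41", "lockfile_hash": "f00d"}
    for field, (before, after) in changed_fields(green, red).items():
        print(f"{field}: {before} -> {after}")  # prints "commit: abc123 -> def456"
```

Each field that differs is a lead to check; a field that is identical across both runs can be ruled out.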

Sources

By Intellira Engineering. AI-assisted draft, reviewed by the Intellira engineering team; claims cited inline; last verified 2026-06-02.