The Hard Parts.dev

Test Theater

A team has high coverage numbers and a passing CI pipeline, but its tests do not catch real regressions.

Severity
high
Frequency
common
Lifecycle
build · operate
Recovery
medium
Confidence
high
At a glance · FM-24
Also known as

coverage theater · the green build illusion · metric-driven testing · false safety net

First noticed by

staff engineer · senior engineer · QA lead

Mistaken for

strong engineering discipline · a well-tested codebase

Why it looks healthy

Concrete external tells that make the pattern read as responsible behavior.

  • Coverage percentage is high and visible
  • CI is green on every PR
  • The test suite is large and growing
  • New engineers are told "we take testing seriously"

Definition

What it is

Blast radius: code · reliability · delivery

Tests are written to satisfy coverage metrics or pass CI rather than to verify behavior, creating a false sense of safety.

How it unfolds

The arc of the pattern

  1. Starts

    A team is told to increase test coverage or maintain a green build.

  2. Feels reasonable because

    Coverage numbers and passing CI are measurable and feel like quality signals.

  3. Escalates

    Tests are written to hit coverage targets, not to express intent. Assertions are weak or absent.

  4. Ends

    A significant regression ships despite a fully green build. The team is surprised; coverage was above the threshold.

Recognition

Warning signs by stage

Observable signals as the pattern progresses.

Early

  • Test coverage is tracked but test quality is not discussed.
  • Tests have no assertions or assert only that code runs without exceptions.
  • The same coverage number is cited as evidence in different contexts.
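The assertion-free pattern in the second bullet is easy to picture concretely. A minimal Python sketch, using a hypothetical `apply_discount` function: both tests below produce identical coverage, but only the second would fail if the discount logic regressed.

```python
import unittest

def apply_discount(price, percent):
    # Hypothetical function under test.
    return round(price * (1 - percent / 100), 2)

class TheaterTest(unittest.TestCase):
    def test_discount(self):
        # Theater: exercises the code (coverage goes up) and passes
        # as long as no exception is raised, but asserts nothing.
        apply_discount(100.0, 20)

class BehaviorTest(unittest.TestCase):
    def test_discount_of_20_percent_reduces_100_to_80(self):
        # A real test: states the expected behavior explicitly,
        # so a wrong discount calculation makes it fail.
        self.assertEqual(apply_discount(100.0, 20), 80.0)
```

A coverage tool scores both tests identically, which is exactly why coverage alone cannot distinguish theater from discipline.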

Mid

  • Bugs are found in areas with high coverage.
  • Refactors break tests that were not testing behavior.
  • Engineers describe tests as a formality before merging.

Late

  • A significant production regression ships through a green build.
  • Post-mortem reveals tests existed but did not cover the failing scenario.
  • Engineers have stopped trusting the test suite.

Root causes

Why it happens

  • Coverage is used as a proxy for quality
  • Tests are written after the fact to satisfy requirements
  • There is no culture of test review as distinct from code review
  • Assertion quality is not a review criterion

Response

What to do

Immediate triage first, then structural fixes.

First move

Sample ten random tests from the suite and ask what each one would catch if the code it exercises broke; within an hour you will know what kind of suite you actually have.
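One mechanical first pass over the sampled tests is to flag those with no assertions at all. A rough sketch using Python's standard `ast` module; it only detects absent assertions, not weak ones, so it supplements rather than replaces manual review:

```python
import ast

def has_assertions(test_source: str) -> bool:
    # True if the source contains a bare `assert` statement or a call
    # to any unittest-style assert* method (assertEqual, assertTrue, ...).
    tree = ast.parse(test_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assert):
            return True
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr.startswith("assert"):
                return True
    return False

# A test that only checks "it runs" gets flagged for review.
print(has_assertions("def test_runs():\n    handler()"))         # False
print(has_assertions("def test_value():\n    assert f() == 3"))  # True
```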

Hard trade-off

Accept lower coverage numbers in exchange for fewer, meaningful tests that actually catch regressions.

Recovery trap

Adopting a better coverage tool that measures the same thing more precisely, preserving the illusion at higher fidelity.

Immediate actions

  • Review a sample of tests for meaningful assertions
  • Run mutation testing to measure how many tests actually catch bugs
  • Stop reporting coverage without also reporting defect escape rate
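Mutation testing makes the "would this test notice?" question executable: it deliberately breaks the code and checks whether any test fails. A toy illustration of the idea with a hypothetical `is_adult` function, simulating a single boundary mutant by hand (real tools generate mutants automatically):

```python
def is_adult(age, boundary_mutated=False):
    # Hypothetical function; boundary_mutated simulates a mutant
    # that flips >= into >, a classic off-by-one regression.
    return age > 18 if boundary_mutated else age >= 18

def theater_test(mutated):
    # Exercises the code only on a value both versions agree on.
    return is_adult(30, mutated) is True

def real_test(mutated):
    # Pins down the boundary, so the mutant makes it fail.
    return is_adult(18, mutated) is True

assert theater_test(mutated=True) is True   # mutant survives: weak test
assert real_test(mutated=False) is True     # passes on correct code
assert real_test(mutated=True) is False     # mutant killed: real test
```

The fraction of mutants killed is a far more honest quality signal than the fraction of lines executed.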

Structural fixes

  • Pair coverage metrics with defect escape metrics
  • Add test quality as a criterion in code review
  • Use behavior-driven test naming to make intent explicit
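Behavior-driven naming turns the test list into a readable specification. A hypothetical pytest-style sketch; the names are claims a reviewer can audit without opening the test bodies:

```python
def apply_discount(price, percent):
    # Hypothetical function under test.
    return round(price * (1 - percent / 100), 2)

# Metric-driven name: tells a reviewer nothing about expected behavior.
def test_discount():
    apply_discount(100.0, 20)

# Behavior-driven names: each is a checkable claim, so gaps in the
# scenario list are visible from the names alone.
def test_discount_of_20_percent_reduces_100_to_80():
    assert apply_discount(100.0, 20) == 80.0

def test_zero_percent_discount_leaves_price_unchanged():
    assert apply_discount(100.0, 0) == 100.0
```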

What not to do

  • Do not raise coverage targets as a response to escaped defects
  • Do not treat a green build as evidence the system is correct

AI impact

How AI distorts this pattern

Where AI-assisted workflows accelerate, hide, or help with this failure mode.

AI can help with

  • AI can help generate meaningful test cases from specifications, edge cases, and real incident reports.

AI can make worse by

  • AI can generate high-coverage test suites quickly that satisfy metrics without meaningfully testing behavior, accelerating the theater at scale.

Relationships

Connected patterns

Causal flows inside Failure Modes, and related entries across the site.

Easy to confuse with

Nearby patterns and how this one differs.

  • Metric myopia is the broader pattern. Test theater is metric myopia applied specifically to testing.

  • Synthetic velocity is output without durable value. Test theater is test output without confidence value.

  • Adjacent concept: legitimate testing discipline

    Legitimate testing catches regressions. Theater produces numbers.

Heard in the wild

What it sounds like

The phrase that signals the pattern is about to start, and who tends to say it.

"Coverage is at 85%, so we should be fine."

Said by: an engineer or manager before a release

Notes from practice

What experienced people notice

Annotations from engineers who have worked this pattern before.

Best moment (when intervention actually changes the trajectory)

When coverage is celebrated without asking what the tests actually assert.

Counter move (the specific action that breaks the pattern)

Ask what the tests catch, not how many there are.

False positive (when this pattern is actually the correct call)

High coverage is better than low coverage. The failure mode is treating coverage as a quality guarantee rather than a partial quality indicator.