The Hard Parts.dev
RF-07 Code · Delivery RF Red Flags

End-to-end tests carry all the confidence

The team relies mainly on slow, broad tests because lower-level confidence is weak or absent.

Severity
medium-high
Frequency
common
First noticed by
developers · QA · platform engineers
Detectability
obvious
Confidence
high
At a glance · RF-07
Where you see this

legacy systems · UI-heavy products · integration-dense platforms

Not necessarily a problem when
a small system is intentionally tested mainly through a few stable end-to-end flows and this remains affordable
Often mistaken for
end-to-end tests are more real, so they should dominate
Time horizon
near-term
Best placed to act

tech lead · quality lead

The signal

What you would actually notice

Confidence becomes slow and expensive to obtain, test results turn flaky, and failures are difficult to localize.

Field observation

Pipeline trust depends mostly on large system tests while unit, component, or contract tests are thin or untrusted.

Also observed

  • "If the browser suite passes, we ship."
  • "We do not really trust the unit tests."

Primary reading

What it usually indicates

Most likely underlying patterns when this signal shows up. Not a diagnosis, a starting hypothesis.

  • poor testability
  • weak lower-level coverage
  • integration-heavy system design
  • late confidence strategy

Stakes

Why it matters

Confidence becomes slow and expensive to obtain, test results turn flaky, and failures are difficult to localize.

Inspection

What to check next

Deliberate steps to confirm or disconfirm the primary reading above. Not a checklist. An order of inspection.

  1. pipeline timing
  2. flake rates
  3. test distribution by layer
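The three checks above can be made concrete with a small script. What follows is a minimal sketch, assuming test results can be exported as one record per test execution with a layer label, a commit, an outcome, and a duration. The `RunRecord` type, its field names, and the layer labels are hypothetical and would need mapping to whatever the CI system actually reports. A test is counted as flaky here when it both passed and failed on the same commit, which is one common working definition rather than the only one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One execution of one test (hypothetical export format)."""
    test_id: str
    layer: str      # assumed labels, e.g. "unit", "contract", "e2e"
    commit: str
    passed: bool
    seconds: float


def summarize(runs):
    """Return per-layer test count, share of total runtime, and flake rate."""
    tests = defaultdict(set)       # layer -> distinct test ids
    seconds = defaultdict(float)   # layer -> total runtime in seconds
    outcomes = defaultdict(set)    # (layer, test, commit) -> outcomes seen

    for r in runs:
        tests[r.layer].add(r.test_id)
        seconds[r.layer] += r.seconds
        outcomes[(r.layer, r.test_id, r.commit)].add(r.passed)

    # Flaky: the same test on the same commit produced both a pass and a fail.
    flaky = defaultdict(set)
    for (layer, test_id, _commit), seen in outcomes.items():
        if len(seen) == 2:
            flaky[layer].add(test_id)

    total_seconds = sum(seconds.values()) or 1.0  # avoid division by zero
    return {
        layer: {
            "tests": len(ids),
            "time_share": seconds[layer] / total_seconds,
            "flake_rate": len(flaky[layer]) / len(ids),
        }
        for layer, ids in tests.items()
    }


# Illustrative data only: one fast unit test, one e2e test retried on a commit.
runs = [
    RunRecord("test_price", "unit", "abc", True, 0.5),
    RunRecord("test_checkout", "e2e", "abc", False, 60.0),
    RunRecord("test_checkout", "e2e", "abc", True, 59.5),
]
report = summarize(runs)
```

A report in which one layer holds few of the tests but most of the runtime and most of the flakes is consistent with the primary reading; a flatter distribution would argue against it.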

Diagnostic questions

Questions to ask the team, or yourself, before concluding anything.

  1. What lower-level confidence is missing?
  2. Which key behaviors could be proven earlier and cheaper?
  3. Do end-to-end tests verify product reality or substitute for missing seams?
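As one illustration of the second question, a pricing rule that is currently exercised only through a browser checkout flow could be proven at the unit level. The sketch below is hypothetical: `order_total`, the discount threshold, and the amounts are invented for illustration and assume the rule can be extracted into a pure function.

```python
import unittest
from decimal import Decimal


def order_total(subtotal: Decimal, loyalty_member: bool) -> Decimal:
    """Hypothetical rule: loyalty members get 10% off orders of 100 or more."""
    if loyalty_member and subtotal >= Decimal("100"):
        return (subtotal * Decimal("0.90")).quantize(Decimal("0.01"))
    return subtotal.quantize(Decimal("0.01"))


class OrderTotalTest(unittest.TestCase):
    # No browser, network, or database is needed, and a failure
    # points at the specific rule that broke.
    def test_member_at_threshold_gets_discount(self):
        self.assertEqual(order_total(Decimal("100"), True), Decimal("90.00"))

    def test_member_below_threshold_pays_full(self):
        self.assertEqual(order_total(Decimal("99.99"), True), Decimal("99.99"))

    def test_non_member_pays_full(self):
        self.assertEqual(order_total(Decimal("250"), False), Decimal("250.00"))
```

With the boundary cases covered here, a single end-to-end flow can be left to confirm that checkout applies a total at all, rather than carrying every pricing variation through the browser.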

Progression

Under the signal

Where this pattern tends to come from, what's holding it up, and where it goes if nothing changes.

Leading indicators

What tends to show up first.

  • long pipelines
  • flake anxiety
  • developers cannot tell where a failure originates

Common root causes

What is usually sitting under the signal.

  • design not testable in smaller scopes
  • late QA strategy
  • integration-first confidence model
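The first root cause often comes down to a missing seam: logic that reaches directly into a network call or database, so it can only be exercised with the whole system running. A minimal sketch of introducing one follows. The `RateSource` interface, the `convert` function, and the `FixedRates` stand-in are all hypothetical names chosen for illustration.

```python
from typing import Protocol


class RateSource(Protocol):
    """Seam: anything that can supply an exchange rate (hypothetical)."""
    def rate(self, base: str, quote: str) -> float: ...


def convert(amount: float, base: str, quote: str, rates: RateSource) -> float:
    """Convert using an injected rate source instead of a hard-wired client."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return round(amount * rates.rate(base, quote), 2)


class FixedRates:
    """In-memory stand-in intended only for lower-level tests."""
    def __init__(self, table: dict[tuple[str, str], float]):
        self._table = table

    def rate(self, base: str, quote: str) -> float:
        return self._table[(base, quote)]


# The conversion logic is now checkable without the real rate service.
fake = FixedRates({("EUR", "USD"): 1.25})
assert convert(8.0, "EUR", "USD", fake) == 10.0
```

The production adapter that talks to the real service still needs a contract or integration test of its own; the seam only moves the business rule out of the end-to-end layer.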

Likely consequences

What happens if nothing changes.

  • slow feedback
  • flake fatigue
  • expensive troubleshooting

Look-alikes

Not what it looks like

Patterns that can be mistaken for this signal, and 'fix' attempts that make it worse.

False friends

Things the signal is often confused with, but isn't.

  • end-to-end tests are more real, so they should dominate

Anti-patterns when responding

Responses that feel sensible and usually make the underlying pattern worse.

  • solving all quality concerns by adding more end-to-end tests
  • treating pipeline duration as the unavoidable cost of confidence

Context

Context and ownership

Where this signal surfaces, who sees it first, who can actually act, and how much runway there usually is before escalation.

Common contexts

Where it shows up

  • legacy systems
  • UI-heavy products
  • integration-dense platforms

Most likely to notice

Who sees it first

Before it escalates.

  • developers
  • QA
  • platform engineers

Best placed to act

Who can move on it

Not always the same as who notices it.

  • tech lead
  • quality lead

Time horizon

near-term

How much runway there usually is before the signal hardens into the underlying pattern.

AI impact

AI effects on this signal

How AI-assisted and AI-driven workflows tend to amplify or hide this signal.

AI amplifies

Ways AI tooling tends to make this signal louder or more common.

  • AI can generate more broad tests quickly, increasing volume without fixing confidence distribution.

AI masks

Ways AI tooling tends to hide this signal, so it keeps growing under the surface.

  • Large test suites can look like maturity while actually signaling design weakness.

Relationships

Connected signals

Related failure modes, decisions behind the signal, response playbooks, and neighboring red flags.