The Hard Parts.dev
Engineering Playbook EP-12 · Architecture

Improve testability without stopping delivery

Improve testability incrementally by adding seams, isolating side effects, simplifying change hotspots, and rebalancing confidence creation across layers while real work continues.

Difficulty: high
Time horizon: weeks to months
Primary owner: tech lead
Confidence: high
At a glance (EP-12)
Situation: The system is hard to test, but delivery cannot pause for a large rewrite.
Goal: Make routine changes safer and cheaper to validate without waiting for a total redesign.
Do not use when: the system is already testable enough and the issue is weak review or release process instead.
Primary owner: tech lead
Roles involved: tech lead; maintainers of hotspot areas; QA or quality lead; architect where structural moves are needed

Context

The situation

Deciding whether to reach for this playbook: when it fits, and when it doesn't.

Use when

Conditions where this playbook is the right tool.

  • Small changes are expensive to test
  • The team relies too heavily on manual or end-to-end validation
  • Developers avoid touching areas because testing them is painful
  • The system resists focused automated confidence

Stakes

Why this matters

What this playbook protects against, and why skipping or half-running it tends to be expensive.

Poor testability is often architecture feedback. If simple changes are hard to test, the system is revealing tight coupling, side-effect sprawl, hidden state, or weak boundaries.
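A minimal sketch of what that feedback can look like in code, with every name hypothetical: a single function fuses one pricing rule with storage, a remote service, two side effects, and module-level state, so validating even a one-line rule change means faking three collaborators and resetting a cache.

```typescript
// Deliberately hard to test (all names hypothetical). The one real rule is
// a single line of arithmetic; everything around it reaches into the world.

interface Order { id: string; customerId: string; total: number; }

declare const db: {
  orders: { find(id: string): Promise<Order>; save(o: Order): Promise<void> };
};
declare const pricingService: { fetchRate(customerId: string): Promise<number> };
declare const emailer: { sendReceipt(o: Order): Promise<void> };

// Hidden module-level state: tests sharing the process see each other's values.
const discountCache = new Map<string, number>();

export async function applyDiscount(orderId: string): Promise<void> {
  const order = await db.orders.find(orderId);            // coupled to storage
  const rate =
    discountCache.get(order.customerId) ??
    (await pricingService.fetchRate(order.customerId));   // coupled to the network
  discountCache.set(order.customerId, rate);
  order.total = order.total * (1 - rate);                 // the actual business rule
  await db.orders.save(order);                            // side effect
  await emailer.sendReceipt(order);                       // side effect
}
```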

Quality bar

What good looks like

The observable qualities of a team or system that is actually doing this well. Not just going through the motions.

Signs of the playbook done well

  • The team can add focused tests around important behaviors more easily
  • High-risk changes no longer depend on heavyweight setup or on full-stack confidence alone
  • Testability improves first in hotspots that slow delivery the most
  • Confidence gets created earlier and closer to the change

Preparation

Before you start

What you need available and true before running the procedure. Skipping this is the most common reason playbooks fail.

Inputs

Material you'll want to gather first.

  • Painful recent changes
  • Test pyramid or current confidence map
  • Hotspot areas
  • Incident and escape defect history
  • Dependency and side-effect map

Prerequisites

Conditions that should be true for this to work.

  • You can identify which changes are hard to test and why
  • The team is willing to improve structure during normal delivery work
  • There is some discipline around not making testability worse in the meantime

Procedure

The procedure

Each step carries its purpose (why it exists), its actions (what you do), and its outputs (what you produce). Read the purpose. It's what keeps the step from degenerating into checklist theatre.

  1. Map test pain to design pain

    Avoid treating testability as only a tooling issue.

    Actions

    • Review recent painful changes and what made them hard to validate
    • Separate fixture pain, environment pain, hidden-state pain, and side-effect pain
    • Identify repeated missing seams

    Outputs

    • Testability pain map (a possible shape follows this step)
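One lightweight way to capture that output, sketched as a data shape rather than a prescribed format; every field name here is a suggestion, and the sample entry is invented.

```typescript
// A possible shape for pain map entries (illustrative, not prescriptive).
type PainKind = "fixture" | "environment" | "hidden-state" | "side-effect";

interface PainMapEntry {
  area: string;          // the hotspot: a module, service, or package
  recentChange: string;  // the change that exposed the pain
  kinds: PainKind[];     // which categories of pain showed up
  missingSeam: string;   // the seam whose absence caused the pain
}

const painMap: PainMapEntry[] = [
  {
    area: "billing/discounts",
    recentChange: "rounding-rule change (hypothetical example)",
    kinds: ["hidden-state", "side-effect"],
    missingSeam: "pricing rule cannot be exercised without persistence and email",
  },
];
```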
  2. Choose one hotspot and one seam at a time

    Improve incrementally where it matters most.

    Actions

    • Pick a high-change or high-risk area
    • Introduce one clearer seam: isolate side effects, split orchestration from rules, or clarify data flow
    • Ensure the next similar change can be tested more locally

    Outputs

    • Testability improvement slice (a worked sketch follows this step)
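Continuing the hypothetical from the stakes section, one seam is usually enough for a slice: extract the pure rule and move the side effects behind an explicit port, without touching anything else.

```typescript
// One seam, not a rewrite (names hypothetical). The rule becomes a pure
// function; the orchestration receives its side effects as an explicit port.

interface Order { id: string; customerId: string; total: number; }

// The rule: testable with plain values and zero setup.
export function discountedTotal(total: number, rate: number): number {
  return total * (1 - rate);
}

// The port: every side effect the orchestration needs, named and visible.
export interface DiscountPorts {
  findOrder(id: string): Promise<Order>;
  rateFor(customerId: string): Promise<number>;
  saveOrder(order: Order): Promise<void>;
  sendReceipt(order: Order): Promise<void>;
}

// The orchestration: sequencing only, no hidden reach into the environment.
export async function applyDiscount(orderId: string, ports: DiscountPorts): Promise<void> {
  const order = await ports.findOrder(orderId);
  const rate = await ports.rateFor(order.customerId);
  order.total = discountedTotal(order.total, rate);
  await ports.saveOrder(order);
  await ports.sendReceipt(order);
}
```

The next discount change can now be validated with a direct function call, which is exactly the test of whether the slice worked.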
  3. Rebalance confidence layers

    Move some safety earlier and cheaper.

    Actions

    • Identify where unit, component, contract, and end-to-end tests each add value
    • Reduce reliance on full-stack tests for behavior that can be proven earlier
    • Add focused checks around the business-critical parts, not generic volume

    Outputs

    • Confidence layer plan (an example test follows this step)
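With a seam like the one in step 2, confidence that previously required a full stack can move into plain unit tests. A sketch, assuming the step 2 code lives in ./discounts and a vitest-style runner (jest works the same way):

```typescript
import { describe, it, expect } from "vitest";
import { discountedTotal, applyDiscount, DiscountPorts } from "./discounts";

// The rule is proven with values, not environments.
describe("discount rule", () => {
  it("applies the rate to the total", () => {
    expect(discountedTotal(200, 0.1)).toBeCloseTo(180);
  });
});

// The orchestration is proven with a hand-rolled port, not a database.
describe("discount orchestration", () => {
  it("saves the order and then sends the receipt", async () => {
    const calls: string[] = [];
    const ports: DiscountPorts = {
      findOrder: async () => ({ id: "o1", customerId: "c1", total: 200 }),
      rateFor: async () => 0.1,
      saveOrder: async () => { calls.push("save"); },
      sendReceipt: async () => { calls.push("receipt"); },
    };
    await applyDiscount("o1", ports);
    expect(calls).toEqual(["save", "receipt"]);
  });
});
```

An end-to-end test still has a job here, but it shrinks to proving the wiring, not the rule.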
  4. Change the definition of ready-to-merge

    Make testability improvement part of normal work.

    Actions

    • In hot areas, ask whether a change made the next change easier or harder to validate
    • Prefer review comments that challenge hidden coupling or weak seams
    • Avoid accepting new hard-to-test patterns casually

    Outputs

    • Testability review guidance (an example pattern follows this step)
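One concrete pattern this guidance tends to catch in review, shown as a hypothetical before-and-after: a rule that silently reads the ambient clock versus one that takes time as an input.

```typescript
// Hard to validate: the rule reads a hidden input (the ambient clock),
// so tests must manipulate global time or tolerate flakiness.
export function isExpiredAmbient(expiresAt: number): boolean {
  return Date.now() > expiresAt;
}

// Easier to validate: time is an explicit parameter with a sensible default,
// so production call sites stay unchanged while tests pass a fixed value.
export function isExpired(expiresAt: number, now: number = Date.now()): boolean {
  return now > expiresAt;
}
```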
  5. Track whether confidence is becoming cheaper

    Ensure the work improves real delivery flow.

    Actions

    • Watch time to validate common changes
    • Watch for dependence on flaky tests and overuse of end-to-end suites
    • Review whether the same areas remain scary to change

    Outputs

    • Testability progress review (a small metric sketch follows this step)
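The review itself can stay lightweight. A small sketch of one signal, assuming the team records how long each routine change took to validate (the data shape is an assumption):

```typescript
// Median time-to-validate per hotspot (record shape is hypothetical).
interface ValidationRecord { area: string; minutesToValidate: number; }

export function medianValidationMinutes(records: ValidationRecord[], area: string): number {
  const times = records
    .filter((r) => r.area === area)
    .map((r) => r.minutesToValidate)
    .sort((a, b) => a - b);
  if (times.length === 0) return NaN;
  const mid = Math.floor(times.length / 2);
  return times.length % 2 === 1 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
}
```

If this number does not fall in the hotspots you targeted, the seams you chose are probably not the right ones.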

Judgment

Judgment calls and pitfalls

The places where execution actually diverges: decisions that need thought, questions worth asking, and mistakes that recur regardless of good intent.

Decision points

Moments where judgment and trade-offs matter more than procedure.

  • What hotspot should improve first?
  • Which seam gives the biggest gain in local confidence?
  • What confidence should stay end-to-end and what can move lower?
  • How much structure change can fit into normal delivery work?

Questions worth asking

Prompts to use on yourself, the team, or an AI assistant while running the procedure.

  • What makes routine changes hard to validate here?
  • Which seam would most reduce test pain in the next two weeks?
  • What confidence are we creating too late today?

Common mistakes

Patterns that surface across teams running this playbook.

  • Trying to improve testability everywhere at once
  • Adding many tests around poor structure without reducing the core pain
  • Treating end-to-end volume as the main answer
  • Separating refactor time completely from product work so it never happens

Warning signs you are doing it wrong

Signals that the playbook is being executed but not landing.

  • Test count rises but validation time does not improve
  • Hotspot changes are still avoided for the same reasons
  • Developers still say it is faster to click through the app than to write a useful test
  • The new tests assert implementation detail rather than behavior

Outcomes

Outcomes and signals

What should exist after the playbook runs, how you'll know it worked, and what to watch for over time.

Artifacts to produce

Durable outputs the playbook should leave behind.

  • Testability pain map
  • Testability improvement slice plan
  • Confidence layer plan
  • Testability review guidance
  • Testability progress review

Success signals

Observable changes that mean the playbook landed.

  • Common changes become easier to validate
  • Fewer changes require full-stack setup by default
  • Developers report less fear in targeted hotspots
  • Review and release confidence improve together

Follow-up actions

Moves that keep the playbook's effects compounding after it finishes.

  • Promote recurring testability issues into architecture debt work
  • Update coding and review norms for hotspot areas
  • Teach the team to spot hidden state and side-effect sprawl earlier

Metrics or signals to watch

Longer-horizon indicators that the underlying problem is receding.

  • Time to validate a routine change
  • Share of confidence coming from end-to-end only
  • Hotspot change failure rate
  • Flake rate in confidence-critical paths

AI impact

AI effects on this playbook

How AI-assisted and AI-driven workflows help execution, and the ways they can make it worse.

AI can help with

Where AI tooling genuinely reduces the cost of running this playbook well.

  • Finding side-effect-heavy methods and coupling hotspots
  • Drafting targeted tests and seam extraction ideas
  • Summarizing repeated test pain from PRs and incident history

AI can make things worse by

Distortions AI introduces that make the underlying problem harder to see.

  • Generating lots of shallow tests around bad structure
  • Inflating confidence dashboards without improving local understanding
  • Encouraging snapshot-like or brittle assertions that do not reduce real risk

Relationships

Connected playbooks

Failure modes this playbook tends to address, decisions behind the situation, red flags that motivate running it, and neighboring playbooks.