Refactor a dangerous hotspot
Refactor a hotspot by targeting the specific reasons it is dangerous (high churn, poor testability, unclear ownership, or oversized responsibility) and improving it in narrow, repeatable steps.
- Situation
- A frequently changing or high-risk code area is slowing delivery and raising risk.
- Goal
- Make a risky high-churn area safer to change without turning refactoring into a disguised rewrite.
- Do not use when
- The hotspot is rarely touched and the team is optimizing aesthetics over risk
- Primary owner
- maintainer
- Roles involved
- maintainer
- tech lead
- reviewers familiar with the area
- QA or quality partner if test pain is severe
Context
The situation
Deciding whether to reach for this playbook: when it fits, and when it doesn't.
Use when
Conditions where this playbook is the right tool.
- One file, module, or service keeps appearing in risky changes and incidents
- Teams avoid an area because it is brittle
- Review and validation cost is disproportionately high there
- The hotspot is repeatedly and visibly slowing delivery
Do not use when
Contexts where this playbook will waste effort or make things worse.
- The hotspot is rarely touched and the team is optimizing aesthetics over risk
- The true problem is outside the hotspot, such as an unstable dependency or unclear ownership elsewhere
- The team wants to use hotspot language as cover for a large rewrite
Stakes
Why this matters
What this playbook protects against, and why skipping or half-running it tends to be expensive.
Hotspots attract risk because they are already where change, complexity, and ambiguity meet. Improving them produces outsized payoff, but only if the refactor is tied to real recurring pain rather than cleanup ambition.
Quality bar
What good looks like
The observable qualities of a team or system that is actually doing this well. Not just going through the motions.
Signs of the playbook done well
- The hotspot becomes easier to understand and test
- Changes in the area get smaller and more local
- Ownership and purpose become clearer
- Fear around modifying the area declines
- Incident and regression frequency tied to the hotspot drops
Preparation
Before you start
What you need available and true before running the procedure. Skipping this is the most common reason playbooks fail.
Inputs
Material you'll want to gather first.
- Change hotspot analysis
- Incident history
- Review pain points
- Current tests and seams
- Ownership map
- Recent diffs touching the hotspot
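A change-hotspot analysis can start as a simple churn count. This is a minimal sketch that tallies how often each file appears in `git log --name-only` output; the sample log excerpt and file paths are illustrative, and in a real repository you would pipe in actual history.

```python
from collections import Counter

def churn_counts(git_log_names: str) -> Counter:
    """Count how often each file appears in `git log --name-only` output.

    Lines that look like file paths (contain a slash or a dot) are counted;
    blank lines and commit headers are skipped. A rough heuristic, not a parser.
    """
    counts: Counter = Counter()
    for line in git_log_names.splitlines():
        line = line.strip()
        if line and ("/" in line or "." in line) and not line.startswith("commit "):
            counts[line] += 1
    return counts

# Illustrative excerpt; in practice generate it with something like:
#   git log --since="6 months ago" --name-only --pretty=format:
sample = """
src/billing/invoice.py
src/billing/invoice.py
src/api/routes.py
src/billing/invoice.py
"""
print(churn_counts(sample).most_common(2))
```

Files at the top of this list are churn candidates; cross-reference them with incident history before calling anything a hotspot.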
Prerequisites
Conditions that should be true for this to work.
- The hotspot is evidenced by real change or incident patterns
- The team can reserve some refactoring capacity during delivery
- Someone owns the area strongly enough to guide the work
Procedure
The procedure
Each step carries its purpose (why it exists), its actions (what you do), and its outputs (what you produce). Read the purpose. It's what keeps the step from degenerating into checklist theatre.
Define why this hotspot is dangerous
Avoid vague cleanup instincts.
Actions
- Review incidents, painful PRs, and high-churn changes tied to the hotspot
- Classify the pain as testability, coupling, oversized responsibility, unclear intent, or hidden side effects
- Write the top 2 to 3 recurring harms
Outputs
- Hotspot danger profile
Pick the smallest refactoring seam
Improve safely without exploding scope.
Actions
- Identify a narrow seam where logic, state, or responsibility can be separated
- Choose a move that makes the next similar change easier
- Avoid broad restructuring before the first safe cut
Outputs
- Refactor seam plan
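One common shape for a first seam is pulling a pure decision out of a handler that mixes it with I/O and orchestration. This sketch is entirely hypothetical (the handler, the pricing rule, and the field names are invented for illustration); the point is the size of the cut, not the domain.

```python
# Hypothetical seam: a tangled checkout handler mixed parsing, a pricing
# rule, and I/O. Extracting the pure pricing rule creates a narrow, testable
# seam without restructuring the rest of the module.

def apply_discount(total: float, loyalty_years: int) -> float:
    """Pure pricing rule pulled out of the handler: testable in isolation."""
    rate = 0.10 if loyalty_years >= 5 else 0.0
    return round(total * (1 - rate), 2)

def handle_checkout(order: dict) -> dict:
    """Remaining handler now delegates to the extracted seam."""
    total = sum(item["price"] for item in order["items"])
    return {"total": apply_discount(total, order.get("loyalty_years", 0))}

print(handle_checkout({"items": [{"price": 50.0}, {"price": 50.0}],
                       "loyalty_years": 6}))
# {'total': 90.0}
```

A seam this small is reviewable in one sitting and immediately makes the next pricing change easier, which is the test the step asks for.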
Add confidence around the seam
Protect the hotspot while changing it.
Actions
- Add targeted tests or observational checks around key behavior
- Capture edge cases that commonly break
- Define how you will know the hotspot actually improved
Outputs
- Hotspot confidence pack
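Targeted confidence often starts as characterization tests that pin the seam's current behavior, including the edge cases that commonly break, before any structural change. A minimal sketch around a hypothetical pure pricing rule (names and thresholds invented); in a real repository these would live in the test suite and run under your usual test runner.

```python
# Characterization tests: record what the seam does today, so refactors that
# change behavior are caught immediately. All names here are hypothetical.

def apply_discount(total: float, loyalty_years: int) -> float:
    rate = 0.10 if loyalty_years >= 5 else 0.0
    return round(total * (1 - rate), 2)

def test_no_discount_below_threshold():
    assert apply_discount(100.0, 4) == 100.0

def test_discount_at_threshold():
    assert apply_discount(100.0, 5) == 90.0

def test_zero_total_edge_case():
    assert apply_discount(0.0, 10) == 0.0

# Run directly for this sketch; in a real repo, a test runner collects these.
test_no_discount_below_threshold()
test_discount_at_threshold()
test_zero_total_edge_case()
print("all characterization checks passed")
```

The "confidence pack" is then the set of tests plus a short note on which behaviors they pin and which remain unguarded.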
Refactor in production-relevant slices
Keep the work anchored to real behavior and real delivery.
Actions
- Make one structural improvement at a time
- Ship and review the next similar change to see whether the pain has decreased
- Keep a visible list of what got safer and what remains dangerous
Outputs
- Refactor progress log
Decide whether to continue, stop, or escalate
Prevent endless hotspot work with unclear returns.
Actions
- Review whether risk and cost are going down
- Stop if marginal refactoring value has flattened
- Escalate to broader architecture work only if hotspot evidence points there
Outputs
- Hotspot review decision
Judgment
Judgment calls and pitfalls
The places where execution actually diverges: decisions that need thought, questions worth asking, and mistakes that recur regardless of good intent.
Decision points
Moments where judgment and trade-offs matter more than procedure.
- What makes this hotspot truly dangerous?
- What seam is small enough to improve safely?
- What evidence would show the hotspot is actually getting better?
- When does hotspot refactoring become broader redesign?
Questions worth asking
Prompts to use on yourself, the team, or an AI assistant while running the procedure.
- Why is this hotspot actually dangerous today?
- What is the smallest seam that makes the next change easier?
- How will we know the hotspot is genuinely safer, not just cleaner-looking?
Common mistakes
Patterns that surface across teams running this playbook.
- Refactoring the whole area instead of one seam
- Cleaning for aesthetics instead of recurring pain
- Skipping targeted confidence because the team is eager to improve structure
- Declaring victory because the code looks nicer while the next change is still painful
Warning signs you are doing it wrong
Signals that the playbook is being executed but not landing.
- The refactor grows in scope faster than the hotspot risk is shrinking
- The same kind of change remains just as hard after several rounds
- Nobody can say what specific danger was reduced
- The hotspot is now cleaner-looking but still avoided by the team
Outcomes
Outcomes and signals
What should exist after the playbook runs, how you'll know it worked, and what to watch for over time.
Artifacts to produce
Durable outputs the playbook should leave behind.
- Hotspot danger profile
- Refactor seam plan
- Hotspot confidence pack
- Refactor progress log
- Hotspot review decision
Success signals
Observable changes that mean the playbook landed.
- The next similar change is easier and smaller
- Review and validation burden decreases
- More engineers can work in the hotspot safely
- Hotspot-driven regressions decline
Follow-up actions
Moves that keep the playbook's effects compounding after it finishes.
- Repeat on the next highest-value seam if evidence supports it
- Promote repeated hotspot classes into boundary or ownership work
- Update coding and review expectations around the area
Metrics or signals to watch
Longer-horizon indicators that the underlying problem is receding.
- Change size in the hotspot over time
- Review time for hotspot changes
- Hotspot-related defect rate
- Number of engineers making safe changes there
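Change size in the hotspot can be tracked with a small script over `git log --numstat` output. A sketch, assuming tab-separated numstat lines; the sample data and paths are illustrative.

```python
def mean_change_size(numstat: str, path_prefix: str) -> float:
    """Average lines touched (added + deleted) per numstat entry whose path
    starts with path_prefix, from `git log --numstat --pretty=format:` output."""
    sizes = []
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[2].startswith(path_prefix):
            added, deleted = parts[0], parts[1]
            if added.isdigit() and deleted.isdigit():  # skip binary "-" entries
                sizes.append(int(added) + int(deleted))
    return sum(sizes) / len(sizes) if sizes else 0.0

# Illustrative numstat lines: "added<TAB>deleted<TAB>path"
sample = ("12\t4\tsrc/billing/invoice.py\n"
          "3\t1\tsrc/billing/invoice.py\n"
          "200\t50\tdocs/notes.md")
print(mean_change_size(sample, "src/billing/"))
# (12+4 + 3+1) / 2 = 10.0
```

Computed per month or per release, a falling average in the hotspot path is one of the clearest signs the refactor is landing.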
AI impact
AI effects on this playbook
How AI-assisted and AI-driven workflows help execution, and the ways they can make it worse.
AI can help with
Where AI tooling genuinely reduces the cost of running this playbook well.
- Spotting churn and coupling patterns in hotspot code
- Suggesting seam candidates and extracting repeated structures
- Drafting targeted tests around known risky behavior
AI can make worse by
Distortions AI introduces that make the underlying problem harder to see.
- Encouraging overly broad automated refactors
- Making cosmetic cleanup look like architectural progress
- Introducing new subtle inconsistencies in already risky code
AI synthesis
AI can be useful for hotspot discovery and for generating narrow refactor assists. Keep all broad structural decisions and risky edits tightly human-reviewed.
Relationships
Connected playbooks
Failure modes this playbook tends to address, decisions behind the situation, red flags that motivate running it, and neighboring playbooks.