Start a rewrite safely · thehardparts.dev

Difficulty: high
Time horizon: 2 to 6 weeks to frame safely before broad commitment
Primary owner: architect
Confidence: high

At a glance · EP-18

Situation: The current system is painful enough that a rewrite is being seriously considered.
Goal: Prevent a justified frustration from becoming an uncontrolled second system.
Do not use when: the rewrite is mostly a morale move
Roles involved: tech lead, architect, engineering manager, product owner or sponsor, senior maintainers of the legacy system

Context

The situation

Deciding whether to reach for this playbook: when it fits, and when it doesn't.

Use when

Conditions where this playbook is the right tool.

The team keeps proposing a fresh start
Maintenance pain is real and recurring
Architectural debt is affecting delivery materially
The current system resists safe change

Stakes

Why this matters

What this playbook protects against, and why skipping or half-running it tends to be expensive.

Rewrites are seductive because the pain is real. When they fail, the cause is usually vague success criteria, hidden parity obligations, and optimism about legacy behavior nobody remembers.

Quality bar

What good looks like

The observable qualities of a team or system that is actually doing this well, not just going through the motions.

Signs of the playbook done well

The rewrite has a sharply defined problem statement
The first displaced slice is known before broad implementation begins
The team can explain what will not be rebuilt
The old system is treated as a behavior inventory, not only a code smell
Leadership understands cost, overlap, and coexistence risk

Preparation

Before you start

What you need available and true before running the procedure. Skipping this is the most common reason playbooks fail.

Inputs

Material you'll want to gather first.

Top maintenance pain points
Incident history
Dependency map
Delivery friction analysis
Stakeholder expectations
Legacy behavior inventory

Prerequisites

Conditions that should be true for this to work.

Honest diagnosis of current pain
Explicit rewrite sponsor
Access to people who know the old system well
Willingness to reject the rewrite if the case is weak

Procedure

The procedure

Each step carries its purpose (why it exists), its actions (what you do), and its outputs (what you produce). Read the purpose. It's what keeps the step from degenerating into checklist theatre.

01
Name the actual reasons
Separate real causes from emotional shorthand.
Actions
- List the concrete failures of the current system
- Group them into structure, delivery, operations, and ownership issues
- Test whether each issue truly requires a rewrite or could be addressed incrementally
Outputs
- Rewrite problem statement
- Pain-to-cause map
02
Inventory what must survive
Expose the hidden parity burden early.
Actions
- Identify critical workflows, edge cases, contracts, and operational dependencies
- Review legacy incidents for behaviors that matter more than code elegance
- Capture business behaviors that users assume even if engineers dislike them
Outputs
- Must-preserve inventory
- Legacy behavior map
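One way to keep the must-preserve inventory honest is to record each behavior with where it was found, who knows it, and what (if anything) proves the new system reproduces it. This is a minimal, hypothetical sketch: the `PreservedBehavior` fields, the `Source` categories, and the example entries are illustrative assumptions, not part of this playbook.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    # Where a behavior was discovered (illustrative categories, mirroring step 02).
    WORKFLOW = "critical workflow"
    INCIDENT = "legacy incident"
    CONTRACT = "external contract"
    USER_ASSUMPTION = "user assumption"


@dataclass(frozen=True)
class PreservedBehavior:
    name: str
    source: Source
    owner: str                          # person who knows the legacy behavior
    parity_check: Optional[str] = None  # test or probe proving the new system matches


def unproven(inventory: list[PreservedBehavior]) -> list[str]:
    """Names of behaviors that nothing yet verifies in the new system."""
    return [b.name for b in inventory if b.parity_check is None]


# Hypothetical entries for illustration only.
inventory = [
    PreservedBehavior("Invoices round half-up to 2 decimals", Source.CONTRACT,
                      "billing-maintainer", parity_check="test_invoice_rounding"),
    PreservedBehavior("Orders placed after 23:59 count toward next day",
                      Source.USER_ASSUMPTION, "ops-lead"),
    PreservedBehavior("Retry duplicate webhook without double charge",
                      Source.INCIDENT, "payments-maintainer"),
]

print(unproven(inventory))  # lists the two behaviors that have no parity check
```

A non-empty `unproven` list is a concrete, reviewable form of the hidden parity burden the step is meant to expose early.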
03
Define first displacement before full build
Prevent open-ended replacement efforts.
Actions
- Name the first slice that will move and how it will be proven live
- Define the minimum architecture needed for that slice
- Write down what is intentionally not in scope
Outputs
- First displacement slice
- Excluded scope list
04
Constrain ambition
Stop the rewrite becoming a wishlist.
Actions
- Separate parity work from improvement work
- Ban unrelated modernization goals unless they are required by the slice
- Set explicit criteria for when net-new scope may be added
Outputs
- Rewrite guardrails
- Scope constraints
05
Approve only with a migration model
Tie the rewrite to displacement reality.
Actions
- Show how the old system shrinks over time
- Define coexistence and rollback assumptions
- Agree on success metrics tied to retirement, not output
Outputs
- Rewrite approval brief
- Migration model
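Step 05 asks for a model of how the old system shrinks, with explicit coexistence and rollback assumptions. A minimal sketch, assuming the legacy system can be sliced by request route behind a routing facade (a strangler-fig style setup that this playbook does not itself prescribe). The `MigrationModel` class, its three states, and the route names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationModel:
    """Hypothetical migration model: each route is served by exactly one system."""

    legacy: set[str]                                  # routes the old system still serves
    migrated: set[str] = field(default_factory=set)   # new system serves; rollback still possible
    retired: set[str] = field(default_factory=set)    # legacy path removed; no rollback

    def cut_over(self, route: str) -> None:
        # Move one route from legacy to the new system.
        if route not in self.legacy:
            raise ValueError(f"route not on legacy: {route}")
        self.legacy.remove(route)
        self.migrated.add(route)

    def roll_back(self, route: str) -> None:
        # Coexistence assumption: legacy stays deployable for a route until it is retired.
        if route not in self.migrated:
            raise ValueError(f"route cannot be rolled back: {route}")
        self.migrated.remove(route)
        self.legacy.add(route)

    def retire(self, route: str) -> None:
        # Turning off the legacy path is the milestone that counts.
        if route not in self.migrated:
            raise ValueError(f"route must be migrated before retiring: {route}")
        self.migrated.remove(route)
        self.retired.add(route)

    def backend_for(self, route: str) -> str:
        # Routing facade: unknown routes default to legacy.
        return "new" if route in self.migrated or route in self.retired else "legacy"

    def retired_fraction(self) -> float:
        # Success metric tied to retirement, not output: share of routes whose legacy path is gone.
        total = len(self.legacy) + len(self.migrated) + len(self.retired)
        return len(self.retired) / total if total else 0.0


model = MigrationModel(legacy={"/invoices", "/orders", "/reports", "/users"})
model.cut_over("/invoices")
model.cut_over("/orders")
model.roll_back("/orders")   # parity failed; legacy serves this route again
model.retire("/invoices")
print(model.backend_for("/invoices"), model.backend_for("/orders"))  # new legacy
print(model.retired_fraction())  # 0.25
```

Separating "migrated" from "retired" keeps the rollback window visible: a route only counts toward success once its legacy path is actually turned off.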

Judgment

Judgment calls and pitfalls

The places where execution actually diverges: decisions that need thought, questions worth asking, and mistakes that recur regardless of good intent.

Decision points

Moments where judgment and trade-offs matter more than procedure.

Is this truly a rewrite candidate or a refactor candidate?
What are the non-negotiable legacy behaviors?
What is the first slice that proves the rewrite is real?
What scope is explicitly banned until parity is proven?

Questions worth asking

Prompts to use on yourself, the team, or an AI assistant while running the procedure.

What exact pain are we rewriting to solve?
What legacy behaviors would break customers if we forgot them?
What is the first thing we will actually retire?

Common mistakes

Patterns that surface across teams running this playbook.

Starting architecture work before defining the first displacement slice
Using the rewrite to also fix every adjacent problem
Treating ugly legacy behavior as unimportant because it is hard to defend aesthetically
Telling leadership the rewrite will simplify everything quickly

Warning signs you are doing it wrong

Signals that the playbook is being executed but not landing.

The rewrite is described as cleanup, modernization, platform reset, and product acceleration all at once
The team cannot state what will be turned off first
The architecture is getting clearer faster than the migration path
Legacy behavior is dismissed with phrases like 'we probably do not need that'

Outcomes

Outcomes and signals

What should exist after the playbook runs, how you'll know it worked, and what to watch for over time.

Artifacts to produce

Durable outputs the playbook should leave behind.

Rewrite problem statement
Must-preserve inventory
First-slice plan
Scope guardrails
Rewrite approval brief and migration model

Success signals

Observable changes that mean the playbook landed.

The rewrite has a sharply limited early scope
Stakeholders understand what will not be rebuilt yet
The team can point to explicit displacement milestones
The rewrite is framed as migration, not just construction

Follow-up actions

Moves that keep the playbook's effects compounding after it finishes.

Review the rewrite case after the first slice rather than locking all assumptions up front
Kill or reduce the rewrite if the first displacement proves weaker than expected
Refresh the must-preserve inventory after every major incident or discovery

Metrics or signals to watch

Longer-horizon indicators that the underlying problem is receding.

Time to first displaced slice
Ratio of parity work to net-new ambition
Number of legacy behaviors discovered late
Dual-run duration
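These signals can be tracked with very little tooling. A minimal sketch, assuming work items are tagged as parity or net-new and that the date the must-preserve inventory was signed off is recorded; `WorkItem`, `Discovery`, the `kind` labels, and all dates below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WorkItem:
    title: str
    kind: str  # "parity" or "net_new" (hypothetical labels)


@dataclass(frozen=True)
class Discovery:
    behavior: str
    found_on: date


def parity_ratio(items: list[WorkItem]) -> float:
    """Parity items per net-new item; infinite when no net-new scope has been added."""
    parity = sum(1 for i in items if i.kind == "parity")
    net_new = sum(1 for i in items if i.kind == "net_new")
    return parity / net_new if net_new else float("inf")


def late_discoveries(found: list[Discovery], inventory_signed_off: date) -> int:
    """Legacy behaviors discovered after the must-preserve inventory was signed off."""
    return sum(1 for d in found if d.found_on > inventory_signed_off)


def days_between(start: date, end: date) -> int:
    # Usable for time to first displaced slice and for dual-run duration.
    return (end - start).days


items = [WorkItem("port rounding rules", "parity"), WorkItem("port export job", "parity"),
         WorkItem("port auth flow", "parity"), WorkItem("new dashboard", "net_new")]
found = [Discovery("month-end batch order", date(2024, 3, 1)),
         Discovery("legacy CSV column order", date(2024, 5, 10))]

print(parity_ratio(items))                           # 3.0
print(late_discoveries(found, date(2024, 4, 1)))     # 1
print(days_between(date(2024, 1, 8), date(2024, 3, 4)))  # 56
```

A falling parity ratio or a rising count of late discoveries are early hints that the rewrite is drifting toward the wishlist and forgotten-behavior failures described above.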

AI impact

AI effects on this playbook

How AI-assisted and AI-driven workflows help execution, and the ways they can make it worse.

AI can help with

Where AI tooling genuinely reduces the cost of running this playbook well.

Summarizing maintenance pain from historical tickets and incidents
Mapping legacy dependencies and hidden contracts
Highlighting code hotspots and behavior clusters
Drafting parity and cutover checklists

AI can make worse by

Distortions AI introduces that make the underlying problem harder to see.

Making it too easy to generate a large shiny replacement quickly
Giving teams false confidence that understanding has caught up with output
Hiding ambiguity behind confident rewrite documents

Relationships

Connected playbooks

Failure modes this playbook tends to address, decisions behind the situation, red flags that motivate running it, and neighboring playbooks.