Rewrite vs Refactor
Usually a migration-discipline decision, not a cleanliness decision.
- Really about: risk containment, value continuity, and how much hidden system behavior still matters.
- Not actually about: whether the old code looks ugly or embarrassing.
- Why it feels hard: refactoring feels slow and compromised; rewriting feels clean but often hides displacement risk.
The decision
Should we replace this system wholesale or improve it incrementally?
Heuristic
Refactor by default; rewrite only when migration slices and retirement logic are explicit.
Default stance
Where to start before any evidence arrives.
Prefer refactor unless rewrite has explicit migration slices and retirement logic.
Options on the table
Two poles of the trade-off
Neither is the right answer by default. Each option's conditions, strengths, costs, hidden costs, and failure modes when misused are laid out in parallel so you can read across facets.
Option A
Rewrite
Best when
Conditions where this option is a natural fit.
- architecture is structurally unfit
- migration slices are clear
- legacy behavior can be mapped reliably
- parallel investment is acceptable
Real-world fits
Concrete environments where this option has worked.
- systems with irrecoverable foundational constraints and a credible migration plan
- small replaceable internal tools with bounded behavior
- surfaces where the legacy workload can be retired slice by slice
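Retiring a workload slice by slice is often implemented with a strangler-fig router: a thin dispatch layer sends one bounded slice of requests to the new system while everything else stays on the legacy path. A minimal sketch, with all names (`handle_legacy`, `handle_new`, `MIGRATED_SLICES`) hypothetical:

```python
# Strangler-fig routing sketch: migrate one slice at a time.
# All names here are illustrative, not from any specific system.

MIGRATED_SLICES = {"invoices"}  # slices already cut over to the new system

def handle_legacy(slice_name, payload):
    return f"legacy:{slice_name}:{payload}"

def handle_new(slice_name, payload):
    return f"new:{slice_name}:{payload}"

def route(slice_name, payload):
    """Dispatch to the new system only for explicitly migrated slices."""
    if slice_name in MIGRATED_SLICES:
        return handle_new(slice_name, payload)
    return handle_legacy(slice_name, payload)
```

The point of the explicit `MIGRATED_SLICES` set is that cutover state is visible and reversible: retiring a slice is one line, and so is rolling it back.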
Strengths
What this option does well on its own terms.
- cleaner foundations
- chance to reset poor structure
- can remove accumulated compromise
Costs
What you accept up front to get those strengths.
- high migration risk
- parallel system burden
- longer path to equivalent business value
Hidden costs
Costs that surface later than expected — the main thing novices miss.
- unknown legacy behavior surfaces late
- teams often add ambition while rewriting
Failure modes when misused
How this option breaks when applied to the wrong context.
- Turns into the friendly rewrite: a prestige effort with weak cutover discipline.
Option B
Refactor
Best when
Conditions where this option is a natural fit.
- business continuity matters
- core behavior is still valuable
- teams can isolate improvements gradually
- delivery pressure cannot tolerate parallel rebuild
Real-world fits
Concrete environments where this option has worked.
- core business systems that must keep shipping
- products with valuable but messy codebases
- systems where seams for incremental change already exist
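A "seam" here means a point where behavior can be swapped without editing call sites. One common way to create one is to put the legacy logic behind a small interface so a replacement can be introduced gradually. A hedged Python sketch with hypothetical names:

```python
from typing import Protocol

class TaxCalculator(Protocol):
    def tax(self, amount: float) -> float: ...

class LegacyTaxCalculator:
    # Wraps the old logic unchanged; call sites now depend on the seam,
    # not on the legacy implementation directly.
    def tax(self, amount: float) -> float:
        return round(amount * 0.2, 2)

class NewTaxCalculator:
    # A replacement can be introduced behind the same seam.
    # It must match legacy behavior before it diverges on purpose.
    def tax(self, amount: float) -> float:
        return round(amount * 0.2, 2)

def checkout_total(amount: float, calc: TaxCalculator) -> float:
    return round(amount + calc.tax(amount), 2)
```

Once the seam exists, each implementation swap is a small, testable step rather than a big-bang replacement.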
Strengths
What this option does well on its own terms.
- lower immediate risk
- continuous value delivery
- better learning from real system behavior
Costs
What you accept up front to get those strengths.
- slower visible transformation
- more patience required
- legacy constraints remain longer
Hidden costs
Costs that surface later than expected — the main thing novices miss.
- without discipline, refactoring can become endless patching
- teams may lose morale if progress feels invisible
Failure modes when misused
How this option breaks when applied to the wrong context.
- Becomes endless compromise without meaningful structural improvement.
Cost, time, and reversibility
Who pays, how it ages, and what undoing it costs
Trade-offs are rarely zero-sum and rarely static. Someone pays, the payoff curve shifts with the horizon, and the decision has an undo cost.
Option A · Rewrite
Who absorbs the cost
- Current delivery team
- Future maintainers
- Business stakeholders waiting for parity
Option B · Refactor
Who absorbs the cost
- Current team carrying legacy complexity
- Delivery timelines through slower visible change
Option A · Rewrite
May win only if the new target is genuinely better and retirement is believable.
Option B · Refactor
Usually wins by preserving continuity while paying down risk incrementally.
What undoing costs
Hard. Abandoning a half-finished rewrite strands two live systems; unwinding deep refactoring re-introduces complexity you already paid to remove.
What should force a re-look
Trigger conditions that mean the answer may have changed.
- System boundaries become clearer
- Retirement path improves
- Business continuity pressure changes
How to decide
The work you still have to do
The reference can frame the trade-off; only you can weight the factors against your context.
Questions to ask
Open these in the room. Answering them is most of the decision.
- What does the old system do that we do not fully understand yet?
- Can we name the first slice to replace or improve?
- What is the cutover and retirement plan?
- Are we trying to fix pain or buy hope?
Key factors
The variables that actually move the answer.
- Legacy behavior complexity
- Migration path clarity
- Business continuity needs
- Tolerance for parallel systems
- Delivery pressure
Evidence needed
What to gather before committing. Not after.
- Legacy behavior inventory
- Migration slice map
- Dependency analysis
- Business continuity constraints
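A legacy behavior inventory is often built as characterization (golden-master) tests: record what the old system actually returns for representative inputs, then hold any replacement to those recordings. A minimal sketch, assuming a pure `legacy_format` function (hypothetical):

```python
def legacy_format(n):
    # Stand-in for the legacy behavior being inventoried (hypothetical).
    # Note the quirk for negatives: exactly the kind of behavior that
    # surfaces late in a rewrite if it was never recorded.
    return f"#{n:04d}" if n >= 0 else f"({-n})"

# Step 1: record the legacy system's actual outputs for sampled inputs.
golden = {n: legacy_format(n) for n in (-3, 0, 7, 1234)}

def check_parity(candidate):
    """Return the inputs where a candidate diverges from recorded behavior."""
    return {n: (expected, candidate(n))
            for n, expected in golden.items()
            if candidate(n) != expected}
```

The recordings double as the parity criteria a rewrite needs before cutover: an empty divergence report is evidence, not a vibe.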
Signals from the ground
What's usually pushing the call, and what should push it
On the left, pressures to recognize and discount. On the right, signals that genuinely point toward one option or the other.
What's usually pushing the call
Pressures to recognize and discount.
Common bad reasons
Reasoning that feels convincing in the moment but doesn't hold up.
- The code is ugly
- We want a clean start
- Rewriting will be faster than understanding the old system
Anti-patterns
Shapes of reasoning to recognize and set aside.
- Starting a rewrite without parity criteria
- Calling endless patching refactoring
- Mixing migration with major redesign ambition
What should push the call
Concrete signals that genuinely point to one pole.
For · Rewrite
Observations that genuinely point to Option A.
- Old architecture fundamentally blocks change
- Migration can be sliced and measured
For · Refactor
Observations that genuinely point to Option B.
- Valuable behavior still works
- Delivery cannot pause
- Incremental seams exist
AI impact
How AI bends this decision
Where AI accelerates the call, where it introduces new distortions, and anything else worth knowing.
AI can help with
Where AI genuinely reduces the cost of making the call.
- AI can help uncover legacy behavior and identify refactor seams.
AI can make worse
Distortions AI introduces that didn't exist before.
- AI makes rewriting feel faster by making code generation cheaper, but it does not reduce hidden behavior risk.
AI false confidence
Generated replacement code compiles and passes shallow tests, making a rewrite feel fast and de-risked, creating the illusion that replacement behavior has been validated when only replacement shape has been laid down.
AI synthesis
Generated replacement code does not equal validated replacement behavior.
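One way to turn this into practice is to run the generated replacement in shadow mode against the legacy implementation on real inputs and diff the results before any cutover. A hedged sketch, with both functions hypothetical:

```python
def legacy_price(qty):
    # Legacy behavior includes a volume discount at 100 units.
    return qty * 10 if qty < 100 else qty * 9

def generated_price(qty):
    # Plausible-looking generated replacement that misses the discount.
    return qty * 10

def shadow_compare(inputs, old, new):
    """Run both implementations; report mismatches instead of trusting shape."""
    return [(x, old(x), new(x)) for x in inputs if old(x) != new(x)]

mismatches = shadow_compare(range(0, 200, 50), legacy_price, generated_price)
# Any mismatch means replacement behavior is not yet validated,
# no matter how clean the generated code looks.
```

The harness is cheap to build and directly attacks the false confidence above: it tests behavior equivalence, which compilation and shallow tests do not.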
Relationships
Connected decisions
Nearby decisions this is sometimes confused with, adjacent decisions that are often entangled with this one, related failure modes, red flags, and playbooks to reach for.
Easy to confuse with
Nearby decisions and how this one differs.
- That decision is about deployment topology. This one is about whether to keep evolving the current code at all.
- That decision is about commitment axes. This one is about the path to the same outcome (replacement vs incremental improvement).
- A modernization decision (adjacent concept): modernization is a framing; this decision is which of its two concrete forms to take.