The Hard Parts.dev
EP-17 · Delivery · Engineering Playbook

Run a phased migration

Move from old to new in controlled slices, where each slice has explicit ownership, cutover criteria, rollback, and retirement of the old path.

Difficulty: high
Time horizon: multi-sprint to multi-quarter
Primary owner: tech lead
Confidence: high
At a glance · EP-17
Situation: You need to replace or move a live system without stopping delivery.
Goal: Reduce migration risk by replacing behavior incrementally instead of betting everything on one cutover.
Do not use when: The target system is still conceptually undefined.
Primary owner: tech lead
Roles involved

  • tech lead
  • architect
  • delivery lead
  • service owner
  • QA or quality lead
  • operations or platform owner
  • product owner if user-facing impact exists

Context

The situation

Deciding whether to reach for this playbook: when it fits, and when it doesn't.

Use when

Conditions where this playbook is the right tool.

  • A legacy system must be replaced or decomposed
  • A monolith capability is moving into a new service or platform
  • A schema, provider, or runtime migration is unavoidable
  • The existing system is painful but still business-critical

Stakes

Why this matters

What this playbook protects against, and why skipping or half-running it tends to be expensive.

Most migrations fail because ambition outruns displacement: the new system's scope grows faster than legacy behavior is actually taken over and switched off. A phased migration keeps the team focused on moving real behavior, not just producing new code.

Quality bar

What good looks like

The observable qualities of a team or system that is actually doing this well, not just going through the motions.

Signs of the playbook done well

  • The migration is divided into business-meaningful slices
  • Old and new paths are both observable during transition
  • Every slice has explicit entry and exit criteria
  • Teams can say what was displaced, not just what was built
  • The legacy surface shrinks over time in visible ways
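
One way to make "explicit entry and exit criteria" concrete is to keep each slice as a small record that can be checked mechanically. The sketch below is illustrative only: the `MigrationSlice` class, its field names, and the `invoice-export` example are assumptions, not something this playbook prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationSlice:
    """One business-meaningful unit of migration (all names here are illustrative)."""
    name: str
    owner: str
    entry_criteria: dict[str, bool] = field(default_factory=dict)
    exit_criteria: dict[str, bool] = field(default_factory=dict)
    legacy_retired: bool = False

    def unmet(self, criteria: dict[str, bool]) -> list[str]:
        """Return the names of criteria that are not yet satisfied."""
        return [name for name, met in criteria.items() if not met]

    def can_start(self) -> bool:
        # A slice with no written entry criteria is treated as not ready:
        # "explicit" means the criteria exist, not merely that none failed.
        return bool(self.entry_criteria) and not self.unmet(self.entry_criteria)

    def is_displaced(self) -> bool:
        # Displacement needs every exit criterion met AND the old path retired.
        return (
            bool(self.exit_criteria)
            and not self.unmet(self.exit_criteria)
            and self.legacy_retired
        )


# Hypothetical slice used only to show the checks.
invoice_export = MigrationSlice(
    name="invoice-export",
    owner="billing-team",
    entry_criteria={"behavior inventory reviewed": True, "rollback rehearsed": True},
    exit_criteria={"parity checks green": True, "100% traffic on new path": False},
)

print(invoice_export.can_start())
print(invoice_export.unmet(invoice_export.exit_criteria))
```

Keeping `legacy_retired` separate from the exit criteria is a deliberate choice in this sketch: a slice that is built and validated but whose old path is still alive counts as duplicated, not displaced.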

Preparation

Before you start

What you need available and true before running the procedure. Skipping this is the most common reason playbooks fail.

Inputs

Material you'll want to gather first.

  • Current system map
  • Dependency inventory
  • Business-critical workflows
  • Cutover constraints
  • Rollback options
  • Operational readiness expectations

Prerequisites

Conditions that should be true for this to work.

  • Shared understanding of why migration is needed
  • Minimum observability on the current path
  • Named owners for both source and target behavior
  • Agreement on what counts as displacement

Procedure

The procedure

Each step carries its purpose (why it exists), its actions (what you do), and its outputs (what you produce). Read the purpose. It's what keeps the step from degenerating into checklist theatre.

  1. Define the migration unit

    Turn the migration into slices that move real behavior instead of abstract layers.

    Actions

    • Map the current system by business capability, not only by code structure
    • Identify the smallest slice that can be moved and verified independently
    • Write down what remains in the old system after that slice moves

    Outputs

    • Migration slice map
    • First slice definition
  2. Make legacy behavior explicit

    Avoid losing hidden behavior that nobody remembers until production proves it matters.

    Actions

    • Capture current inputs, outputs, side effects, edge cases, and operational expectations
    • Review incident history for hidden assumptions
    • Identify consumers that depend on undocumented behavior

    Outputs

    • Behavior inventory
    • Known edge-case list
  3. Design cutover and coexistence

    Ensure the team can compare, phase, and reverse safely.

    Actions

    • Define how old and new paths will coexist
    • Decide what traffic, data, or users move first
    • Define rollback conditions and rollback mechanics

    Outputs

    • Cutover strategy
    • Rollback approach
    • Coexistence model
  4. Move one slice and retire one slice

    Prevent endless parallel ownership.

    Actions

    • Implement the target slice
    • Validate parity and operational behavior
    • Retire or isolate the old slice explicitly once the slice's exit criteria are met

    Outputs

    • Migrated slice
    • Legacy retirement note
  5. Measure displacement, not activity

    Keep the program honest by reporting what has left the legacy system, not how much work was done.

    Actions

    • Track legacy dependency reduction
    • Track which users, workflows, or traffic moved
    • Review whether old operational load actually fell

    Outputs

    • Displacement dashboard
    • Migration progress review
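
The coexistence, cutover, and rollback ideas in steps 3 to 5 can be sketched as a small router that sends a configurable share of traffic to the new path and can be reset to the old one. This is a minimal sketch under stated assumptions: `SliceRouter`, `bucket`, and the percentage-based rollout are hypothetical choices for illustration, and a real system would more likely use an existing feature-flag or traffic-management tool.

```python
import hashlib


def bucket(key: str) -> int:
    """Map a stable key (for example a user ID) to a bucket in 0..99."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


class SliceRouter:
    """Routes calls for one slice to the old or new path (hypothetical helper)."""

    def __init__(self, old_path, new_path, percent_on_new: int = 0):
        self.old_path = old_path
        self.new_path = new_path
        self.percent_on_new = percent_on_new
        # Per-path counters, so displacement can be measured rather than guessed.
        self.counts = {"old": 0, "new": 0}

    def route(self, key: str, payload):
        # Deterministic: a given key always lands on the same path
        # for as long as the percentage stays unchanged.
        if bucket(key) < self.percent_on_new:
            self.counts["new"] += 1
            return self.new_path(payload)
        self.counts["old"] += 1
        return self.old_path(payload)

    def rollback(self) -> None:
        # Rollback mechanics in this sketch: send all traffic back to the old path.
        self.percent_on_new = 0


# Illustrative paths; real ones would call the legacy and target systems.
router = SliceRouter(
    old_path=lambda payload: ("old", payload),
    new_path=lambda payload: ("new", payload),
    percent_on_new=10,
)
for user_id in ("user-1", "user-2", "user-3"):
    router.route(user_id, {"action": "export"})
print(router.counts)
```

Hashing a stable key rather than picking randomly per request keeps each user on one path during the transition, which makes parity problems easier to attribute. The `counts` dictionary is the raw material for the displacement dashboard in step 5.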

Judgment

Judgment calls and pitfalls

The places where execution actually diverges: decisions that need thought, questions worth asking, and mistakes that recur regardless of good intent.

Decision points

Moments where judgment and trade-offs matter more than procedure.

  • What is the first slice: technical seam or business capability?
  • Do we mirror behavior first or improve behavior during the move?
  • Can old and new paths run in parallel, or do we need progressive cutover?
  • What level of parity is required before retirement?
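
When old and new paths can run in parallel, the parity question can be answered with evidence by serving from the old path while running the new one in the shadow and comparing results. The sketch below assumes both paths are free of side effects for the requests being compared. `shadow_compare`, `ParityReport`, and the fee functions are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ParityReport:
    total: int = 0
    mismatches: list = field(default_factory=list)

    @property
    def mismatch_rate(self) -> float:
        return len(self.mismatches) / self.total if self.total else 0.0


def shadow_compare(old_path, new_path, requests) -> ParityReport:
    """Treat the old path as the reference, run the new path too, record differences."""
    report = ParityReport()
    for request in requests:
        expected = old_path(request)
        try:
            actual = new_path(request)
        except Exception as exc:  # a crash in the new path counts as a parity gap
            actual = f"error: {exc!r}"
        report.total += 1
        if actual != expected:
            report.mismatches.append((request, expected, actual))
    return report


# Hypothetical hidden behavior: the legacy fee calculation truncates,
# while the rewrite rounds. Nobody wrote the truncation down.
def legacy_fee(cents: int) -> int:
    return cents * 3 // 100


def new_fee(cents: int) -> int:
    return round(cents * 3 / 100)


report = shadow_compare(legacy_fee, new_fee, [100, 150, 199, 250])
for request, expected, actual in report.mismatches:
    print(f"request={request} legacy={expected} new={actual}")
print(f"mismatch rate: {report.mismatch_rate:.0%}")
```

The acceptable mismatch rate before retirement is the judgment call. The report only makes sure the decision is based on observed differences, and each mismatch still needs a ruling on whether the legacy behavior was intentional.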

Questions worth asking

Prompts to use on yourself, the team, or an AI assistant while running the procedure.

  • What is the smallest migration slice that removes real legacy responsibility?
  • What hidden behaviors does the current system provide that nobody wrote down?
  • How will we prove this slice is displaced rather than duplicated?

Common mistakes

Patterns that surface across teams running this playbook.

  • Making the slice too technical and not user- or behavior-meaningful
  • Adding net-new product ambition before parity is reached
  • Measuring generated code instead of displaced legacy behavior
  • Keeping the old path alive indefinitely out of vague caution
  • Assuming undocumented legacy behavior is accidental

Warning signs you are doing it wrong

Signals that the playbook is being executed but not landing.

  • Nobody can name what has been retired so far
  • The new system keeps expanding in scope while the old one stays fully alive
  • Migration status sounds impressive but no user or workload has actually moved
  • Teams describe the migration using architecture language only, not business behavior

Outcomes

Outcomes and signals

What should exist after the playbook runs, how you'll know it worked, and what to watch for over time.

Artifacts to produce

Durable outputs the playbook should leave behind.

  • Migration slice map
  • Behavior parity checklist
  • Cutover plan
  • Rollback plan
  • Legacy retirement record

Success signals

Observable changes that mean the playbook landed.

  • Specific legacy paths are retired on a steady cadence
  • Support and operational burden drops on the source system
  • Cutover decisions are made with evidence, not optimism
  • Stakeholders can see what has actually moved

Follow-up actions

Moves that keep the playbook's effects compounding after it finishes.

  • Clean up coexistence code quickly after each slice
  • Review whether the target design still fits reality after each major slice
  • Update ownership and runbooks as the center of gravity shifts

Metrics or signals to watch

Longer-horizon indicators that the underlying problem is receding.

  • Legacy dependency count
  • Percentage of traffic or workflows on new path
  • Rollback frequency
  • Incidents caused by parity gaps
  • Time spent maintaining dual paths
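
As a rough illustration of the "percentage of traffic or workflows on new path" signal, the sketch below turns per-workflow request counts into a displacement summary. The function name, the input shape, and the workflow names are assumptions made for this example. Real counts would come from whatever observability the old and new paths already have.

```python
def displacement_summary(counts_by_workflow: dict[str, dict[str, int]]) -> dict:
    """Summarize how much traffic has moved, per workflow and overall.

    counts_by_workflow maps a workflow name to {"old": n, "new": m} request counts.
    """
    per_workflow = {}
    total_old = total_new = 0
    for workflow, counts in counts_by_workflow.items():
        old, new = counts.get("old", 0), counts.get("new", 0)
        total_old += old
        total_new += new
        seen = old + new
        per_workflow[workflow] = round(100 * new / seen, 1) if seen else 0.0
    total = total_old + total_new
    return {
        "percent_on_new": round(100 * total_new / total, 1) if total else 0.0,
        "per_workflow": per_workflow,
        # A workflow counts as displaced only once the old path sees no traffic.
        "fully_displaced": sorted(
            w for w, c in counts_by_workflow.items()
            if c.get("new", 0) > 0 and c.get("old", 0) == 0
        ),
        # Workflows with nothing on the new path: activity may exist, displacement does not.
        "untouched": sorted(
            w for w, c in counts_by_workflow.items() if c.get("new", 0) == 0
        ),
    }


# Hypothetical counts for three workflows.
summary = displacement_summary({
    "invoice-export": {"old": 0, "new": 400},
    "refunds": {"old": 300, "new": 100},
    "dunning": {"old": 200, "new": 0},
})
print(summary)
```

Reporting `fully_displaced` and `untouched` alongside the overall percentage guards against a status that sounds impressive while no workflow has actually left the legacy system.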

AI impact

AI effects on this playbook

How AI-assisted and AI-driven workflows help execution, and the ways they can make it worse.

AI can help with

Where AI tooling genuinely reduces the cost of running this playbook well.

  • Mapping legacy code paths and dependencies
  • Summarizing incident history around the old system
  • Generating migration checklists and parity matrices
  • Finding likely consumers of hidden behaviors

AI can make it worse by

Distortions AI introduces that make the underlying problem harder to see.

  • Accelerating speculative replacement code before behavior is understood
  • Making parallel systems grow faster than the team can validate
  • Creating false confidence through polished migration documentation

Relationships

Connected playbooks

Failure modes this playbook tends to address, decisions behind the situation, red flags that motivate running it, and neighboring playbooks.