The Hard Parts.dev
EP-23 · Delivery · Engineering Playbook
Difficulty medium-high Owner · engineering manager

Turn recurring urgent work into managed work

Convert chronic urgency into a known workstream by categorizing recurrence, pricing the interruption cost, and building preventive work into normal planning rather than treating every recurrence as exceptional.

Difficulty
medium-high
Time horizon
weeks to quarters
Primary owner
engineering manager
Confidence
high
At a glance · EP-23
Situation
The same urgent issues keep recurring and disrupting planned work.
Goal
Reduce reactive load and stop recurring urgent work from consuming hidden capacity forever.
Do not use when
the urgent work is a one-off crisis rather than a recurring pattern
Primary owner
engineering manager
Roles involved

  • engineering manager
  • tech lead
  • operations or support partner
  • delivery lead

Context

The situation

Deciding whether to reach for this playbook: when it fits, and when it doesn't.

Use when

Conditions where this playbook is the right tool.

  • The same production or support issues recur
  • Planned work is repeatedly displaced by urgent interrupts
  • Incident or ops work is predictable in pattern but unmanaged in planning
  • The team feels permanently in catch-up mode

Stakes

Why this matters

What this playbook protects against, and why skipping or half-running it tends to be expensive.

Recurring urgent work is usually not truly urgent in the strategic sense. It is unmanaged debt. Teams get trapped when they keep paying the interrupt tax without converting it into structured improvement work.

Quality bar

What good looks like

The observable qualities of a team or system that is actually doing this well, not just going through the motions.

Signs of the playbook done well

  • Urgent work patterns are named and categorized
  • Some capacity is explicitly reserved for recurrence reduction
  • Teams track prevention progress, not just incident handling
  • Interruptions stop being explained as random bad luck
  • Planned work becomes more stable over time

Preparation

Before you start

What you need available and true before running the procedure. Skipping this is the most common reason playbooks fail.

Inputs

Material you'll want to gather first.

  • Incident and interrupt history
  • Support escalation patterns
  • Team capacity view
  • Delivery plan
  • Recurrence cost estimate

Prerequisites

Conditions that should be true for this to work.

  • At least some interrupt history exists
  • There is a way to reserve or reprioritize capacity
  • The team can distinguish real emergencies from chronic noise

Procedure

The procedure

Each step carries its purpose (why it exists), its actions (what you do), and its outputs (what you produce). Read the purpose. It's what keeps the step from degenerating into checklist theatre.

  1. Name the recurring urgent classes

    Stop treating recurrence as randomness.

    Actions

    • Group urgent work by cause, system, and operational pattern
    • Separate true emergencies from predictable recurring issues
    • Estimate how much delivery time recurrence is consuming

    Outputs

    • Urgent work taxonomy
    • Interrupt cost estimate
  2. Choose what becomes managed work

    Promote repeat pain into normal planning.

    Actions

    • Identify the top recurring categories worth prevention effort
    • Create explicit work items for reducing recurrence
    • Assign owners and expected operational effect

    Outputs

    • Recurrence reduction backlog
  3. Protect prevention capacity

    Ensure prevention is not always displaced by the next urgent event.

    Actions

    • Reserve capacity or a dedicated lane for recurrence reduction
    • Make trade-offs explicit when prevention work is displaced
    • Escalate if urgent intake makes prevention impossible

    Outputs

    • Capacity protection model
  4. Measure recurrence down

    Show whether the pattern is actually improving.

    Actions

    • Track recurrence rate by category
    • Track time lost to urgent work
    • Review whether prevention work changed the interrupt profile

    Outputs

    • Recurrence dashboard
  5. Reclassify what counts as urgent

    Prevent every recurring issue from keeping premium priority forever.

    Actions

    • Define criteria for true urgent work
    • Move known recurring issues into normal planning unless they meet those criteria
    • Teach stakeholders the difference between urgent and unmanaged

    Outputs

    • Updated urgency policy
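
Steps 1 and 2 of the procedure can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `Interrupt` record, its field names, the sample categories, and the `min_occurrences` threshold are all assumptions made for the example. It groups non-emergency interrupts by category, totals the hours each category consumed, and ranks the categories that recurred as prevention candidates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interrupt:
    """One urgent work item from incident or ticket history (hypothetical schema)."""
    category: str         # recurrence class assigned during triage
    hours_spent: float    # engineer hours spent handling it
    true_emergency: bool  # True if it met the team's emergency criteria


def interrupt_cost_by_category(interrupts):
    """Return {category: (count, total_hours)} for non-emergency interrupts."""
    totals = defaultdict(lambda: [0, 0.0])
    for item in interrupts:
        if item.true_emergency:
            continue  # true emergencies are kept apart from chronic recurrence
        totals[item.category][0] += 1
        totals[item.category][1] += item.hours_spent
    return {cat: (count, hours) for cat, (count, hours) in totals.items()}


def top_prevention_candidates(costs, min_occurrences=2, limit=3):
    """Rank categories seen at least min_occurrences times by hours consumed."""
    recurring = [
        (cat, count, hours)
        for cat, (count, hours) in costs.items()
        if count >= min_occurrences
    ]
    recurring.sort(key=lambda row: row[2], reverse=True)
    return recurring[:limit]


# Illustrative data only; real input would come from the interrupt history.
history = [
    Interrupt("cert-expiry", 3.0, False),
    Interrupt("queue-backlog", 5.0, False),
    Interrupt("cert-expiry", 2.5, False),
    Interrupt("region-outage", 12.0, True),
    Interrupt("queue-backlog", 4.0, False),
    Interrupt("queue-backlog", 6.0, False),
    Interrupt("manual-refund", 1.0, False),
]

costs = interrupt_cost_by_category(history)
for category, count, hours in top_prevention_candidates(costs):
    print(f"{category}: {count} occurrences, {hours:.1f} hours")
```

Ranking by hours rather than by count is a design choice for this sketch; a team that cares more about disruption frequency than total time could sort on the count instead.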

Judgment

Judgment calls and pitfalls

The places where execution actually diverges: decisions that need thought, questions worth asking, and mistakes that recur regardless of good intent.

Decision points

Moments where judgment and trade-offs matter more than procedure.

  • Which recurring issues deserve preventive investment first?
  • How much planned capacity should be protected for recurrence reduction?
  • When should a recurring issue stop being treated as urgent?
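
The third decision point can be made concrete as a small routing rule. The sketch below is one possible shape for an urgency policy, under loud assumptions: the two criteria (`customer_impact_now`, `data_or_security_risk`) and the three route names are invented for illustration, and a real policy would use whatever criteria the team agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """An incoming ask labeled as urgent by whoever raised it (hypothetical schema)."""
    summary: str
    known_recurring: bool        # matches a class already in the urgent work taxonomy
    customer_impact_now: bool    # assumed criterion: users are affected right now
    data_or_security_risk: bool  # assumed criterion: data loss or security exposure


def route(request):
    """Return 'urgent', 'planned', or 'triage' for an incoming request."""
    meets_urgent_criteria = (
        request.customer_impact_now or request.data_or_security_risk
    )
    if meets_urgent_criteria:
        return "urgent"   # true emergency, handled immediately
    if request.known_recurring:
        return "planned"  # known pattern, goes to the recurrence reduction backlog
    return "triage"       # new and not an emergency, prioritized normally


print(route(Request("nightly export stalls again", True, False, False)))
print(route(Request("checkout failing for all users", True, True, False)))
print(route(Request("new report request from sales", False, False, False)))
```

The point of the sketch is the ordering of the checks: a recurring issue loses its premium priority by default and keeps it only when it meets the explicit criteria.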

Questions worth asking

Prompts to use on yourself, the team, or an AI assistant while running the procedure.

  • Which urgent issues are actually recurring classes in disguise?
  • How much delivery time are interrupts costing us each month?
  • What prevention work keeps getting pushed out by the next urgent ask?

Common mistakes

Patterns that surface across teams running this playbook.

  • Trying to eliminate all urgent work at once
  • Tracking incidents but not recurrence classes
  • Allowing every new urgent request to displace prevention work
  • Treating recurring pain as evidence the team is just in a tough phase

Warning signs you are doing it wrong

Signals that the playbook is being executed but not landing.

  • The same categories appear in incidents and retros month after month
  • The team says we never have time to fix the root cause

  • Urgent work volume is described qualitatively but not measured
  • Stakeholders still use urgent language to bypass prioritization

Outcomes

Outcomes and signals

What should exist after the playbook runs, how you'll know it worked, and what to watch for over time.

Artifacts to produce

Durable outputs the playbook should leave behind.

  • Urgent work taxonomy
  • Interrupt cost estimate
  • Recurrence reduction backlog
  • Recurrence dashboard
  • Urgency policy

Success signals

Observable changes that mean the playbook landed.

  • Planned work survives more often
  • Repeated urgent issues decline in count or impact
  • The team can explain its interrupt load with evidence
  • Some operational pain moves from urgent to managed

Follow-up actions

Moves that keep the playbook's effects compounding after it finishes.

  • Review which recurring urgent items point to larger architectural fixes
  • Adjust staffing or ownership if recurrence is concentrated unfairly
  • Carry recurrence costs into future roadmap planning

Metrics or signals to watch

Longer-horizon indicators that the underlying problem is receding.

  • Interrupt hours per sprint or month
  • Repeat incident count by category
  • Planned work displacement rate
  • Mean time between recurring urgent events
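
The four signals above can be computed from a flat event log. This is a minimal sketch assuming each event is a `(period, category, day, hours)` tuple; that layout, the sample events, and the function names are hypothetical and chosen only to keep the example short.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def hours_per_period(events):
    """Sum interrupt hours per period label (for example a sprint name)."""
    totals = defaultdict(float)
    for period, _category, _day, hours in events:
        totals[period] += hours
    return dict(totals)


def repeat_count_by_category(events):
    """Count events per category, keeping only categories seen more than once."""
    counts = defaultdict(int)
    for _period, category, _day, _hours in events:
        counts[category] += 1
    return {cat: n for cat, n in counts.items() if n > 1}


def mean_days_between(events, category):
    """Mean gap in days between consecutive events of one category, or None."""
    days = sorted(day for _p, cat, day, _h in events if cat == category)
    if len(days) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    return mean(gaps)


def displacement_rate(planned_items, displaced_items):
    """Share of planned items pushed out by urgent work in a period."""
    if planned_items == 0:
        return 0.0
    return displaced_items / planned_items


# Illustrative events only.
events = [
    ("sprint-1", "cert-expiry", date(2024, 1, 10), 3.0),
    ("sprint-2", "queue-backlog", date(2024, 1, 25), 5.0),
    ("sprint-3", "cert-expiry", date(2024, 2, 9), 2.5),
    ("sprint-3", "manual-refund", date(2024, 2, 12), 1.0),
    ("sprint-6", "cert-expiry", date(2024, 3, 20), 2.0),
]

print(hours_per_period(events))
print(repeat_count_by_category(events))
print(mean_days_between(events, "cert-expiry"))
print(displacement_rate(planned_items=10, displaced_items=3))
```

A rising mean gap for a category, alongside falling interrupt hours, suggests the prevention work is changing the interrupt profile rather than just tidying the reporting.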

AI impact

AI effects on this playbook

How AI-assisted and AI-driven workflows help execution, and the ways they can make it worse.

AI can help with

Where AI tooling genuinely reduces the cost of running this playbook well.

  • Clustering repeated incidents or tickets
  • Summarizing interrupt patterns from support and ops data
  • Drafting recurrence taxonomies and prevention candidates

AI can make it worse by

Distortions AI introduces that make the underlying problem harder to see.

  • Making interrupt reporting cleaner without changing the underlying policy
  • Producing too many speculative prevention tasks
  • Normalizing urgency through automated escalation summaries

Relationships

Connected playbooks

Failure modes this playbook tends to address, decisions behind the situation, red flags that motivate running it, and neighboring playbooks.