The Hard Parts.dev
EP · Engineering Playbook Issue 01

Playbooks

Practical playbooks for recurring engineering situations: delivery, architecture, operations, team work, and AI adoption. Operational references you can actually execute, not motivational posts.

Entries

40

Classes

05

Classes

  • delivery
  • team
  • architecture
  • operations
  • ai

Every entry is a structured procedure for a named situation: when to use it, when not to, what good looks like, the actual steps (with their purpose and outputs), common mistakes, artifacts, and success signals. Card weight reflects difficulty. Harder playbooks darken the card.

  • Difficulty key

    • low
    • medium
    • medium-high
    • high

    Chip = card fill on the grid.

  • Confidence

    How sure we are the playbook works as written across teams and contexts: provisional vs. repeatedly validated in real engineering work.

    • low
    • medium
    • medium-high
    • high
Playbook set 01

Delivery

08 playbooks
EP-17 Delivery

Run a phased migration

Move from old to new in controlled slices, where each slice has explicit ownership, cutover criteria, rollback, and retirement of the old path.

tech lead
EP-18 Delivery

Start a rewrite safely

Before approving a rewrite, force clarity on the actual problem, what must be preserved, what will be displaced first, and how success will be measured beyond cleaner code.

architect
EP-19 Delivery

De-risk a risky release

Reduce risk by clarifying what could fail, shrinking rollout scope, tightening observability, and defining credible rollback before the release becomes a test of nerve.

release owner
EP-20 Delivery

Recover a slipping project

Recover by replacing optimism and activity reporting with a sharp picture of reality: what is blocked, what still matters, what can move, and what must be cut or re-sequenced.

delivery lead
EP-21 Delivery

Re-scope without lying

Cut or reshape scope by making the trade-off explicit, preserving the core value, and naming what is no longer promised rather than hiding the loss inside ambiguity.

product lead
EP-22 Delivery

Handle a high-dependency delivery

Treat coordination as real work: make dependencies visible, assign owners, define handoff quality, and avoid pretending the schedule is local when the delivery is not.

delivery lead
EP-23 Delivery

Turn recurring urgent work into managed work

Convert chronic urgency into a known workstream by categorizing recurrence, pricing the interruption cost, and building preventive work into normal planning rather than treating every recurrence as exceptional.

engineering manager
EP-24 Delivery

Stabilize a fragile service

Stabilize by making the service observable, reducing risky change surfaces, clarifying ownership, and fixing the few failure drivers that create most of the pain before chasing architectural perfection.

service owner
Playbook set 02

Team

08 playbooks
EP-33 Team

Onboard a new engineer well

Turn onboarding into a deliberate path from orientation to safe contribution, with clear context, early wins, visible ownership, and durable learning artifacts.

engineering manager
EP-34 Team

Spread knowledge out of one expert

Reduce hero dependence by moving operational, technical, and decision knowledge into the team through structured transfer, shared execution, and durable artifacts.

engineering manager
EP-35 Team

Re-establish ownership in a blurry area

Restore ownership by clarifying decision rights, stewardship expectations, operating responsibility, and visible accountability in a system area that currently lives in ambiguity.

engineering manager
EP-36 Team

Run a healthy engineering retrospective

Use retrospectives to identify patterns, distinguish local from systemic issues, and produce a small number of meaningful actions or escalations instead of repeating ritual complaints.

facilitator
EP-37 Team

Repair trust after a painful incident

Repair trust by making the event intelligible, changing the conditions that produced it, and demonstrating through behavior that the team is safer, more honest, and more accountable than before.

engineering manager
EP-38 Team

Facilitate a difficult technical disagreement

Turn argument into decision quality by clarifying the actual choice, separating evidence from identity, and designing a process where trade-offs are compared explicitly rather than won socially.

decision owner
EP-39 Team

Reduce review bottlenecks without lowering quality

Reduce bottlenecks by changing review shape, diff quality, ownership, and review expectations, not by silently lowering the bar or converting review into a rubber stamp.

tech lead
EP-40 Team

Recover a team stuck in reactive mode

Recover by making interruption load visible, reducing hidden priority channels, protecting planning capacity, and converting repeated reactive patterns into owned, managed work.

engineering manager
Playbook set 03

Architecture

08 playbooks
EP-09 Architecture

Review service boundaries

Review service boundaries by looking at change patterns, ownership reality, dependency shape, and runtime behavior, not just diagrams or intended architecture.

architect
EP-10 Architecture

Reduce change fan-out

Reduce fan-out by finding the responsibilities and contracts that make ordinary changes travel too far, then redesigning for locality of change instead of only tidier structure.

tech lead
EP-11 Architecture

Make an integration contract explicit

Turn accidental behavior into an explicit contract by naming what is guaranteed, what is incidental, who owns compatibility, and how consumers will learn about change.

producer owner
EP-12 Architecture

Improve testability without stopping delivery

Improve testability incrementally by adding seams, isolating side effects, simplifying change hotspots, and rebalancing confidence creation across layers while real work continues.

tech lead
EP-13 Architecture

Choose where business rules should live

Choose rule placement by deciding where the authoritative truth belongs, what other layers may mirror or guard, and how the team will prevent behavior drift across boundaries.

tech lead
EP-14 Architecture

Audit a shared layer for accidental complexity

Audit the shared layer by testing whether it serves real repeated needs, has clear consumers, and simplifies product work, or whether it has become a prestige dumping ground for abstracted uncertainty.

platform or shared-layer owner
EP-15 Architecture

Design a safe rollout path

Design rollout as a control system: decide how to limit blast radius, observe early effects, pause or reverse safely, and learn from each stage before widening exposure.

tech lead
EP-16 Architecture

Refactor a dangerous hotspot

Refactor a hotspot by targeting the specific reasons it is dangerous (high churn, poor testability, unclear ownership, or oversized responsibility) and improving it in narrow, repeatable steps.

maintainer
Playbook set 04

Operations

08 playbooks
EP-25 Operations

Run an incident review that actually helps

Turn an incident review into a system-learning exercise that explains what happened, why it made sense at the time, what conditions enabled it, and what changes will reduce recurrence.

incident lead
EP-26 Operations

Write a useful runbook

Write a runbook for real operational use: quick orientation, clear triggers, diagnostic paths, safe actions, escalation criteria, and links to trustworthy deeper context.

service owner
EP-27 Operations

Build a practical rollback strategy

Design rollback as a practical recovery system, not a comforting word: understand what can be reversed, what cannot, how quickly, by whom, and under what evidence thresholds.

release owner
EP-28 Operations

Reduce operational dependence on heroes

Make operations less person-fragile by exposing where hero dependence exists, redistributing capability through practice and artifacts, and improving the system conditions that keep creating heroics.

engineering manager
EP-29 Operations

Create meaningful alerts

Design alerts around actionable operational meaning: what is wrong, who should care, how urgent it is, and what first action or investigation path should follow.

service owner
EP-30 Operations

Triage operational debt

Triage operational debt by identifying recurring pain patterns, ranking them by impact and drag, and choosing what to eliminate, reduce, contain, or deliberately tolerate for now.

engineering manager
EP-31 Operations

Prepare a handover properly

Prepare a handover as a transfer of capability and accountability, not just files and meetings. Make sure the receiving side can actually operate, decide, and recover safely.

sending owner and receiving owner jointly
EP-32 Operations

Improve release confidence

Improve release confidence by strengthening the system of evidence, controls, ownership, and recovery around deployment, not by asking people to feel calmer about the same fragile process.

release owner
Playbook set 05

AI

08 playbooks
EP-01 AI

Upgrade code review for AI-assisted work

Redesign review so that AI-assisted changes are judged by risk, understanding, and behavioral correctness, not by surface polish or author confidence.

tech lead
EP-02 AI

Define safe AI development zones

Create explicit zones of safe, cautious, and restricted AI use so the team can move fast where the cost of error is low and stay deliberate where the risk is structural, legal, operational, or hard to detect.

tech lead
EP-03 AI

Introduce AI tools without synthetic velocity

Adopt AI by measuring whether it improves durable outcomes like clarity, cycle time, quality, and system understanding, not just output volume, ticket count, or code throughput.

engineering manager
EP-04 AI

Build a grounded RAG system

Design RAG around source trust, retrieval quality, task fit, and answer behavior so that citation and grounding mean something more than 'the model found text nearby'.

AI engineer
EP-05 AI

Evaluate an AI feature against real tasks

Evaluate the feature against real user jobs, realistic failure patterns, and operational constraints so the team learns whether the system actually helps, not just whether it performs well on curated examples.

evaluation owner
EP-06 AI

Design human review that is not rubber-stamping

Design the human-in-the-loop step so the human still adds real judgment: enough context, enough authority, enough time, and clear criteria for when to intervene, reject, or escalate.

workflow designer
EP-07 AI

Detect AI drift before users do

Build a drift-detection system based on task slices, baseline behavior, and operational signals so changes in model quality are noticed intentionally rather than through user frustration or vague team instinct.

evaluation owner
EP-08 AI

Create AI usage norms for a team

Create team-level norms that define how AI is used, disclosed, reviewed, challenged, and learned from so that the team behaves intentionally rather than drifting into private habit systems.

engineering manager