Playbooks
Practical playbooks for recurring engineering situations: delivery, architecture, operations, team work, and AI adoption. Operational references you can actually execute, not motivational posts.
Entries
40
Classes
05
- delivery
- team
- architecture
- operations
- ai
Every entry is a structured procedure for a named situation: when to use it, when not to, what good looks like, the actual steps (with their purpose and outputs), common mistakes, artifacts, and success signals. Card weight reflects difficulty. Harder playbooks darken the card.
Difficulty key
- low
- medium
- medium-high
- high
Chip = card fill on the grid.
Confidence
How sure we are the playbook works as written across teams and contexts: provisional vs. repeatedly validated in real engineering work.
- low
- medium
- medium-high
- high
Delivery
08 playbooks
Run a phased migration
Move from old to new in controlled slices, where each slice has explicit ownership, cutover criteria, rollback, and retirement of the old path.
Start a rewrite safely
Before approving a rewrite, force clarity on the actual problem, what must be preserved, what will be displaced first, and how success will be measured beyond cleaner code.
De-risk a risky release
Reduce risk by clarifying what could fail, shrinking rollout scope, tightening observability, and defining credible rollback before the release becomes a test of nerve.
Recover a slipping project
Recover by replacing optimism and activity reporting with a sharp picture of reality: what is blocked, what still matters, what can move, and what must be cut or re-sequenced.
Re-scope without lying
Cut or reshape scope by making the trade-off explicit, preserving the core value, and naming what is no longer promised rather than hiding the loss inside ambiguity.
Handle a high-dependency delivery
Treat coordination as real work: make dependencies visible, assign owners, define handoff quality, and avoid pretending the schedule is local when the delivery is not.
Turn recurring urgent work into managed work
Convert chronic urgency into a known workstream by categorizing recurrence, pricing the interruption cost, and building preventive work into normal planning rather than treating every recurrence as exceptional.
Stabilize a fragile service
Stabilize by making the service observable, reducing risky change surfaces, clarifying ownership, and fixing the few failure drivers that create most of the pain before chasing architectural perfection.
Team
08 playbooks
Onboard a new engineer well
Turn onboarding into a deliberate path from orientation to safe contribution, with clear context, early wins, visible ownership, and durable learning artifacts.
Spread knowledge out of one expert
Reduce hero dependence by moving operational, technical, and decision knowledge into the team through structured transfer, shared execution, and durable artifacts.
Re-establish ownership in a blurry area
Restore ownership by clarifying decision rights, stewardship expectations, operating responsibility, and visible accountability in a system area that currently lives in ambiguity.
Run a healthy engineering retrospective
Use retrospectives to identify patterns, distinguish local from systemic issues, and produce a small number of meaningful actions or escalations instead of repeating ritual complaints.
Repair trust after a painful incident
Repair trust by making the event intelligible, changing the conditions that produced it, and demonstrating through behavior that the team is safer, more honest, and more accountable than before.
Facilitate a difficult technical disagreement
Turn argument into decision quality by clarifying the actual choice, separating evidence from identity, and designing a process where trade-offs are compared explicitly rather than won socially.
Reduce review bottlenecks without lowering quality
Reduce bottlenecks by changing review shape, diff quality, ownership, and review expectations - not by silently lowering the bar or converting review into a rubber stamp.
Recover a team stuck in reactive mode
Recover by making interruption load visible, reducing hidden priority channels, protecting planning capacity, and converting repeated reactive patterns into owned, managed work.
Architecture
08 playbooks
Review service boundaries
Review service boundaries by looking at change patterns, ownership reality, dependency shape, and runtime behavior - not just diagrams or intended architecture.
Reduce change fan-out
Reduce fan-out by finding the responsibilities and contracts that make ordinary changes travel too far, then redesigning for locality of change instead of only tidier structure.
Make an integration contract explicit
Turn accidental behavior into an explicit contract by naming what is guaranteed, what is incidental, who owns compatibility, and how consumers will learn about change.
Improve testability without stopping delivery
Improve testability incrementally by adding seams, isolating side effects, simplifying change hotspots, and rebalancing confidence creation across layers while real work continues.
Choose where business rules should live
Choose rule placement by deciding where the authoritative truth belongs, what other layers may mirror or guard, and how the team will prevent behavior drift across boundaries.
Audit a shared layer for accidental complexity
Audit the shared layer by testing whether it serves real repeated needs, has clear consumers, and simplifies product work - or whether it has become a prestige dumping ground for abstracted uncertainty.
Design a safe rollout path
Design rollout as a control system: decide how to limit blast radius, observe early effects, pause or reverse safely, and learn from each stage before widening exposure.
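A rollout-as-control-system can be sketched as a loop that widens exposure only while observed signals stay healthy. This is a minimal illustrative sketch, not the playbook's actual procedure; the `Stage`, `observe`, and `rollback` names are hypothetical.

```python
# Hypothetical sketch: advance a staged rollout, pausing to observe at each
# stage and reversing on the first bad signal before widening exposure.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    traffic_pct: int       # share of traffic exposed at this stage
    max_error_rate: float  # abort threshold for this stage

def run_rollout(stages: list[Stage],
                observe: Callable[[Stage], float],
                rollback: Callable[[], None]) -> bool:
    """Advance stage by stage; roll back on the first breached threshold."""
    for stage in stages:
        error_rate = observe(stage)          # e.g. errors / requests during soak
        if error_rate > stage.max_error_rate:
            rollback()                       # reverse before widening exposure
            return False
    return True                              # every stage passed; fully rolled out

stages = [Stage("canary", 1, 0.01),
          Stage("partial", 25, 0.005),
          Stage("full", 100, 0.005)]
run_rollout(stages, observe=lambda s: 0.001, rollback=lambda: None)
```

The point of the structure is that blast-radius limits, observation, and reversal are decided per stage up front, not improvised mid-release.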
Refactor a dangerous hotspot
Refactor a hotspot by targeting the specific reasons it is dangerous: high churn, poor testability, unclear ownership, or oversized responsibility - and improving it in narrow, repeatable steps.
Operations
08 playbooks
Run an incident review that actually helps
Turn an incident review into a system-learning exercise that explains what happened, why it made sense at the time, what conditions enabled it, and what changes will reduce recurrence.
Write a useful runbook
Write a runbook for real operational use: quick orientation, clear triggers, diagnostic paths, safe actions, escalation criteria, and links to trustworthy deeper context.
Build a practical rollback strategy
Design rollback as a practical recovery system, not a comforting word: understand what can be reversed, what cannot, how quickly, by whom, and under what evidence thresholds.
Reduce operational dependence on heroes
Make operations less person-fragile by exposing where hero dependence exists, redistributing capability through practice and artifacts, and improving the system conditions that keep creating heroics.
Create meaningful alerts
Design alerts around actionable operational meaning: what is wrong, who should care, how urgent it is, and what first action or investigation path should follow.
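The four fields this playbook names can be made concrete as a small data shape. A minimal sketch with assumed names (`Alert`, `payments-oncall`); the example alert and runbook link are illustrative, not from the playbook.

```python
# Hypothetical alert definition carrying actionable operational meaning:
# what is wrong, who should care, how urgent, and what to do first.
from dataclasses import dataclass

@dataclass
class Alert:
    what_is_wrong: str   # symptom, stated in user-visible terms
    owner: str           # team or rotation that should care
    urgency: str         # e.g. "page" vs "ticket" vs "log"
    first_action: str    # step or link that starts the investigation

checkout_latency = Alert(
    what_is_wrong="p99 checkout latency above 2s for 10 minutes",
    owner="payments-oncall",
    urgency="page",
    first_action="Open the checkout-latency runbook; check DB saturation first",
)
```

An alert that cannot fill all four fields is usually a dashboard metric, not an alert.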
Triage operational debt
Triage operational debt by identifying recurring pain patterns, ranking them by impact and drag, and choosing what to eliminate, reduce, contain, or deliberately tolerate for now.
Prepare a handover properly
Prepare a handover as a transfer of capability and accountability, not just files and meetings. Make sure the receiving side can actually operate, decide, and recover safely.
Improve release confidence
Improve release confidence by strengthening the system of evidence, controls, ownership, and recovery around deployment - not by asking people to feel calmer about the same fragile process.
AI
08 playbooks
Upgrade code review for AI-assisted work
Redesign review so that AI-assisted changes are judged by risk, understanding, and behavioral correctness, not by surface polish or author confidence.
Define safe AI development zones
Create explicit zones of safe, cautious, and restricted AI use so the team can move fast where the cost of error is low and stay deliberate where the risk is structural, legal, operational, or hard to detect.
Introduce AI tools without synthetic velocity
Adopt AI by measuring whether it improves durable outcomes like clarity, cycle time, quality, and system understanding - not just output volume, ticket count, or code throughput.
Build a grounded RAG system
Design RAG around source trust, retrieval quality, task fit, and answer behavior so that citation and grounding mean something more than 'the model found text nearby'.
Evaluate an AI feature against real tasks
Evaluate the feature against real user jobs, realistic failure patterns, and operational constraints so the team learns whether the system actually helps, not just whether it performs well on curated examples.
Design human review that is not rubber-stamping
Design the human-in-the-loop step so the human still adds real judgment: enough context, enough authority, enough time, and clear criteria for when to intervene, reject, or escalate.
Detect AI drift before users do
Build a drift-detection system based on task slices, baseline behavior, and operational signals so changes in model quality are noticed intentionally rather than through user frustration or vague team instinct.
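Drift detection against fixed task slices can be sketched as comparing current scores to a recorded baseline. This is an illustrative sketch only; the function name, slice names, and tolerance are assumptions, not the playbook's actual system.

```python
# Hypothetical drift check: score fixed task slices against a recorded
# baseline and flag any slice whose quality dropped beyond a tolerance.
def detect_drift(baseline: dict[str, float],
                 current: dict[str, float],
                 tolerance: float = 0.05) -> list[str]:
    """Return the task slices whose score fell more than `tolerance`."""
    drifted = []
    for slice_name, base_score in baseline.items():
        # A missing slice counts as a full drop, so it is flagged too.
        if base_score - current.get(slice_name, 0.0) > tolerance:
            drifted.append(slice_name)
    return drifted

baseline = {"summarization": 0.91, "extraction": 0.88}
current = {"summarization": 0.90, "extraction": 0.79}
print(detect_drift(baseline, current))  # extraction dropped 0.09 > 0.05
```

The key design choice is the fixed slice set and stored baseline: without them, "the model got worse" stays a vague team instinct instead of an observable signal.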
Create AI usage norms for a team
Create team-level norms that define how AI is used, disclosed, reviewed, challenged, and learned from so that the team behaves intentionally rather than drifting into private habit systems.