The Hard Parts.dev
EP-37 Team EP Engineering Playbook
Difficulty high Owner · engineering manager

Repair trust after a painful incident

Repair trust by making the event intelligible, changing the conditions that produced it, and demonstrating through behavior that the team is safer, more honest, and more accountable than before.

Difficulty
high
Time horizon
days for immediate repair work, weeks to months for durable trust recovery
Primary owner
engineering manager
Confidence
high
At a glance: EP-37
Situation
A serious incident damaged trust inside the team, with leadership, or with users.
Goal
Restore credible working trust after an incident without reducing the response to blame theater or PR language.
Do not use when
the team is still in the middle of live incident response
Roles involved

  • engineering manager
  • tech lead
  • incident lead
  • team members involved
  • stakeholder or partner leads when trust needs repair across boundaries

Context

The situation

Deciding whether to reach for this playbook: when it fits, and when it doesn't.

Use when

Conditions where this playbook is the right tool.

  • A major outage, release failure, or avoidable incident occurred
  • The team is carrying shame, blame, fear, or distrust afterward
  • Leadership or partner teams no longer trust the team's statements of operational confidence
  • The incident changed social as well as technical conditions

Stakes

Why this matters

What this playbook protects against, and why skipping or half-running it tends to be expensive.

Incidents damage more than uptime. They damage team belief, leadership confidence, and willingness to trust future commitments. Trust is not repaired by apology alone; it is repaired by clearer truth and better conditions.

Quality bar

What good looks like

The observable qualities of a team or system that is actually doing this well, not just going through the motions.

Signs of the playbook done well

  • People can explain what happened without turning the story into heroics or blame
  • The team distinguishes technical causes from organizational conditions
  • Leaders and partners see visible changes tied to the incident
  • The same failure pattern becomes less likely, not just better narrated
  • The team becomes more candid, not more defensive

Preparation

Before you start

What you need available and true before running the procedure. Skipping this is the most common reason playbooks fail.

Inputs

Material you'll want to gather first.

  • Incident timeline
  • Operational evidence
  • Communication history
  • Affected users or stakeholders
  • Incident review outputs
  • Current trust fracture areas

Prerequisites

Conditions that should be true for this to work.

  • The incident is contained
  • Basic facts are available
  • Someone is willing to lead a non-defensive repair process

Procedure

The procedure

Each step carries its purpose (why it exists), its actions (what you do), and its outputs (what you produce). Read the purpose. It's what keeps the step from degenerating into checklist theatre.

  1. Stabilize the emotional and operational aftermath

    Create enough safety and clarity for honest recovery.

    Actions

    • Separate immediate operational follow-up from trust repair conversations
    • Acknowledge impact clearly to affected people
    • Avoid premature blame assignment or performance narratives

    Outputs

    • Initial repair frame
  2. Build a truthful shared account

    Replace rumor, shame, or defensive simplification with reality.

    Actions

    • Reconstruct the incident timeline clearly
    • Name both technical and organizational contributors
    • Identify where the team was misled, overloaded, or unsupported

    Outputs

    • Shared incident account
  3. Identify the trust breaks

    Understand what kind of trust was damaged.

    Actions

    • Separate confidence in the system from confidence in the team’s behavior
    • Identify whether trust broke inside the team, upward, outward, or all three
    • Ask what people now fear will happen again

    Outputs

    • Trust break map
  4. Make visible repairs

    Show that the system and behavior will actually change.

    Actions

    • Choose a small number of changes with direct trust value
    • Fix misleading status patterns, ownership confusion, rollback weakness, or alert gaps
    • Communicate the change in terms of reduced uncertainty, not just work completed

    Outputs

    • Repair action set
  5. Rebuild credibility through follow-through

    Let repeated behavior, not reassurance, repair trust.

    Actions

    • Review whether repair actions happened and helped
    • Update stakeholders on changed conditions, not just closure status
    • Watch whether fear and defensiveness actually decline

    Outputs

    • Trust recovery review
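
One way to keep the trust break map (step 3), the repair action set (step 4), and the follow-through review (step 5) connected is to record them as linked data. The following is a minimal sketch, not a prescribed format: the class names, fields, and the `unaddressed_breaks` helper are all hypothetical, chosen only to mirror the distinctions the steps above draw (system versus team behavior, inside versus upward versus outward, and the fear people name).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TrustKind(Enum):
    SYSTEM = "confidence in the system"
    TEAM_BEHAVIOR = "confidence in the team's behavior"


class Direction(Enum):
    INSIDE = "inside the team"
    UPWARD = "upward"
    OUTWARD = "outward"


@dataclass
class TrustBreak:
    kind: TrustKind
    direction: Direction
    fear: str  # what people now fear will happen again


@dataclass
class RepairAction:
    description: str
    addresses: List[int]           # indexes into the trust break map (hypothetical convention)
    done: bool = False
    helped: Optional[bool] = None  # stays None until the step 5 review


def unaddressed_breaks(breaks, actions):
    """Return breaks with no repair that is both done and reviewed as having helped."""
    covered = {i for a in actions if a.done and a.helped for i in a.addresses}
    return [b for i, b in enumerate(breaks) if i not in covered]
```

A repair that is finished but not yet reviewed does not count as covering a break in this sketch, which reflects the step 5 idea that follow-through, not closure status, is what repairs trust.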

Judgment

Judgment calls and pitfalls

The places where execution actually diverges: decisions that need thought, questions worth asking, and mistakes that recur regardless of good intent.

Decision points

Moments where judgment and trade-offs matter more than procedure.

  • What type of trust actually broke?
  • Which repairs matter most for credibility right now?
  • What must be communicated broadly versus fixed quietly first?
  • Who needs to hear the truth in what level of detail?

Questions worth asking

Prompts to use on yourself, the team, or an AI assistant while running the procedure.

  • What kind of trust broke here: system trust, team trust, leadership trust, or user trust?
  • What visible repair would make the biggest difference right now?
  • What are people afraid will happen again?

Common mistakes

Patterns that surface across teams running this playbook.

  • Focusing only on root cause and ignoring relational damage
  • Using apology language without operational change
  • Reducing the incident to one person’s mistake when conditions were systemic
  • Trying to restore confidence faster than evidence justifies

Warning signs you are doing it wrong

Signals that the playbook is being executed but not landing.

  • The incident story gets cleaner as it moves upward but less true
  • The same vague promises appear in every follow-up
  • People are careful in public and bitter in private
  • Teams say trust is repaired but still behave defensively around the same risk

Outcomes

Outcomes and signals

What should exist after the playbook runs, how you'll know it worked, and what to watch for over time.

Artifacts to produce

Durable outputs the playbook should leave behind.

  • Shared incident account
  • Trust break map
  • Repair action set
  • Trust recovery review

Success signals

Observable changes that mean the playbook landed.

  • The team can discuss the incident without spiraling into blame or denial
  • Stakeholders observe visible change tied to the incident
  • Future status language becomes more truthful
  • The same fragility is reduced in practice

Follow-up actions

Moves that keep the playbook's effects compounding after it finishes.

  • Review whether trust gaps are shrinking after one month and one quarter
  • Promote systemic issues into ownership, release, or architecture work
  • Update onboarding and operating docs with incident learnings

Metrics or signals to watch

Longer-horizon indicators that the underlying problem is receding.

  • Repeat incident rate in similar category
  • Status-reporting honesty improvements
  • Stakeholder confidence signals
  • Team willingness to escalate early
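
Of these signals, the repeat incident rate is the easiest to compute directly. The sketch below assumes one possible definition: the fraction of incidents in a review window whose category had already appeared earlier in that window. The category labels and the function name are hypothetical, and a team may reasonably define "similar category" differently.

```python
def repeat_incident_rate(categories):
    """Fraction of incidents whose category already occurred earlier in the list.

    `categories` is a chronologically ordered list of category labels,
    one per incident (an assumed input format).
    """
    seen = set()
    repeats = 0
    for category in categories:
        if category in seen:
            repeats += 1
        seen.add(category)
    return repeats / len(categories) if categories else 0.0
```

Comparing this number across the one-month and one-quarter reviews gives a rough check on whether the same fragility is actually being reduced. The other signals in the list are qualitative and are better tracked through conversation than through a formula.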

AI impact

AI effects on this playbook

How AI-assisted and AI-driven workflows help execution, and the ways they can make it worse.

AI can help with

Where AI tooling genuinely reduces the cost of running this playbook well.

  • Assembling timelines from logs, chats, and tickets
  • Summarizing technical and organizational factors from large evidence sets
  • Drafting clear incident account structures
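
Timeline assembly is mostly a merge-and-sort problem once events from each source are normalized. The following is a minimal sketch under stated assumptions: the `Event` shape and `merge_timeline` name are hypothetical, and each source (logs, chats, tickets) is assumed to have already been parsed into events with timezone-aware timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import chain


@dataclass(frozen=True)
class Event:
    at: datetime   # timezone-aware; mixing naive and aware values raises TypeError when sorting
    source: str    # e.g. "logs", "chat", "tickets"
    text: str


def merge_timeline(*sources):
    """Combine per-source event lists into one list ordered by timestamp.

    sorted() is stable, so events with identical timestamps keep the
    order in which their sources were passed in.
    """
    return sorted(chain.from_iterable(sources), key=lambda e: e.at)
```

Keeping the `source` field on every event lets a human reviewer see where each line of the account came from, which matters when the merged timeline is used to build the shared incident account rather than to replace it.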

AI can make it worse by

Distortions AI introduces that make the underlying problem harder to see.

  • Flattening nuance into a polished but emotionally false story
  • Softening accountability language into PR language
  • Making leadership summaries cleaner than the trust damage actually is

Relationships

Connected playbooks

Failure modes this playbook tends to address, decisions behind the situation, red flags that motivate running it, and neighboring playbooks.