The Hard Parts
Engineering reference
- Failure modes.
- Red flags.
- Trade-offs.
- Playbooks.
Not a framework.
Not a methodology.
A reference.
Software fails
the same way.
Every time.
A reference for recurring software failures, warning signals, hard decisions, and practical playbooks.
Sections: 04
Entries: 151
Open. Use.
- Use in retros.
- Use in reviews.
- Use in decisions.
Failure Modes
Named software-engineering failure patterns with warning signs, escalation paths, recovery moves, and AI-era distortions.
Tech Decision Trade-offs
A reference catalog for consequential engineering and product decisions where context matters more than ideology.
Red Flags Reference
Early warning signals in code, teams, process, leadership, and AI usage that suggest deeper problems may already exist or be forming.
Engineering Playbook
Practical playbooks for recurring engineering situations: delivery, architecture, operations, teamwork, and AI adoption.
Start from the question you have
From Failure Modes
A glimpse of the catalog. Each entry walks through how the pattern starts, how it escalates, what it looks like at early, mid, and late stages, and what good responses look like.
The Friendly Rewrite
A rewrite framed as cleanup becomes a long-running replacement with no stable landing zone.
The Hero Trap
One person becomes the informal system of record for critical knowledge, decisions, and rescue work.
Abstraction Addiction
The system grows more layers, indirection, and generic structure than current reality actually demands.
Ticket Theater
Work tracking becomes performance for stakeholders instead of coordination for delivery.
The Invisible Deadline
A date exists socially or politically, but not explicitly enough for the team to manage the trade-offs honestly.
Autocomplete Architecture
Teams accept AI-suggested structures faster than they understand or own them, embedding design decisions nobody made consciously.
From Tech Decisions
One decision per axis the catalog covers: architecture, delivery, team, quality, and AI systems. Each entry lays out two concrete options with their real conditions, costs, hidden costs, and failure modes when misapplied.
Monolith vs Microservices
Usually a team-shape and operational-maturity decision disguised as an architecture preference.
Build vs Buy
Usually a control-vs-focus decision, not an engineering pride decision.
Specialist Teams vs Cross-Functional Teams
Usually a coordination-vs-depth decision, not a modernity decision.
Test Pyramid vs Heavy End-to-End
Usually a feedback-speed vs system-confidence decision.
RAG vs Fine-Tuning
Usually a knowledge-grounding vs behavior-shaping decision.
Human-in-the-Loop vs Full Automation
Usually a trust-boundary and consequence-of-error decision.
From Red Flags
One signal per layer the catalog covers: code, team, process, leadership, and AI. Each entry opens with what you would actually notice, then walks through what it usually indicates and what to check next.
Changes always touch too many places
Even ordinary changes require edits across many files, layers, or services.
Everyone asks the same person
One person becomes the default source of truth, escalation path, or decision gateway for too many important areas.
Work enters faster than it leaves
Incoming work volume consistently outpaces completion, so queues, context switching, and churn grow silently.
Reporting looks healthier than delivery feels
Dashboards, status updates, and leadership narratives stay calm and positive while the teams doing the work experience far more fragility and risk.
AI-generated artifacts are trusted more than source material
Summaries, synthesized docs, or generated analyses start becoming the operational truth instead of pointers back to real sources.
Benchmarks are discussed more than real user outcomes
Teams spend more time on benchmark scores and synthetic eval wins than on whether the system helps real users in real tasks.
From Engineering Playbook
One playbook per subcategory the catalog covers: delivery, team, architecture, operations, and AI adoption. Each entry opens with when to use it and when not to, then walks through the steps, common mistakes, and signals that it actually landed.
Run a phased migration
Move from old to new in controlled slices, where each slice has explicit ownership, cutover criteria, rollback, and retirement of the old path.
Repair trust after a painful incident
Repair trust by making the event intelligible, changing the conditions that produced it, and demonstrating through behavior that the team is safer, more honest, and more accountable than before.
Refactor a dangerous hotspot
Refactor a hotspot by targeting the specific reasons it is dangerous (high churn, poor testability, unclear ownership, or oversized responsibility) and improving it in narrow, repeatable steps.
Run an incident review that actually helps
Turn an incident review into a system-learning exercise that explains what happened, why it made sense at the time, what conditions enabled it, and what changes will reduce recurrence.
Upgrade code review for AI-assisted work
Redesign review so that AI-assisted changes are judged by risk, understanding, and behavioral correctness, not by surface polish or author confidence.
Evaluate an AI feature against real tasks
Evaluate the feature against real user jobs, realistic failure patterns, and operational constraints so the team learns whether the system actually helps, not just whether it performs well on curated examples.