The Hard Parts.dev
Engineering Playbook · EP-09 · Architecture

Review service boundaries

Review service boundaries by looking at change patterns, ownership reality, dependency shape, and runtime behavior, not just diagrams or intended architecture.

Difficulty
high
Time horizon
days to weeks depending on system size
Primary owner
architect
Confidence
high
At a glance (EP-09)
Situation
You need to evaluate whether current service boundaries are helping or hurting.
Goal
Determine whether current boundaries are reducing complexity, clarifying ownership, and localizing change, or whether they are creating coupling, coordination cost, and fuzzy responsibility.
Do not use when
the system is too new to have meaningful change and incident history
Primary owner
architect
Roles involved

  • architect
  • tech lead
  • service owners
  • staff engineers
  • delivery lead when change cost matters
  • platform or operations partner when runtime behavior matters

Context

The situation

Deciding whether to reach for this playbook: when it fits, and when it doesn't.

Use when

Conditions where this playbook is the right tool.

  • Simple changes require coordination across too many services or teams
  • Ownership is unclear at service edges
  • Integration incidents are common
  • Teams debate whether a boundary is right but lack evidence
  • A monolith split or service consolidation is being considered

Stakes

Why this matters

What this playbook protects against, and why skipping or half-running it tends to be expensive.

Bad service boundaries create invisible tax: broader changes, slower delivery, more coordination, hidden duplication, and weaker accountability. Good boundaries reduce cognitive load and make failure and change more local.

Quality bar

What good looks like

The observable qualities of a team or system that is actually doing this well. Not just going through the motions.

Signs of the playbook done well

  • The team can explain what each service owns in business terms
  • Common changes stay local more often than not
  • Consumers depend on explicit contracts rather than accidental behavior
  • Ownership and operational accountability align with the service boundary
  • Boundary changes are justified by evidence, not fashion

Preparation

Before you start

What you need available and true before running the procedure. Skipping this is the most common reason playbooks fail.

Inputs

Material you'll want to gather first.

  • Service inventory
  • Change history across services
  • Incident and dependency history
  • Team ownership map
  • API or event contracts
  • Architecture diagrams if available
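The inputs above are easier to compare across teams when the service inventory is machine-readable. A minimal sketch of one inventory record follows; the field names and shape are illustrative, not a required schema.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """One row of the service inventory. Fields are illustrative."""

    name: str
    purpose: str  # one sentence of business behavior, not tech stack
    owner_team: str
    owns: list[str] = field(default_factory=list)          # explicit responsibilities
    must_not_own: list[str] = field(default_factory=list)  # responsibilities it should refuse
    consumers: list[str] = field(default_factory=list)     # downstream services or teams
    contracts: list[str] = field(default_factory=list)     # API or event contracts it publishes
```

Keeping records like this in the repo alongside the service makes step 1 of the procedure a diff review rather than an interview exercise.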

Prerequisites

Conditions that should be true for this to work.

  • You can identify current service owners and consumers
  • You have access to recent change and incident history
  • The review is allowed to challenge the current boundary model honestly

Procedure

The procedure

Each step carries its purpose (why it exists), its actions (what you do), and its outputs (what you produce). Read the purpose. It's what keeps the step from degenerating into checklist theatre.

  1. State what each service is supposed to own

    Compare intended boundaries with real ones.

    Actions

    • Describe each service in one sentence using business behavior, not tech stack
    • List what it explicitly owns and what it should not own
    • Identify where different teams describe the same service differently

    Outputs

    • Service purpose inventory
  2. Look at change fan-out and coordination cost

    Use actual change behavior as evidence.

    Actions

    • Review common changes from the last few months
    • Measure how many services and teams a typical change crosses
    • Identify recurring cross-boundary edits that look structural, not incidental

    Outputs

    • Change fan-out map
  3. Inspect runtime and contract coupling

    Find where boundaries look clean in diagrams but not in operation.

    Actions

    • Review dependencies, shared schemas, hidden data assumptions, and retry or failure patterns
    • Identify contracts that are implicit, brittle, or operationally expensive
    • Check whether services depend on implementation detail rather than declared interface

    Outputs

    • Coupling and contract assessment
  4. Check ownership fit

    Make sure service boundaries support real accountability.

    Actions

    • Ask who owns roadmap, incidents, operability, and compatibility for each service
    • Identify boundaries where responsibility and authority do not match
    • Note where a service looks shared but is really owned through heroics or hidden escalation

    Outputs

    • Ownership fit review
  5. Recommend targeted boundary moves

    Avoid vague conclusions like "re-architect more."

    Actions

    • Name which boundaries should stay, shift, merge, split, or be made more explicit
    • Tie each recommendation to change locality, contract health, or ownership clarity
    • Sequence the changes in small, evidence-based steps

    Outputs

    • Boundary recommendation set
    • Next-step architecture plan
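The fan-out measurement in step 2 can be sketched as a small script over version-control history. The sketch below assumes a monorepo where each service lives under `services/<name>/`; the path layout and function names are illustrative, and a multi-repo setup would need a different mapping from files to services.

```python
from collections import Counter
from statistics import median


def services_touched(paths, prefix="services/"):
    """Map one change's modified file paths to the set of services it touched.

    Assumes a monorepo layout of services/<name>/... (illustrative).
    """
    return {
        p.split("/")[1]
        for p in paths
        if p.startswith(prefix) and p.count("/") >= 2
    }


def fan_out_map(changes):
    """changes: one list of modified file paths per change (commit or PR).

    Returns the per-change fan-out widths and their median, the raw
    material for step 2's change fan-out map.
    """
    widths = [len(services_touched(paths)) for paths in changes]
    return widths, (median(widths) if widths else 0)
```

One way to feed it is the per-commit file list from `git show --name-only --pretty=format: <sha>`; a histogram of the widths (e.g. `Counter(widths)`) shows how often a "simple" change actually crosses boundaries.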

Judgment

Judgment calls and pitfalls

The places where execution actually diverges: decisions that need thought, questions worth asking, and mistakes that recur regardless of good intent.

Decision points

Moments where judgment and trade-offs matter more than procedure.

  • Is the problem really the service boundary, or the contract and ownership around it?
  • Should two services merge, or should their interface become cleaner?
  • Would a modular monolith boundary serve this responsibility better?
  • Which changes are worth making now versus watching longer?

Questions worth asking

Prompts to use on yourself, the team, or an AI assistant while running the procedure.

  • What common changes cross these boundaries today?
  • Which service owns this business behavior end to end?
  • Are we paying more in coordination than we gain in separation?

Common mistakes

Patterns that surface across teams running this playbook.

  • Reviewing boundaries from diagrams only
  • Treating all cross-service traffic as proof boundaries are wrong
  • Optimizing for theoretical purity over real ownership and change cost
  • Using service count as a maturity proxy
  • Deciding to split or merge without looking at change history

Warning signs you are doing it wrong

Signals that the playbook is being executed but not landing.

  • The review produces generic statements like "reduce coupling" without naming where
  • Teams still cannot explain who owns what after the review
  • The answer is "microservices are bad" or "monoliths are bad" instead of context-specific
  • Common changes still cross the same boundaries but the review calls them edge cases

Outcomes

Outcomes and signals

What should exist after the playbook runs, how you'll know it worked, and what to watch for over time.

Artifacts to produce

Durable outputs the playbook should leave behind.

  • Service purpose inventory
  • Change fan-out map
  • Coupling and contract assessment
  • Ownership fit review
  • Boundary recommendation set

Success signals

Observable changes that mean the playbook landed.

  • Future changes become more local
  • Service purpose descriptions become sharper and more consistent
  • Ownership and on-call routing become clearer
  • Teams stop rediscovering the same contract and boundary problems

Follow-up actions

Moves that keep the playbook's effects compounding after it finishes.

  • Turn repeated cross-boundary friction into explicit migration or contract work
  • Review high-friction boundaries again after a few release cycles
  • Update service catalogs, onboarding docs, and dependency maps with the clearer boundary model

Metrics or signals to watch

Longer-horizon indicators that the underlying problem is receding.

  • Median number of services touched per common change
  • Cross-team coordination count per feature
  • Contract-related incident rate
  • Service ownership ambiguity events
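The first two indicators above can be computed from the same change records gathered during the procedure, given a service-to-team map. A minimal sketch, assuming each change is recorded as the set of services it touched; the metric names and shapes are illustrative.

```python
from statistics import median


def coordination_count(services, team_of):
    """Number of distinct teams a change pulls in, via a service -> team map.

    Services missing from the map are skipped rather than guessed.
    """
    return len({team_of[s] for s in services if s in team_of})


def boundary_metrics(changes, team_of):
    """changes: one set of touched services per change (must be non-empty).

    Returns the headline boundary-health indicators: median services
    touched per change, and the worst-case team coordination a single
    change required.
    """
    return {
        "median_services_per_change": median(len(c) for c in changes),
        "max_teams_per_change": max(coordination_count(c, team_of) for c in changes),
    }
```

Tracking these per release cycle, rather than as a one-off, is what shows whether a boundary move actually made change more local.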

AI impact

AI effects on this playbook

How AI-assisted and AI-driven workflows help execution, and the ways they can make it worse.

AI can help with

Where AI tooling genuinely reduces the cost of running this playbook well.

  • Summarizing change histories across repos or services
  • Mapping dependency and call patterns from code and telemetry
  • Drafting first-pass service inventories and interface summaries
  • Finding likely hidden couplings in configs, schemas, and logs

AI can make worse by

Distortions AI introduces that make the underlying problem harder to see.

  • Making weak boundaries look coherent through elegant summaries
  • Encouraging large speculative service redesigns before evidence is solid
  • Confusing generated architecture rationales with actual system understanding

Relationships

Connected playbooks

Failure modes this playbook tends to address, decisions behind the situation, red flags that motivate running it, and neighboring playbooks.