Sync Communication vs Event-Driven
Usually a coupling, resilience, and process-clarity decision, not just a latency or style choice.
- Really about: temporal coupling, failure handling, business process clarity, and observability maturity.
- Not actually about: which style is more advanced or scalable by default.
- Why it feels hard: synchronous flows are easier to reason about directly; event-driven systems reduce coupling but increase coordination complexity.
The decision
Should services coordinate synchronously or through events?
Usually a coupling, resilience, and process-clarity decision, not just a latency or style choice.
Heuristic
Use sync for short, direct, user-facing flows; use events when decoupling and async process shape are genuinely valuable.
Default stance
Where to start before any evidence arrives.
Prefer sync for simple direct workflows; use events when decoupling and async semantics are genuinely valuable.
Options on the table
Two poles of the trade-off
Neither is the right answer by default. Each option's conditions, strengths, costs, hidden costs, and failure modes when misused are laid out in parallel so you can read across facets.
Option A
Synchronous
Best when
Conditions where this option is a natural fit.
- request-response semantics match the business interaction
- strong immediacy matters
- failure handling must be simple and explicit
- workflow is short and direct
Real-world fits
Concrete environments where this option has worked.
- checkout confirmation flows
- simple service-to-service validation calls
- user-facing operations where an immediate answer is required
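A sketch of the second fit above, a simple service-to-service validation call. This is a minimal illustration, not a real API: `validate_address` and `submit_order` are hypothetical names, and the downstream HTTP dependency is stubbed with a plain function so the shape of the flow is visible.

```python
# Sketch: a synchronous service-to-service validation call.
# The caller blocks, gets an immediate answer, and handles
# dependency failure explicitly at the call site.

class DownstreamUnavailable(Exception):
    pass

def validate_address(address) -> bool:
    # Stand-in for an HTTP call to a validation service.
    if address is None:
        raise DownstreamUnavailable("validator did not respond")
    return "@" not in address and len(address) > 0

def submit_order(address) -> str:
    # Synchronous flow: the user-facing response depends directly
    # on the dependency answering within this request.
    try:
        ok = validate_address(address)
    except DownstreamUnavailable:
        return "retry-later"   # failure handling is local and explicit
    return "accepted" if ok else "rejected"
```

The appeal is that every outcome, including dependency failure, is decided in one place before the user sees a response.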
Strengths
What this option does well on its own terms.
- straightforward reasoning
- clear user-facing response path
- simpler tracing in smaller systems
Costs
What you accept up front to get those strengths.
- higher temporal coupling
- cascading latency and failure risk
- tighter dependency chains
Hidden costs
Costs that surface later than expected — the main thing novices miss.
- downstream fragility can quietly accumulate
- availability becomes hostage to dependency behavior
Failure modes when misused
How this option breaks when applied to the wrong context.
- Creates brittle dependency chains that make outages and latency spikes propagate quickly.
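One way to see that brittleness in numbers: in a synchronous chain, a request succeeds only if every dependency succeeds, so end-to-end availability is roughly the product of per-service availability. The figures below are illustrative, not from the text.

```python
# Rough model: a request traversing n synchronous dependencies
# succeeds only when all n of them succeed.
def chain_availability(per_service: float, n: int) -> float:
    return per_service ** n

# Five services at 99.9% each already fall to about 99.5% end to end.
print(round(chain_availability(0.999, 5), 4))  # → 0.995
```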
Option B
Event-Driven
Best when
Conditions where this option is a natural fit.
- decoupling matters
- workflows are asynchronous by nature
- multiple consumers benefit from emitted facts
- team can manage eventual consistency and tracing
Real-world fits
Concrete environments where this option has worked.
- order lifecycle processing with multiple downstream consumers
- audit, notification, and enrichment flows
- systems where work naturally fans out over time
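The fan-out fit above can be sketched with an in-memory bus; a real system would use a broker, but the decoupling property is the same. The `Bus` class and topic name are invented for illustration.

```python
# Sketch: fan-out over an in-memory bus. One emitted fact reaches
# multiple consumers that the producer never references directly.
from collections import defaultdict

class Bus:
    def __init__(self):
        self._subs = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler):
        self._subs[topic].append(handler)

    def publish(self, topic: str, event: dict):
        for handler in self._subs[topic]:
            handler(event)

bus = Bus()
audit, notified = [], []
bus.subscribe("order.placed", lambda e: audit.append(e["id"]))
bus.subscribe("order.placed", lambda e: notified.append(e["id"]))
bus.publish("order.placed", {"id": "o-1"})
# Both consumers saw the event; the producer knows neither of them.
```

Adding a third consumer requires no change to the publisher, which is the decoupling benefit in its simplest form.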
Strengths
What this option does well on its own terms.
- looser coupling
- consumers can be added without changing producers
- resilience benefits in many workflows
Costs
What you accept up front to get those strengths.
- harder debugging
- eventual consistency burden
- more operational and semantic complexity
Hidden costs
Costs that surface later than expected — the main thing novices miss.
- business flows can become opaque
- replay, idempotency, and schema evolution become serious concerns
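The idempotency cost is concrete enough to sketch. Under at-least-once delivery or replay, the same event can arrive twice; deduplicating on an event id keeps the side effect from being applied twice. The handler and event shape here are hypothetical, and a real system would persist the seen-id set rather than hold it in memory.

```python
# Sketch: an idempotent consumer that deduplicates by event id.
processed_ids = set()
balance = {"total": 0}

def handle_payment_event(event: dict):
    if event["id"] in processed_ids:
        return          # duplicate delivery: already applied, skip
    processed_ids.add(event["id"])
    balance["total"] += event["amount"]

handle_payment_event({"id": "evt-1", "amount": 50})
handle_payment_event({"id": "evt-1", "amount": 50})  # redelivery
# balance["total"] is 50, not 100
```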
Failure modes when misused
How this option breaks when applied to the wrong context.
- Creates a foggy workflow where events exist but ownership and process clarity do not.
Cost, time, and reversibility
Who pays, how it ages, and what undoing it costs
Trade-offs are rarely zero-sum and rarely static. Someone pays, the payoff curve shifts with the horizon, and the decision has an undo cost.
Option A · Synchronous
Who absorbs the cost
- Service-owning teams
- Operations
Option B · Event-Driven
Who absorbs the cost
- Platform/integration teams
- Teams managing event contracts and reconciliation
Option A · Synchronous
Wins earlier through clarity and simpler control flow.
Option B · Event-Driven
Wins later if the workflow truly benefits from decoupling and the org can sustain the semantics.
What undoing costs
Moderate to hard
What should force a re-look
Trigger conditions that mean the answer may have changed.
- Dependency pain rises
- Workflow fans out
- Resilience requirements increase
How to decide
The work you still have to do
The reference can frame the trade-off; only you can weight the factors against your context.
Questions to ask
Open these in the room. Answering them is most of the decision.
- Does this workflow truly need an immediate answer?
- If one dependency is slow or down, what should happen?
- Can we explain the business process clearly if it is asynchronous?
- Who owns event contracts, replay, and idempotency?
Key factors
The variables that actually move the answer.
- Latency sensitivity
- Failure tolerance
- Workflow shape
- Observability maturity
- Consistency expectations
Evidence needed
What to gather before committing. Not after.
- Workflow map
- Latency and availability requirements
- Dependency failure history
- Observability maturity assessment
Signals from the ground
What's usually pushing the call, and what should push it
On the left, pressures to recognize and discount. On the right, signals that genuinely point toward one option or the other.
What's usually pushing the call
Pressures to recognize and discount.
Common bad reasons
Reasoning that feels convincing in the moment but doesn't hold up.
- Events are more scalable
- Sync is old-fashioned
- Event-driven automatically means resilient
Anti-patterns
Shapes of reasoning to recognize and set aside.
- Introducing events without clear process ownership
- Using synchronous chains for workflows that are naturally asynchronous
What should push the call
Concrete signals that genuinely point to one pole.
For · Synchronous
Observations that genuinely point to Option A.
- Clear immediate request-response need
- Tight user-facing flow
For · Event-Driven
Observations that genuinely point to Option B.
- Multiple downstream consumers
- Naturally async business steps
- Strong need to reduce temporal coupling
AI impact
How AI bends this decision
Where AI accelerates the call, where it introduces new distortions, and anything else worth knowing.
AI can help with
Where AI genuinely reduces the cost of making the call.
- AI can help trace flows, identify coupling hotspots, and explain event chains.
AI can make worse
Distortions AI introduces that didn't exist before.
- AI can scaffold brokers, handlers, and consumers quickly, increasing the chance of event-driven complexity without process clarity.
AI false confidence
Generated broker configs, handler stubs, and consumer code look like a working event architecture because they compile and the topology diagrams are clean, creating the illusion that business processes have been mapped when only plumbing has been scaffolded.
AI synthesis
Generated event plumbing does not reduce business ambiguity.
Relationships
Connected decisions
Nearby decisions this is sometimes confused with, adjacent decisions that are often entangled with this one, related failure modes, red flags, and playbooks to reach for.
Easy to confuse with
Nearby decisions and how this one differs.
- That decision is about guarantees after the call finishes. This one is about whether the call is a direct one at all.
- That decision is about where data lives. This one is about how services learn that the data changed.
- Adjacent concept: an async API choice
Making an HTTP call asynchronous in code is a concurrency pattern. This decision is whether services coordinate through direct invocation or through events.