The Hard Parts.dev

Batch vs Real-Time Processing

Usually a freshness-vs-complexity decision.

Severity if wrong: medium-high
Frequency: common
Audiences: architects · data engineers · product teams
Reversibility: moderate
Confidence: high
At a glance · TD-08
Really about
How much latency the business actually needs versus how much operational complexity it can afford.
Not actually about
Whether the system sounds more modern or advanced.
Why it feels hard
Real time sounds better; batch is often enough and much simpler.

The decision

Should this data or workflow be processed in scheduled batches or in real time?

Default stance

Where to start before any evidence arrives.

Prefer batch unless real-time freshness is materially valuable.

Options on the table

Two poles of the trade-off

Neither is the right answer by default. Each option's conditions, strengths, costs, hidden costs, and failure modes when misused are laid out in parallel so you can read across facets.

Option A

Batch

Best when

Conditions where this option is a natural fit.

  • freshness requirements are measured in minutes or hours
  • cost efficiency matters
  • workflow tolerates delay

Real-world fits

Concrete environments where this option has worked.

  • daily reporting
  • overnight reconciliation
  • periodic enrichment and recalculation jobs

Strengths

What this option does well on its own terms.

  • simpler operations
  • cost efficiency
  • easier reprocessing and recovery
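
One reason batch reprocessing tends to be easier: if each run recomputes a whole partition and overwrites it, a failed run can simply be run again. The following is a minimal sketch, not a reference implementation. The in-memory `output` store, the function name `run_daily_batch`, and the event fields (`day`, `account`, `amount`) are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def run_daily_batch(events, day, output):
    """Recompute per-account totals for one day and overwrite that day's partition."""
    totals = defaultdict(float)
    for event in events:
        if event["day"] == day:
            totals[event["account"]] += event["amount"]
    # Overwrite rather than append, so a rerun produces the same result.
    output[day] = dict(totals)
    return output[day]


# Hypothetical input data.
events = [
    {"day": date(2024, 1, 1), "account": "a", "amount": 10.0},
    {"day": date(2024, 1, 1), "account": "a", "amount": 5.0},
    {"day": date(2024, 1, 2), "account": "b", "amount": 7.0},
]

output = {}
run_daily_batch(events, date(2024, 1, 1), output)
run_daily_batch(events, date(2024, 1, 1), output)  # rerun after a failure
```

Because the partition is replaced as a unit, the second call leaves `output` exactly as the first call did.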

Costs

What you accept up front to get those strengths.

  • slower feedback
  • less immediate visibility
  • delay in downstream actions

Hidden costs

Costs that surface later than expected — the main thing novices miss.

  • batch windows can become implicit deadlines
  • recovering from a large failed run can be painful
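
A batch window turns into a deadline once run times creep toward it. One rough way to watch for that is to track how much of the window the slowest recent run leaves free. This is a sketch under assumed numbers; `window_headroom` and the durations are illustrative, not taken from any real system.

```python
from datetime import timedelta


def window_headroom(run_durations, window):
    """Fraction of the batch window left unused by the slowest observed run."""
    slowest = max(run_durations)
    # Dividing one timedelta by another yields a float.
    return (window - slowest) / window


# Hypothetical: a 4-hour overnight window and three recent run times.
recent_runs = [
    timedelta(hours=2),
    timedelta(hours=2, minutes=30),
    timedelta(hours=3, minutes=30),
]
headroom = window_headroom(recent_runs, timedelta(hours=4))
print(f"headroom: {headroom:.1%}")
```

A negative result means the slowest run already overran the window.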

Failure modes when misused

How this option breaks when applied to the wrong context.

  • Creates stale systems where timeliness actually matters.

Option B

Real Time

Best when

Conditions where this option is a natural fit.

  • freshness directly affects user value or risk
  • latency matters materially
  • the team already has streaming/real-time operations maturity

Real-world fits

Concrete environments where this option has worked.

  • fraud/risk detection
  • live personalization
  • user-facing workflow state that must update immediately

Strengths

What this option does well on its own terms.

  • faster reactions
  • fresh data
  • more immediate user or business impact

Costs

What you accept up front to get those strengths.

  • higher complexity
  • more observability burden
  • harder recovery models
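
Part of what makes recovery harder is that a replayed or retried stream can deliver the same event more than once, so consumers have to tolerate duplicates. The sketch below assumes at-least-once delivery and a unique `id` on every event. Both are assumptions for illustration, and `IdempotentConsumer` is a hypothetical name.

```python
class IdempotentConsumer:
    """Applies each event at most once, keyed by a unique event id."""

    def __init__(self):
        self.applied_ids = set()
        self.total = 0.0

    def handle(self, event):
        if event["id"] in self.applied_ids:
            return False  # duplicate from a retry or replay: skip it
        self.applied_ids.add(event["id"])
        self.total += event["amount"]
        return True


# Hypothetical stream, delivered twice to simulate a replay after a crash.
stream = [{"id": "e1", "amount": 10.0}, {"id": "e2", "amount": 5.0}]
consumer = IdempotentConsumer()
for event in stream + stream:
    consumer.handle(event)
```

In a real pipeline the set of applied ids would need to be durable and bounded in size, which is itself part of the operational cost.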

Hidden costs

Costs that surface later than expected — the main thing novices miss.

  • real-time pipelines can be expensive to operate for marginal business gain
  • downstream systems may not actually be real-time ready
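
A quick way to check whether downstream systems are real-time ready is to add up the delay of every stage between the source and the consumer. The stage names and delays below are hypothetical; the sketch only illustrates that one slow stage can dominate the total.

```python
def worst_case_staleness_s(stage_delays_s):
    """Worst-case end-to-end staleness as the sum of per-stage delays, in seconds."""
    return sum(stage_delays_s.values())


# Hypothetical pipelines that both feed a dashboard refreshed once an hour.
streaming_ingest = {"ingest": 2, "transform": 3, "hourly_dashboard_refresh": 3600}
batch_ingest = {"ingest": 900, "transform": 60, "hourly_dashboard_refresh": 3600}

streaming_total = worst_case_staleness_s(streaming_ingest)
batch_total = worst_case_staleness_s(batch_ingest)
print(streaming_total, batch_total)
```

With these assumed numbers the hourly refresh accounts for most of the staleness in both pipelines, so the streaming ingest buys relatively little.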

Failure modes when misused

How this option breaks when applied to the wrong context.

  • Creates expensive always-on pipelines without meaningful business leverage.

Cost, time, and reversibility

Who pays, how it ages, and what undoing it costs

Trade-offs are rarely zero-sum and rarely static. Someone pays, the payoff curve shifts with the horizon, and the decision has an undo cost.

Cost bearer

Option A · Batch

Who absorbs the cost

  • Business stakeholders waiting for slower data

Option B · Real Time

Who absorbs the cost

  • Platform/data engineers
  • Operations

Time horizon

Option A · Batch

Often wins longer than teams expect because simpler systems stay reliable.

Option B · Real Time

Wins when freshness is genuinely monetized or risk-relevant.

Reversibility

What undoing costs

Moderate

What should force a re-look

Trigger conditions that mean the answer may have changed.

  • User expectations change
  • Risk detection needs tighten
  • Streaming maturity improves

How to decide

The work you still have to do

The reference can frame the trade-off; only you can weight the factors against your context.

Questions to ask

Open these in the room. Answering them is most of the decision.

  • What changes if this is 1 second late, 1 minute late, or 1 hour late?
  • Who truly benefits from freshness?
  • Can downstream consumers actually use real-time data?
  • How expensive is reprocessing and recovery?

Key factors

The variables that actually move the answer.

  • Latency value
  • Cost sensitivity
  • Recovery needs
  • Ops maturity

Evidence needed

What to gather before committing. Not after.

  • Latency-to-value analysis
  • Consumer freshness needs
  • Ops cost estimate
  • Recovery/replay requirements
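
A latency-to-value analysis and an ops cost estimate can be put side by side in a few lines. Every figure below is a made-up placeholder and `net_monthly_value` is a hypothetical helper. The sketch shows the shape of the comparison and does not suggest real numbers.

```python
def net_monthly_value(value_by_latency, ops_cost_by_latency):
    """Net value (value minus ops cost) for each candidate latency target."""
    return {
        latency: value_by_latency[latency] - ops_cost_by_latency[latency]
        for latency in value_by_latency
    }


# Hypothetical figures, same currency, per month.
value = {"1 second": 52_000, "1 minute": 50_000, "1 hour": 45_000}
ops_cost = {"1 second": 30_000, "1 minute": 12_000, "1 hour": 4_000}

net = net_monthly_value(value, ops_cost)
best = max(net, key=net.get)
print(best, net[best])
```

With these placeholder inputs the slowest option comes out ahead; different inputs would give a different answer, which is why the evidence has to be gathered first.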

Signals from the ground

What's usually pushing the call, and what should

Pressures to recognize and discount come first, followed by signals that genuinely point toward one option or the other.

What's usually pushing the call

Pressures to recognize and discount.

Common bad reasons

Reasoning that feels convincing in the moment but doesn't hold up.

  • Real time is modern
  • Batch sounds legacy

Anti-patterns

Shapes of reasoning to recognize and set aside.

  • Building streaming pipelines for dashboards nobody watches live
  • Keeping batch pipelines where user-facing harm from staleness is already clear

What should push the call

Concrete signals that genuinely point to one pole.

For · Batch

Observations that genuinely point to Option A.

  • Delay is acceptable
  • Cost matters
  • Replay simplicity matters

For · Real Time

Observations that genuinely point to Option B.

  • Freshness directly changes user or business value

AI impact

How AI bends this decision

Where AI accelerates the call, where it introduces new distortions, and anything else worth knowing.

AI can help with

Where AI genuinely reduces the cost of making the call.

  • AI can help estimate freshness value versus complexity burden.

AI can make worse

Distortions AI introduces that didn't exist before.

  • AI may recommend streaming or real-time designs as generic best practice.

Relationships

Connected decisions

Nearby decisions this is sometimes confused with, adjacent decisions that are often entangled with this one, related failure modes, red flags, and playbooks to reach for.

Easy to confuse with

Nearby decisions and how this one differs.

  • That decision is about inter-service coordination. This one is about how fresh the processed result is.

  • Adjacent concept: A streaming-framework choice

    Choosing Kafka Streams vs Flink vs Spark Streaming is about the engine. This decision is whether streaming is the right mode at all.

  • Adjacent concept: A scheduling-cadence decision

    Scheduling cadence is how often batches run. This decision is whether it should be a batch at all.