The Hard Parts.dev

Context Window Hoarding

Teams fill context windows maximally with documents, history, and examples without understanding what actually helps, leading to unpredictable behavior, high cost, and debugging nightmares.

Severity: medium
Frequency: increasing
Lifecycle: build · operate
Recovery: medium
Confidence: medium
At a glance: FM-28
Also known as

prompt stuffing · context inflation · over-contextualizing · the kitchen-sink prompt

First noticed by

AI engineer · platform engineer · staff engineer

Mistaken for

thorough or comprehensive AI system design

Why it looks healthy

Concrete external tells that make the pattern read as responsible behavior.

  • Prompts include extensive relevant-looking context
  • The system has access to "everything it could need"
  • Teams call the design "thorough" or "comprehensive"
  • Demos on canned inputs look strong

Definition

What it is

Blast radius: product · operations · cost

Context windows are stuffed with every potentially relevant document, instruction, and example under the assumption that more context produces better outputs.

How it unfolds

The arc of the pattern

  1. Starts

    A team adds more context to improve model output quality.

  2. Feels reasonable because

    In many cases, more relevant context does improve output.

  3. Escalates

    Context grows uncritically. Old instructions accumulate. Retrieval dumps everything. The prompt is enormous.

  4. Ends

    Behavior becomes unpredictable, cost is high, debugging requires reading thousands of tokens, and nobody can explain why output changed.

Recognition

Warning signs by stage

Observable signals as the pattern progresses.

EARLY

Early

  • Context length grows over time without review.
  • Instructions accumulate without removing obsolete ones.
  • Retrieval returns everything above a low similarity threshold.

MID

Mid

  • Behavior changes after minor context additions that should not matter.
  • Token cost is high relative to task complexity.
  • Debugging requires reading the full prompt to understand a failure.

LATE

Late

  • Nobody knows what all the context instructions do.
  • Removing context causes unpredictable behavior changes.
  • The team treats the context as brittle and stops modifying it.

Root causes

Why it happens

  • More context sometimes helps and the heuristic overgeneralizes
  • Context additions are easier than context removals
  • No ownership of the full context window
  • Cost and complexity of long contexts are underestimated

Response

What to do

Immediate triage first, then structural fixes.

First move

Take one recent production prompt, remove half of it, and measure the quality difference on real cases; most hoarded context does not survive this test.
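
A minimal sketch of that halving test, assuming hypothetical `call_model(prompt, case)` and `score_output(output, case)` helpers that wrap your model client and your task-level quality metric:

```python
# Minimal sketch of the halving test. `call_model` and `score_output` are
# hypothetical stand-ins for your model client and your quality metric;
# swap in whatever your stack actually uses.
from statistics import mean

def halving_test(prompt_sections, test_cases, call_model, score_output):
    """Compare output quality with the full prompt vs. half of it removed."""
    full_prompt = "\n\n".join(prompt_sections)
    half_prompt = "\n\n".join(prompt_sections[: len(prompt_sections) // 2])

    results = {}
    for name, prompt in (("full", full_prompt), ("half", half_prompt)):
        scores = [score_output(call_model(prompt, case), case) for case in test_cases]
        results[name] = mean(scores)
    # e.g. {"full": 0.82, "half": 0.81} suggests the extra half earns very little
    return results
```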

Hard trade-off

Accept removing content that might occasionally help, in exchange for behavior that's debuggable and cost-predictable.

Recovery trap

Compressing the context (summarization, chunking) rather than questioning whether the content should be there at all.

Immediate actions

  • Audit every section of the current context for demonstrated value
  • Remove instructions that cannot be explained or traced to a decision
  • Set a budget for context length and require justification to exceed it (one enforcement check is sketched after this list)
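
One way to make that budget stick is a check that fails before a prompt change ships. A minimal sketch, assuming the `tiktoken` tokenizer; the 4,000-token budget and the section names passed in are illustrative, not recommendations:

```python
# Sketch of a context-budget gate, e.g. run in CI before a prompt change ships.
# The budget value and tokenizer choice are placeholders; use your model's own
# tokenizer and a budget your team has actually justified.
import tiktoken

CONTEXT_BUDGET_TOKENS = 4_000
enc = tiktoken.get_encoding("cl100k_base")

def check_context_budget(sections: dict[str, str]) -> None:
    counts = {name: len(enc.encode(text)) for name, text in sections.items()}
    total = sum(counts.values())
    if total > CONTEXT_BUDGET_TOKENS:
        biggest = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        detail = ", ".join(f"{name}={n} tokens" for name, n in biggest)
        raise SystemExit(
            f"Context is {total} tokens, over the {CONTEXT_BUDGET_TOKENS} budget "
            f"({detail}). Remove sections or document why the overage is justified."
        )
```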

Structural fixes

  • Version and review context window contents like code
  • Measure output quality at different context lengths
  • Use dynamic retrieval with relevance filtering instead of static large contexts (see the sketch below)
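
For that last point, retrieval can apply both a similarity floor and a hard cap on how many chunks reach the prompt, rather than pasting a static document dump. A sketch assuming a hypothetical `embed()` helper that returns a vector; the threshold and top-k values are placeholders to tune against measured quality:

```python
# Sketch of relevance-filtered retrieval: keep only chunks that clear a
# similarity floor AND cap how many reach the prompt. `embed` is a hypothetical
# embedding helper; min_similarity and top_k are values to tune, not defaults.
import numpy as np

def select_context(query: str, chunks: list[str], embed,
                   top_k: int = 5, min_similarity: float = 0.35) -> list[str]:
    q = np.asarray(embed(query), dtype=float)
    scored = []
    for chunk in chunks:
        v = np.asarray(embed(chunk), dtype=float)
        sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
        if sim >= min_similarity:
            scored.append((sim, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```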

What not to do

  • Do not assume longer context always produces better output
  • Do not add context to fix a problem without understanding why the problem exists

AI impact

How AI distorts this pattern

Where AI-assisted workflows accelerate, hide, or help with this failure mode.

AI can help with

  • AI can help analyze which parts of a context window are actually referenced or influential in outputs (see the ablation sketch below).
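
One concrete form this can take is a leave-one-section-out ablation: drop each section in turn and see how much the quality score moves. A sketch reusing the hypothetical `call_model` and `score_output` helpers from the halving test above:

```python
# Leave-one-section-out ablation: a section whose removal barely changes the
# score is a strong candidate for deletion. Helpers are hypothetical.
def section_influence(prompt_sections, test_cases, call_model, score_output):
    def avg_score(sections):
        prompt = "\n\n".join(sections)
        return sum(score_output(call_model(prompt, c), c) for c in test_cases) / len(test_cases)

    baseline = avg_score(prompt_sections)
    return {
        i: baseline - avg_score(prompt_sections[:i] + prompt_sections[i + 1:])
        for i in range(len(prompt_sections))
    }
```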

AI can make worse by

  • Native mode: the failure lives in the structure of how AI systems are built; adding context feels like an improvement even when it is not.

Relationships

Connected patterns

Causal flows inside Failure Modes, and related entries across the site.

Easy to confuse with

Nearby patterns and how this one differs.

  • Prompt chaos is prompts changing without version control. Context hoarding is prompts growing without evidence.

  • RAG-without-ground-truth adds retrieval without validation. Context hoarding adds content without validation.

  • Adjacent concept: Legitimate context engineering

    Legitimate context engineering measures what helps. Hoarding measures what fits.

Heard in the wild

What it sounds like

The phrase that signals the pattern is about to start, and who tends to say it.


Let's just add the full documentation as context.

Said by: AI engineer or product manager

Notes from practice

What experienced people notice

Annotations from engineers who have worked this pattern before.

Best moment (when intervention actually changes the trajectory)
Before a bloated context becomes the default and the pattern spreads to other prompts.

Counter move (the specific action that breaks the pattern)
Measure what context contributes before adding more of it.

False positive (when this pattern is actually the correct call)
Rich context can genuinely improve output. The failure mode is adding context without evidence that it helps.