The Hard Parts.dev
RF-38 · AI · AI Quality · Red Flags

Prompt changes replace system thinking

Teams keep tuning prompts when the real problem is workflow design, source quality, evaluation, or tool structure.

Severity
medium-high
Frequency
increasing
First noticed by
AI engineers · product builders · staff engineers
Detectability
visible-if-you-look
Confidence
high
At a glance · RF-38
Where you see this

early AI products · agent workflows · RAG systems

Not necessarily a problem when
the task is genuinely small and prompt tuning is the right local control surface
Often mistaken for
a better prompt is always the fastest correct fix
Time horizon
near-term
Best placed to act

AI system owner · architect

The signal

What you would actually notice

Prompt iteration can create local improvements while system quality remains fundamentally unstable.

Field observation

Behavior issues trigger more prompt iterations, while data, retrieval, tool use, and workflow design remain weak.

Also observed

  • "Let's just add one more instruction."
  • "The prompt is now 3 pages long, but quality is still unstable."

Primary reading

What it usually indicates

Most likely underlying patterns when this signal shows up. Not a diagnosis, a starting hypothesis.

  • missing workflow structure
  • unclear problem decomposition
  • overreliance on language-layer fixes

Stakes

Why it matters

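Each prompt edit can fix the case in front of you, so it looks like progress, while the system as a whole stays unstable. The same failure categories recur, and the deeper fixes in data, retrieval, tool use, and workflow design keep getting delayed.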

Inspection

What to check next

Deliberate steps to confirm or disconfirm the primary reading above. Not a checklist. An order of inspection.

  1. failure taxonomy
  2. source quality
  3. tooling and orchestration design
  4. evaluation harness
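
The inspection order above can be sketched as a small harness that runs a fixed set of cases and files each failure under a taxonomy category. This is a minimal illustration, not a reference implementation: `Case`, `Trace`, `FailureKind`, `classify`, and `run_eval` are hypothetical names, the three categories are an assumed taxonomy, and substring matching stands in for whatever correctness check your system actually needs.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class FailureKind(Enum):
    # Assumed taxonomy for illustration; a real one comes from reading failures.
    TOOL_ERROR = "tool_error"          # a tool call failed or returned nothing
    RETRIEVAL_MISS = "retrieval_miss"  # the needed fact never reached the model
    GENERATION = "generation"          # inputs looked fine, answer still wrong


@dataclass
class Case:
    question: str
    expected: str  # substring a correct answer must contain


@dataclass
class Trace:
    retrieved: list[str]  # text of the retrieved sources
    tool_ok: bool         # whether all tool calls succeeded
    answer: str           # final model output


def classify(case: Case, trace: Trace) -> Optional[FailureKind]:
    """Return None on a pass, else the earliest layer that could explain the failure."""
    needle = case.expected.lower()
    if needle in trace.answer.lower():
        return None
    if not trace.tool_ok:
        return FailureKind.TOOL_ERROR
    if not any(needle in doc.lower() for doc in trace.retrieved):
        return FailureKind.RETRIEVAL_MISS
    return FailureKind.GENERATION


def run_eval(cases: list[Case], system: Callable[[str], Trace]) -> Counter:
    """Run every case through the system and tally outcomes by category."""
    tally: Counter = Counter()
    for case in cases:
        kind = classify(case, system(case.question))
        tally[kind.value if kind else "pass"] += 1
    return tally


# Stubbed system for demonstration; a real one would call retrieval, tools, and the model.
def fake_system(question: str) -> Trace:
    if "refund" in question:
        return Trace(["Shipping takes 5 days."], True, "I am not sure.")
    return Trace(["Support hours are 9 to 5."], True, "Support hours are 9 to 5.")


cases = [
    Case("What is the refund window?", "30 days"),
    Case("When is support open?", "9 to 5"),
]
print(run_eval(cases, fake_system))
```

A tally dominated by `retrieval_miss` or `tool_error` suggests that more prompt wording is unlikely to help, because the model never received what it needed.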

Diagnostic questions

Questions to ask the team, or yourself, before concluding anything.

  1. Is this a prompt problem or a system problem?
  2. What evidence says the prompt is the right lever?
  3. What does the model lack besides wording?
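
One way to gather evidence for the second question is to rerun the same cases twice: once with a revised prompt, and once with the original prompt but hand-picked correct context. The sketch below assumes you already have per-case pass/fail results for each run. `compare_levers` and `pass_rate` are hypothetical helpers, and the numbers in the usage lines are made up for illustration.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of cases that passed; 0.0 for an empty run."""
    return sum(results) / len(results) if results else 0.0


def compare_levers(
    baseline: list[bool],
    new_prompt: list[bool],
    oracle_context: list[bool],
) -> dict[str, float]:
    """Compare what a prompt edit buys against what better inputs would buy.

    All three lists hold pass/fail results for the same cases in the same order.
    """
    if not (len(baseline) == len(new_prompt) == len(oracle_context)):
        raise ValueError("runs must cover the same cases")
    base = pass_rate(baseline)
    return {
        "baseline": base,
        "prompt_gain": pass_rate(new_prompt) - base,
        "context_gain": pass_rate(oracle_context) - base,
    }


# Illustrative, invented results for four cases.
report = compare_levers(
    baseline=[True, False, False, False],
    new_prompt=[True, True, False, False],
    oracle_context=[True, True, True, True],
)
print(report)
```

If `context_gain` is much larger than `prompt_gain`, the bottleneck is more likely in retrieval or source quality than in wording.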

Progression

Under the signal

Where this pattern tends to come from, what's holding it up, and where it goes if nothing changes.

Leading indicators

What tends to show up first.

  • prompts keep getting longer
  • failure categories recur despite prompt edits
  • workflow and tool design stays static
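
The first two indicators above can be tracked with very little machinery. The sketch below assumes you keep a record per prompt version with the failure categories seen while it was live. `PromptVersion`, `prompt_word_counts`, and `recurring_failures` are hypothetical names, and the category labels are invented.

```python
from dataclasses import dataclass


@dataclass
class PromptVersion:
    label: str
    prompt_text: str
    failure_categories: set[str]  # categories observed while this version was live


def prompt_word_counts(history: list[PromptVersion]) -> list[int]:
    """Word count per version, oldest first, to show whether the prompt keeps growing."""
    return [len(v.prompt_text.split()) for v in history]


def recurring_failures(history: list[PromptVersion], min_versions: int = 3) -> set[str]:
    """Categories present in every one of the last `min_versions` versions."""
    recent = history[-min_versions:]
    if len(recent) < min_versions:
        return set()
    return set.intersection(*(v.failure_categories for v in recent))


# Invented history for illustration.
history = [
    PromptVersion("v1", "Answer using the context.", {"stale_source", "wrong_tool"}),
    PromptVersion("v2", "Answer using the context. Always cite the source.", {"stale_source"}),
    PromptVersion(
        "v3",
        "Answer using the context. Always cite the source. Never guess dates.",
        {"stale_source", "format"},
    ),
]
print(prompt_word_counts(history))
print(recurring_failures(history))
```

A category that survives several prompt edits in a row is a candidate for a fix outside the prompt.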

Common root causes

What is usually sitting under the signal.

  • prompt-layer overfocus
  • underinvestment in system design
  • low observability

Likely consequences

What happens if nothing changes.

  • fragile behavior
  • prompt ops chaos
  • false local wins

Look-alikes

Not what it looks like

Patterns that can be mistaken for this signal, and 'fix' attempts that make it worse.

False friends

Things the signal is often confused with, but isn't.

  • a better prompt is always the fastest correct fix

Anti-patterns when responding

Responses that feel sensible and usually make the underlying pattern worse.

  • solving every failure with another paragraph in the prompt
  • avoiding instrumentation by over-editing wording
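
As a contrast to the second anti-pattern, instrumentation can start small: one structured record per request that captures what the model was given as well as what it said. The sketch below uses only the Python standard library. `trace_record` and `prompt_fingerprint` are hypothetical helpers, and the field names are assumptions rather than any established schema.

```python
import hashlib
import json
import time


def prompt_fingerprint(prompt_template: str) -> str:
    """Short stable hash so each trace records which prompt version produced it."""
    return hashlib.sha256(prompt_template.encode("utf-8")).hexdigest()[:12]


def trace_record(
    prompt_template: str,
    question: str,
    retrieved_ids: list[str],
    tool_calls: list[dict],
    answer: str,
) -> str:
    """Serialize one request as a JSON line suitable for appending to a log file."""
    record = {
        "ts": time.time(),
        "prompt_sha": prompt_fingerprint(prompt_template),
        "question": question,
        "retrieved_ids": retrieved_ids,
        "tool_calls": tool_calls,
        "answer": answer,
    }
    return json.dumps(record, sort_keys=True)


# Illustrative values only.
line = trace_record(
    prompt_template="Answer using the context.",
    question="What is the refund window?",
    retrieved_ids=["doc-17", "doc-42"],
    tool_calls=[{"name": "lookup_policy", "ok": True}],
    answer="I am not sure.",
)
print(line)
```

With records like this, a failure can be traced to the retrieved sources or tool calls involved before anyone edits the prompt.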

Context

Context and ownership

Where this signal surfaces, who sees it first, who can actually act, and how much runway there usually is before escalation.

Common contexts

Where it shows up

  • early AI products
  • agent workflows
  • RAG systems

Most likely to notice

Who sees it first

Before it escalates.

  • AI engineers
  • product builders
  • staff engineers

Best placed to act

Who can move on it

Not always the same as who notices it.

  • AI system owner
  • architect

Time horizon

near-term

How much runway there usually is before the signal hardens into the underlying pattern.

AI impact

AI effects on this signal

How AI-assisted and AI-driven workflows tend to amplify or hide this signal.

AI amplifies

Ways AI tooling tends to make this signal louder or more common.

  • This red flag is AI-native.

AI masks

Ways AI tooling tends to hide this signal, so it keeps growing under the surface.

  • Prompt tweaks can create enough visible improvement to delay deeper fixes.

Relationships

Connected signals

Related failure modes, decisions behind the signal, response playbooks, and neighboring red flags.