The Hard Parts.dev
RF-41 · AI Quality Red Flags

RAG uses sources nobody actually trusts

Retrieval-based systems cite or use sources that are stale, low quality, politically loaded, or not actually treated as authoritative by humans.

Severity
high
Frequency
increasing
First noticed by
domain experts · users · AI evaluators
Detectability
subtle
Confidence
high
At a glance · RF-41
Where you see this

enterprise assistants · internal knowledge bots · support copilots

Not necessarily a problem when
the system is explicitly exploratory and not treated as authoritative
Often mistaken for
it cited sources, so it is grounded
Time horizon
near-term
Best placed to act

AI engineer · knowledge system owner · domain owner

The signal

What you would actually notice

RAG can look responsible, citations and all, while still grounding its answers in sources nobody should trust.

Field observation

The system cites documents that people would not use as authoritative in serious work.

Also observed

  • It cited a document nobody would trust in an actual review.
  • The source exists, but it is outdated and unofficial.

Primary reading

What it usually indicates

Most likely underlying patterns when this signal shows up. Not a diagnosis, a starting hypothesis.


  • weak source curation
  • index-all-the-things mentality
  • no source trust model
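Of these, "no source trust model" is the most concrete to fix. A minimal sketch of what one could look like; the tier names, weights, and review window here are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical authority tiers and weights -- tune for your own corpus.
TIER_WEIGHT = {"official": 1.0, "team": 0.6, "personal": 0.2}

@dataclass
class Source:
    uri: str
    tier: str          # "official" | "team" | "personal"
    last_reviewed: date
    has_owner: bool    # someone is accountable for keeping it correct

def trust_score(src: Source, today: date, max_age_days: int = 365) -> float:
    """Combine authority tier, freshness, and ownership into one score in [0, 1]."""
    age = (today - src.last_reviewed).days
    freshness = max(0.0, 1.0 - age / max_age_days)
    ownership = 1.0 if src.has_owner else 0.5
    return TIER_WEIGHT.get(src.tier, 0.0) * freshness * ownership
```

Even a crude model like this gives you something the index-all-the-things approach lacks: a number you can threshold on at indexing time and weight by at ranking time.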

Stakes

Why it matters

Citations make answers feel grounded, so responses built on stale or unofficial sources attract less scrutiny than unsourced claims would.

Inspection

What to check next

Deliberate steps to confirm or disconfirm the primary reading above. Not a checklist. An order of inspection.

  1. source inclusion rules
  2. ranking logic
  3. freshness controls
  4. authority model
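The four checks above can be run mechanically against an indexed corpus rather than eyeballed. A sketch, assuming a hypothetical document schema (`tier`, `last_reviewed`) and illustrative thresholds:

```python
from datetime import date

def audit_corpus(docs: list[dict], today: date,
                 allowed_tiers: frozenset = frozenset({"official", "team"}),
                 max_age_days: int = 365) -> dict:
    """Flag indexed documents that fail inclusion, authority, or freshness checks."""
    findings = {"untiered": [], "untrusted_tier": [], "stale": []}
    for doc in docs:
        tier = doc.get("tier")
        if tier is None:
            findings["untiered"].append(doc["uri"])        # no authority model applied
        elif tier not in allowed_tiers:
            findings["untrusted_tier"].append(doc["uri"])  # fails inclusion rules
        reviewed = doc.get("last_reviewed")
        if reviewed is None or (today - reviewed).days > max_age_days:
            findings["stale"].append(doc["uri"])           # fails freshness controls
    return findings
```

A non-empty `untiered` bucket is itself the answer to the authority-model question: nothing was ever assessed.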

Diagnostic questions

Questions to ask the team, or yourself, before concluding anything.

  1. Would a human trust this source for a real decision?
  2. What signals determine source authority?
  3. Do we distinguish between accessible and authoritative?

Progression

Under the signal

Where this pattern tends to come from, what's holding it up, and where it goes if nothing changes.

Leading indicators

What tends to show up first.

  • users say the citations are technically relevant but not reliable
  • important docs and casual notes are treated similarly
  • stale sources keep appearing
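The last of these indicators is measurable rather than anecdotal. A sketch of one possible metric, assuming each logged citation carries a hypothetical `last_reviewed` date:

```python
from datetime import date

def stale_citation_rate(citations: list[dict], today: date,
                        max_age_days: int = 365) -> float:
    """Fraction of an answer's citations older than the review window.

    Tracked per answer over time, a rising value is the
    'stale sources keep appearing' indicator made quantitative.
    """
    if not citations:
        return 0.0
    stale = sum(1 for c in citations
                if (today - c["last_reviewed"]).days > max_age_days)
    return stale / len(citations)
```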

Common root causes

What is usually sitting under the signal.

  • unfiltered corpus expansion
  • weak governance of sources
  • ranking based on recall over trust
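The last root cause, ranking on recall over trust, has a direct counter: blend a source trust score into the ranking instead of sorting on relevance alone. A sketch; the blend weight and the trust scores themselves are assumptions to tune, not recommendations:

```python
def rerank(hits: list[dict], trust: dict[str, float],
           relevance_weight: float = 0.7) -> list[dict]:
    """Order retrieval hits by a blend of relevance and source trust.

    Sources missing from the trust map score 0.0, so unvetted
    documents sink rather than ride on relevance alone.
    """
    def blended(hit: dict) -> float:
        return (relevance_weight * hit["relevance"]
                + (1 - relevance_weight) * trust.get(hit["uri"], 0.0))
    return sorted(hits, key=blended, reverse=True)
```

The effect is that a slightly less relevant but authoritative document can outrank a highly relevant scratch note, which is usually what a human reviewer would have chosen anyway.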

Likely consequences

What happens if nothing changes.

  • plausible but wrong answers
  • erosion of trust in the assistant
  • hidden misinformation

Look-alikes

Not what it looks like

Patterns that can be mistaken for this signal, and 'fix' attempts that make it worse.

False friends

Things the signal is often confused with, but isn't.
  • it cited sources, so it is grounded
  • the source is available, so it is trustworthy

Anti-patterns when responding

Responses that feel sensible and usually make the underlying pattern worse.

  • assuming citation equals correctness
  • indexing everything because more data feels safer

Context

Context and ownership

Where this signal surfaces, who sees it first, who can actually act, and how much runway there usually is before escalation.

Common contexts

Where it shows up

  • enterprise assistants
  • internal knowledge bots
  • support copilots
Most likely to notice

Who sees it first

Before it escalates.

  • domain experts
  • users
  • AI evaluators
Best placed to act

Who can move on it

Not always the same as who notices it.

  • AI engineer
  • knowledge system owner
  • domain owner
Time horizon

near-term

How much runway there usually is before the signal hardens into the underlying pattern.

AI impact

AI effects on this signal

How AI-assisted and AI-driven workflows tend to amplify or hide this signal.

AI amplifies

Ways AI tooling tends to make this signal louder or more common.

  • This is a core AI-specific red flag: retrieval surfaces whatever is indexed, at scale, with no human vetting each source before it is cited.

AI masks

Ways AI tooling tends to hide this signal, so it keeps growing under the surface.

  • Citations make low-trust sources feel authoritative.

Relationships

Connected signals

Related failure modes, decisions behind the signal, response playbooks, and neighboring red flags.