The Hard Parts.dev
RF-42 · AI Quality · Red Flags

Model drift is noticed informally, not measured

People sense that model or system behavior changed, but the organization lacks reliable measurement, alerts, or structured comparison.

Severity: high
Frequency: increasing
First noticed by: users · support teams · AI evaluators
Detectability: subtle
Confidence: high
At a glance · RF-42
Where you see this

vendor model APIs · RAG systems · prompt-evolving copilots

Not necessarily a problem when
a temporary exploratory prototype is explicitly not treated as stable
Often mistaken for
"we would know if the model meaningfully changed"
Time horizon
near-term
Best placed to act

AI engineer · evaluation owner · product owner

The signal

What you would actually notice


Field observation

Users say the system feels different or worse, but the team cannot quantify when, how, or why behavior shifted.

Also observed

  • "It feels worse lately, but we do not know why."
  • "Users noticed the change before our monitoring did."

Primary reading

What it usually indicates

Most likely underlying patterns when this signal shows up. Not a diagnosis, a starting hypothesis.

Usually indicates

  • weak eval monitoring
  • provider drift exposure
  • poor baseline comparison

Stakes

Why it matters

Unmeasured drift weakens trust and delays corrective action.

Inspection

What to check next

Deliberate steps to confirm or disconfirm the primary reading above. Not a checklist. An order of inspection.

  1. eval time series (see the drift-check sketch after this list)
  2. provider change logs
  3. prompt and workflow version history
  4. user complaint clusters
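
To make step 1 concrete, here is a minimal sketch, assuming a hypothetical series of daily eval pass rates: it compares a recent window against a baseline window and flags a drop. The window sizes and the 0.05 threshold are illustrative defaults, not recommendations.

```python
# Minimal drift check over an eval time series (step 1 above).
# `daily_scores` is a hypothetical list of daily pass rates in [0, 1].
from statistics import mean

def detect_drift(daily_scores, baseline_days=14, recent_days=7,
                 drop_threshold=0.05):
    """Flag drift when the recent mean falls below the baseline mean
    by more than drop_threshold (in absolute score units)."""
    if len(daily_scores) < baseline_days + recent_days:
        return None  # not enough history to compare anything yet
    baseline = daily_scores[-(baseline_days + recent_days):-recent_days]
    recent = daily_scores[-recent_days:]
    drop = mean(baseline) - mean(recent)
    return {
        "baseline_mean": round(mean(baseline), 3),
        "recent_mean": round(mean(recent), 3),
        "drop": round(drop, 3),
        "drifted": drop > drop_threshold,
    }

# Hypothetical history: stable for two weeks, then a visible dip.
scores = [0.91, 0.90, 0.92, 0.91, 0.90, 0.91, 0.92,
          0.90, 0.91, 0.92, 0.90, 0.91, 0.90, 0.91,
          0.86, 0.85, 0.84, 0.86, 0.85, 0.84, 0.85]
print(detect_drift(scores))  # {..., 'drifted': True}
```

The point is less the statistics than having any automated comparison at all: even a crude check like this would page someone before the complaints do.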

Diagnostic questions

Questions to ask the team, or yourself, before concluding anything. A sketch of pinning down the first two follows the list.

  1. What baseline behavior are we comparing against?
  2. What changed in prompt, model, retrieval, or tooling?
  3. Which task slices are most vulnerable to drift?
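
The first two questions only have crisp answers if the moving parts are recorded somewhere. A minimal sketch, with entirely hypothetical field names and values, of snapshotting the knobs that drive behavior so "what changed?" becomes a diff rather than a guess:

```python
# Snapshot the behavior-driving configuration so changes are diffable.
# All identifiers below (model ids, index names, tools) are hypothetical.
import hashlib
from datetime import datetime, timezone

def config_snapshot(model_id, prompt_template, retrieval_index, tools):
    return {
        "taken_at": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        # Hash the prompt so edits are detectable without storing full text
        "prompt_sha256": hashlib.sha256(prompt_template.encode()).hexdigest()[:12],
        "retrieval_index": retrieval_index,
        "tools": sorted(tools),
    }

def diff_snapshots(old, new):
    """Return the fields that differ between two snapshots."""
    return {k: (old[k], new[k])
            for k in old if k != "taken_at" and old[k] != new[k]}

before = config_snapshot("vendor-model-v1", "You are a careful assistant.",
                         "docs-index-2024-05", ["search", "calculator"])
after = config_snapshot("vendor-model-v2", "You are a careful assistant.",
                        "docs-index-2024-05", ["search", "calculator"])
print(diff_snapshots(before, after))
# {'model_id': ('vendor-model-v1', 'vendor-model-v2')}
```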

Progression

Under the signal

Where this pattern tends to come from, what's holding it up, and where it goes if nothing changes.

Leading indicators

What tends to show up first.

  • user anecdotes drive concern before metrics do
  • qualitative complaints recur without quantified evidence
  • provider or prompt changes are poorly tracked

Common root causes

What is usually sitting under the signal.

  • weak evaluation harness
  • poor version tracking
  • insufficient monitoring

Likely consequences

What happens if nothing changes.

  • trust erosion
  • slow issue response
  • benchmark mirage

Look-alikes

Not what it looks like

Patterns that can be mistaken for this signal, and 'fix' attempts that make it worse.

False friends

Things the signal is often confused with, but isn't.

  • "we would know if the model meaningfully changed"

Anti-patterns when responding

Responses that feel sensible and usually make the underlying pattern worse.

  • relying on anecdote without building drift measurement
  • treating provider updates as invisible infrastructure changes (a counter-sketch follows this list)
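
A counter-sketch for the second anti-pattern: record whatever version metadata the provider returns with each response, and warn when it changes. The response shape below is hypothetical; real vendors expose different fields (model name, fingerprint, and so on), so adapt the keys to your provider.

```python
# Make provider-side updates visible by logging the version metadata
# each response carries. The response dict and its keys are hypothetical.
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("provider_versions")

_last_seen = {}  # last observed value per metadata field

def record_provider_version(response):
    """Warn when a version-bearing field differs from the last response."""
    for key in ("model", "fingerprint"):
        if key not in response:
            continue
        value = response[key]
        if _last_seen.get(key) not in (None, value):
            log.warning("provider %s changed: %s -> %s",
                        key, _last_seen[key], value)
        _last_seen[key] = value

record_provider_version({"model": "vendor-model-v1", "fingerprint": "fp_abc"})
record_provider_version({"model": "vendor-model-v1", "fingerprint": "fp_xyz"})
# second call warns: provider fingerprint changed: fp_abc -> fp_xyz
```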

Context

Context and ownership

Where this signal surfaces, who sees it first, who can actually act, and how much runway there usually is before escalation.

Common contexts

Where it shows up

  • vendor model APIs
  • RAG systems
  • prompt-evolving copilots
Most likely to notice

Who sees it first

Before it escalates.

  • users
  • support teams
  • AI evaluators
Best placed to act

Who can move on it

Not always the same as who notices it.

  • AI engineer
  • evaluation owner
  • product owner
Time horizon

near-term

How much runway there usually is before the signal hardens into the underlying pattern.

AI impact

AI effects on this signal

How AI-assisted and AI-driven workflows tend to amplify or hide this signal.

AI amplifies

Ways AI tooling tends to make this signal louder or more common.

  • This is an AI-specific operating risk.

AI masks

Ways AI tooling tends to hide this signal, so it keeps growing under the surface.

  • General system success can hide slice-specific degradation until users complain loudly enough (see the sketch below).
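
A minimal sketch of the slice-level check that closes this gap, with hypothetical slice names and an illustrative 0.05 threshold: the aggregate pass rate can hold steady while one slice quietly degrades.

```python
# Per-slice pass rates from an eval run, compared against a baseline;
# slice names, results, and the threshold are all hypothetical.
from collections import defaultdict
from statistics import mean

def slice_pass_rates(results):
    """results: iterable of (slice_name, passed) pairs from one eval run."""
    by_slice = defaultdict(list)
    for slice_name, passed in results:
        by_slice[slice_name].append(1.0 if passed else 0.0)
    return {name: mean(vals) for name, vals in by_slice.items()}

def degraded_slices(baseline, current, drop_threshold=0.05):
    """Slices whose pass rate fell more than drop_threshold vs. baseline."""
    return {name: (baseline[name], round(rate, 2))
            for name, rate in current.items()
            if name in baseline and baseline[name] - rate > drop_threshold}

# Hypothetical run: overall looks fine, but 'extract' has slipped.
run = ([("summarize", True)] * 18 + [("summarize", False)] * 2
       + [("extract", True)] * 15 + [("extract", False)] * 5
       + [("translate", True)] * 19 + [("translate", False)] * 1)
baseline = {"summarize": 0.90, "extract": 0.92, "translate": 0.93}
print(degraded_slices(baseline, slice_pass_rates(run)))
# {'extract': (0.92, 0.75)}
```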

Relationships

Connected signals

Related failure modes, decisions behind the signal, response playbooks, and neighboring red flags.