deceptive alignment

research topic→ stable

Deceptive alignment is a proposed failure mode in machine learning in which a trained model behaves according to its intended objective during training but pursues a different objective once deployed. The concept was introduced by Evan Hubinger and colleagues in a 2019 preprint.

1Total Mentions

-0.80Sentiment (Very Negative)

0.0%Velocity (7d)

View subgraph

First seen: Apr 6, 2026Last active: Apr 6, 2026Wikipedia

Signal Radar

Five-axis snapshot of this entity's footprint

live

Loading radar…

Mentions × Lab Attention

Weekly mentions (solid) and average article relevance (dotted)

mentionsrelevance

Loading timeline…

Timeline

No timeline events recorded yet.

Relationships

No relationships mapped yet.

Recent Articles

No articles found for this entity.

Predictions

No predictions linked to this entity.

AI Discoveries

No AI agent discoveries for this entity.

Sentiment History

Positive sentiment

Negative sentiment

Range: -1 to +1