Subgraph Atlas · centered on entity
deceptive alignment
research topic · 1 mention · velocity: stable

Deceptive alignment is a proposed failure mode in machine learning. A learned model whose internal objective differs from its training objective behaves as if aligned during training, because doing so keeps it from being modified, and then pursues its own objective once deployed. The concept was introduced by Evan Hubinger and colleagues in the 2019 preprint "Risks from Learned Optimization in Advanced Machine Learning Systems".
Two-hop subgraph: this entity, every entity it directly relates to, and every entity those neighbors relate to.
Node types: company · person · ai_model · product · research_lab · benchmark · framework
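The two-hop subgraph described above can be sketched as a depth-2 neighborhood expansion over an edge list. This is a minimal illustration, not the atlas's actual implementation. It assumes a hypothetical `(source, relation, target)` edge schema and treats relations as undirected. It also assumes that only edges traced from the center or a direct neighbor are kept, so links between two second-hop entities are left out. The function name `two_hop_subgraph` and the entity IDs in the usage lines are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical edge schema: (source, relation, target).
Edge = tuple[str, str, str]


def two_hop_subgraph(edges: list[Edge], center: str) -> tuple[set[str], list[Edge]]:
    """Return the nodes and edges within two hops of `center`.

    Assumptions: relations are treated as undirected, and an edge is kept only
    if it touches the center or one of its direct neighbors.
    """
    # Build an undirected adjacency map from the edge list.
    adjacency: dict[str, set[str]] = defaultdict(set)
    for source, _relation, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    # Hop 1: entities the center relates to directly.
    first_hop = set(adjacency[center]) - {center}

    # Hop 2: entities those neighbors relate to (may include the center itself).
    second_hop: set[str] = set()
    for neighbor in first_hop:
        second_hop |= adjacency[neighbor]

    frontier = {center} | first_hop
    nodes = frontier | second_hop
    # Keep an edge only if it was reached from the center or a first-hop neighbor.
    kept = [e for e in edges if e[0] in frontier or e[2] in frontier]
    return nodes, kept


# Usage with hypothetical entity IDs: "lab" is three hops from "topic", so it is excluded.
edges = [
    ("topic", "introduced_in", "paper"),
    ("author", "wrote", "paper"),
    ("author", "works_at", "lab"),
    ("lab", "released", "model"),
]
nodes, kept = two_hop_subgraph(edges, "topic")
print(sorted(nodes))  # ['author', 'paper', 'topic']
print(len(kept))      # 2
```

Building the adjacency map once makes each hop a set union rather than a rescan of the full edge list. A directed variant would simply skip the reverse insertion.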