Technique · interpretability

Sparse Autoencoders for Interpretability

Training sparse autoencoders on residual-stream activations to extract monosemantic, human-interpretable features from transformer internals.
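As a rough illustration of the idea above, the following is a minimal sketch of a sparse autoencoder's forward pass and training objective (reconstruction error plus an L1 sparsity penalty on the features). It is not the origin paper's implementation. The function names, the `l1_coeff` value, the subtraction of the decoder bias before encoding, and the unit-norm decoder rows are all illustrative assumptions. Random NumPy data stands in for real residual-stream activations.

```python
import numpy as np


def init_sae(d_model, d_hidden, seed=0):
    """Create parameters for a hypothetical SAE with d_hidden > d_model (overcomplete)."""
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(0.0, 0.1, size=(d_model, d_hidden))
    W_dec = rng.normal(0.0, 0.1, size=(d_hidden, d_model))
    # Assumption: normalize each decoder row so a feature's direction has unit length.
    W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)
    return {
        "W_enc": W_enc,
        "b_enc": np.zeros(d_hidden),
        "W_dec": W_dec,
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x):
    """Encode activations into non-negative features, then reconstruct them."""
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    features = np.maximum(0.0, pre)  # ReLU keeps feature activations >= 0
    recon = features @ params["W_dec"] + params["b_dec"]
    return features, recon


def sae_loss(params, x, l1_coeff=1e-3):
    """Return (total, mse, l1): reconstruction error plus weighted L1 sparsity penalty."""
    features, recon = sae_forward(params, x)
    mse = np.mean(np.sum((recon - x) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(features), axis=-1))
    return mse + l1_coeff * l1, mse, l1


# Stand-in for a batch of residual-stream activations: 32 tokens, d_model = 16.
acts = np.random.default_rng(1).normal(size=(32, 16))
params = init_sae(d_model=16, d_hidden=64)
features, recon = sae_forward(params, acts)
total, mse, l1 = sae_loss(params, acts)
```

In practice the parameters would be optimized by gradient descent on `total` over a large set of recorded activations. The L1 term pushes most feature activations to zero for any given input, which is the property that is hoped to make individual features easier to interpret.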

Origin: Anthropic, 2023-10
Also known as: SAE, Monosemantic features

Products deploying: —
Avg research → prod: —
First commercial deploy: —

Deployment timeline

No verified deployments yet in our tracked product set.