InterpretabilityResearch

Toy Models of Superposition

Sep 14, 2022
Read Paper

Abstract

In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions. We call this phenomenon superposition. When features are sparse, superposition allows compression beyond what a linear model would do, at the cost of "interference" that requires nonlinear filtering.

Related content

Anthropic Economic Index report: Learning curves

Anthropic's fifth Economic Index report studies Claude usage in February 2026, building on the economic primitives framework introduced in our previous report.

Read more

Introducing our Science Blog

We’re launching a new blog about AI and science. We’ll share research happening at Anthropic and elsewhere, collaborations with external researchers and labs, and discuss practical workflows for scientists using AI in their own work.

Read more

Long-running Claude for scientific computing

A practical guide to running Claude Code for multi-day scientific tasks—test oracles, persistent memory, and orchestration patterns.

Read more