Toy Models of Superposition

Sep 14, 2022

Abstract

In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions. We call this phenomenon superposition. When features are sparse, superposition allows compression beyond what a linear model would do, at the cost of "interference" that requires nonlinear filtering.
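The trade-off described above can be illustrated with a minimal sketch. The abstract does not spell out an architecture, so the code below assumes a one-layer form, ReLU(WᵀWx + b), in which the same matrix W compresses the input and reconstructs it. The function names, the sparsity parameter, and the hand-picked weights are illustrative assumptions, not trained values from the paper. Two features share a single hidden dimension by pointing in opposite directions: the linear map leaves negative interference on the inactive feature, and the ReLU filters it out.

```python
import numpy as np


def sample_sparse_features(n_samples, n_features, sparsity, rng):
    """Each feature is 0 with probability `sparsity`, else uniform in [0, 1)."""
    values = rng.uniform(0.0, 1.0, size=(n_samples, n_features))
    active = rng.uniform(size=(n_samples, n_features)) >= sparsity
    return values * active


def toy_model(x, W, b):
    """Compress with W, reconstruct with W.T, then apply ReLU (assumed form)."""
    hidden = x @ W.T                      # (n_samples, n_hidden)
    return np.maximum(hidden @ W + b, 0.0)


# Hypothetical hand-picked weights: 2 features stored in 1 hidden dimension.
W = np.array([[1.0, -1.0]])               # shape (n_hidden=1, n_features=2)
b = np.zeros(2)

one_active = np.array([[1.0, 0.0], [0.0, 1.0]])
both_active = np.array([[1.0, 1.0]])

# Without the nonlinearity, the inactive feature picks up interference of -1.
linear_out = one_active @ W.T @ W

# With ReLU, the negative interference is filtered and recovery is exact.
relu_out = toy_model(one_active, W, b)

# When both features fire together they cancel: the cost of superposition.
collision_out = toy_model(both_active, W, b)

# On sparse synthetic data, most samples have at most one active feature,
# and those samples are reconstructed exactly by this configuration.
rng = np.random.default_rng(0)
x = sample_sparse_features(1000, 2, sparsity=0.9, rng=rng)
at_most_one = (x > 0).sum(axis=1) <= 1
sparse_recon = toy_model(x[at_most_one], W, b)
```

In this configuration a purely linear model with one hidden dimension could represent only one of the two features cleanly. The ReLU is what makes it possible to store both, provided they rarely co-occur.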
