Toy Models of Superposition

Sep 14, 2022

Abstract

In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions. We call this phenomenon superposition. When features are sparse, superposition allows compression beyond what a linear model would do, at the cost of "interference" that requires nonlinear filtering.
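To make the idea concrete, here is a minimal hand-built sketch rather than a trained model. It assumes a tied-weight form in which the input is projected down by a matrix W, projected back up by its transpose, and then passed through a ReLU. The function name `toy_model`, the antipodal weights, and the example inputs are all illustrative assumptions chosen so the arithmetic can be checked by hand. Two features share a single hidden dimension: a purely linear readout shows interference on the inactive feature, the ReLU filters it out when only one feature is active, and the scheme breaks down when both features are active at once.

```python
import numpy as np


def toy_model(x, W, b=0.0, nonlinear=True):
    """Compress x into a smaller hidden vector, then reconstruct it.

    Assumed form (illustrative): h = W x, out = W^T h + b,
    optionally followed by a ReLU.
    """
    h = W @ x              # project n features down to m < n dimensions
    out = W.T @ h + b      # project back up to n features
    return np.maximum(out, 0.0) if nonlinear else out


# Hand-picked weights (an assumption, not learned): two features stored
# in one hidden dimension, pointing in opposite directions.
W = np.array([[1.0, -1.0]])   # shape (1 hidden dim, 2 features)

only_a = np.array([1.0, 0.0])  # sparse input: only feature A active
only_b = np.array([0.0, 1.0])  # sparse input: only feature B active
both = np.array([1.0, 1.0])    # dense input: both features active

linear_a = toy_model(only_a, W, nonlinear=False)  # interference on B
relu_a = toy_model(only_a, W)                     # ReLU removes it
relu_b = toy_model(only_b, W)
relu_both = toy_model(both, W)                    # features cancel out

print("linear, A only:", linear_a)
print("ReLU,   A only:", relu_a)
print("ReLU,   B only:", relu_b)
print("ReLU,   A and B:", relu_both)
```

With only feature A active, the linear readout assigns a spurious negative value to feature B, and the ReLU clips that value to zero so the input is recovered exactly. When both features are active, they cancel in the shared dimension and neither is recovered, which is why the trick depends on sparsity.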
