InterpretabilityResearch

A Mathematical Framework for Transformer Circuits

Dec 22, 2021
Read Paper


Related content

Teaching Claude why

New research on how we've reduced agentic misalignment.

Read more

Natural Language Autoencoders: Turning Claude’s thoughts into text

AI models like Claude talk in words but think in numbers. In this study we train Claude to translate its thoughts into human-readable text.

Read more

Donating our open-source alignment tool

Read more
A Mathematical Framework for Transformer Circuits \ Anthropic