
A Mathematical Framework for Transformer Circuits

Dec 22, 2021