A Mathematical Framework for Transformer Circuits
Dec 22, 2021