A Mathematical Framework for Transformer Circuits

Dec 22, 2021