Interpretability | Research
A Mathematical Framework for Transformer Circuits
Dec 22, 2021

Research: Agentic Misalignment: How LLMs could be insider threats, Jun 20, 2025
Research: Confidential Inference via Trusted Virtual Machines, Jun 18, 2025
Research: SHADE-Arena: Evaluating sabotage and monitoring in LLM agents, Jun 16, 2025