Make safe AI systems, deploy them reliably.
We develop large-scale AI systems so that we can study their safety properties at the technological frontier, where new problems are most likely to arise. We use these insights to create safer, more steerable, and more reliable models, and to build systems that we deploy externally, like Claude.
Privileged Bases in the Transformer Residual Stream
The Capacity for Moral Self-Correction in Large Language Models
Superposition, Memorization, and Double Descent
Discovering Language Model Behaviors with Model-Written Evaluations
Constitutional AI: Harmlessness from AI Feedback
Measuring Progress on Scalable Oversight for Large Language Models
Toy Models of Superposition
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Language Models (Mostly) Know What They Know
Softmax Linear Units
Scaling Laws and Interpretability of Learning from Repeated Data
In-context Learning and Induction Heads
Predictability and Surprise in Large Generative Models
A Mathematical Framework for Transformer Circuits
A General Language Assistant as a Laboratory for Alignment