Research: Privileged Bases in the Transformer Residual Stream
Product: Introducing Claude
Company: Core Views on AI Safety: When, Why, What, and How
Research: The Capacity for Moral Self-Correction in Large Language Models
Company: Anthropic Partners with Google Cloud
Research: Superposition, Memorization, and Double Descent
Research: Discovering Language Model Behaviors with Model-Written Evaluations
Research: Constitutional AI: Harmlessness from AI Feedback
Research: Measuring Progress on Scalable Oversight for Large Language Models
Research: Toy Models of Superposition
Research: Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Research: Language Models (Mostly) Know What They Know
Research: Softmax Linear Units
Research: Scaling Laws and Interpretability of Learning from Repeated Data
Company: Anthropic Raises Series B to Build Steerable, Interpretable, Robust AI Systems
Research: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Research: In-context Learning and Induction Heads
Research: Predictability and Surprise in Large Generative Models
Research: A Mathematical Framework for Transformer Circuits
Research: A General Language Assistant as a Laboratory for Alignment
Company: Anthropic raises $124 million to build more reliable, general AI systems