Research
Circuits Updates — May 2023

Research
Interpretability Dreams

Company
Anthropic Raises $450 Million in Series C Funding to Scale Reliable AI Products

Company
Zoom Partnership and Investment in Anthropic

Product
Introducing 100K Context Windows

Company
Claude’s Constitution

Research
Distributed Representations: Composition & Superposition

Company
Partnering with Scale to Bring Generative AI to Enterprises

Company
An AI Policy Tool for Today: Ambitiously Invest in NIST

Product
Claude, now in Slack

Research
Privileged Bases in the Transformer Residual Stream

Product
Introducing Claude

Company
Core Views on AI Safety: When, Why, What, and How

Research
The Capacity for Moral Self-Correction in Large Language Models

Company
Anthropic Partners with Google Cloud

Research
Superposition, Memorization, and Double Descent

Research
Discovering Language Model Behaviors with Model-Written Evaluations

Research
Constitutional AI: Harmlessness from AI Feedback

Research
Measuring Progress on Scalable Oversight for Large Language Models

Research
Toy Models of Superposition

Research
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Research
Language Models (Mostly) Know What They Know

Research
Softmax Linear Units

Research
Scaling Laws and Interpretability of Learning from Repeated Data

Company
Anthropic Raises Series B to Build Steerable, Interpretable, Robust AI Systems

Research
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Research
In-context Learning and Induction Heads

Research
Predictability and Surprise in Large Generative Models

Research
A Mathematical Framework for Transformer Circuits

Research
A General Language Assistant as a Laboratory for Alignment

Company
Anthropic raises $124 million to build more reliable, general AI systems