Related content
Automated Alignment Researchers: Using large language models to scale scalable oversight
Can Claude develop, test, and analyze alignment ideas of its own? We ran an experiment to find out.
Trustworthy agents in practice
AI “agents” represent the latest major shift in how people and organizations are using AI. Here, we explain how they work and how we ensure they're trustworthy.
Emotion concepts and their function in a large language model
All modern language models sometimes act like they have emotions. What’s behind these behaviors? Our interpretability team investigates.