Research
Our research teams investigate the safety, inner workings, and societal impacts of AI models – so that artificial intelligence has a positive impact as it becomes increasingly capable.
Interpretability
The mission of the Interpretability team is to discover and understand how large language models work internally, as a foundation for AI safety and positive outcomes.
Alignment
The Alignment team works to understand the risks of AI models and develop ways to ensure that future models remain helpful, honest, and harmless.
Societal Impacts
Working closely with the Anthropic Policy and Safeguards teams, Societal Impacts is a technical research team that explores how AI is used in the real world.
Frontier Red Team
The Frontier Red Team analyzes the implications of frontier AI models for cybersecurity, biosecurity, and autonomous systems.
Signs of introspection in large language models
Can Claude access and report on its own internal states? This research finds evidence for a limited but functional ability to introspect—a step toward understanding what's actually happening inside these models.
Tracing the thoughts of a large language model
Circuit tracing lets us watch Claude think, revealing a shared conceptual space where reasoning happens before being translated into language. This suggests the model can learn something in one language and apply it in another.
Constitutional Classifiers: Defending against universal jailbreaks
These classifiers filter the overwhelming majority of jailbreaks while remaining practical to deploy. A prototype withstood over 3,000 hours of red teaming with no universal jailbreak discovered.
Alignment faking in large language models
This paper provides the first empirical example of a model engaging in alignment faking without being trained to do so—selectively complying with training objectives while strategically preserving existing preferences.
Publications
- From shortcuts to sabotage: natural emergent misalignment from reward hacking
- Project Fetch: Can Claude train a robot dog?
- Commitments on model deprecation and preservation
- Signs of introspection in large language models
- Preparing for AI’s economic impact: exploring policy responses
- A small number of samples can poison LLMs of any size
- Petri: An open-source auditing tool to accelerate AI safety research
- Building AI for cyber defenders
- Anthropic Economic Index report: Uneven geographic and enterprise AI adoption
- Anthropic Economic Index: Tracking AI’s role in the US and global economy