Make safe AI systems
Deploy them reliably

We develop large-scale AI systems so that we can study their safety properties at the technological frontier, where new problems are most likely to arise. We use these insights to create safer, more steerable, and more reliable models, and to build systems that we deploy externally, like Claude.

01

AI as a Systematic Science

Inspired by the universality of scaling in statistical physics, we develop scaling laws to help us do systematic, empirically-driven research. We search for simple relations among data, compute, parameters, and performance of large-scale networks. Then we leverage these relations to train networks more efficiently and predictably, and to evaluate our own progress. We’re also investigating what scaling laws for the safety of AI systems might look like, and this will inform our future research.
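
To make this concrete, here is a minimal sketch of the kind of fit such scaling-law work involves: a power law L(N) = a·N^(-alpha) + c relating parameter count to loss, fit to a handful of measurements. The functional form is a common one from the scaling-laws literature, and the data points below are hypothetical, purely for illustration; this is not Anthropic's actual methodology or data.

```python
# Minimal scaling-law sketch: fit a power law L(N) = a * N**(-alpha) + c
# relating model size N (parameters) to validation loss L.
# All numbers below are hypothetical, for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c):
    """Loss as a power law in parameter count; c is the irreducible loss."""
    return a * n ** (-alpha) + c

# Hypothetical (parameter count, validation loss) measurements.
sizes = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
losses = np.array([4.1, 3.4, 2.9, 2.5, 2.2])

# Fit the three free parameters; p0 is a rough initial guess.
(a, alpha, c), _ = curve_fit(power_law, sizes, losses,
                             p0=[10.0, 0.1, 1.5], maxfev=10000)
print(f"fit: L(N) = {a:.2f} * N^(-{alpha:.3f}) + {c:.2f}")

# Use the fitted relation to extrapolate to a larger, untrained scale.
print(f"predicted loss at 1e11 params: {power_law(1e11, a, alpha, c):.2f}")
```

Once a relation like this holds over several orders of magnitude, it can be used the way the paragraph above describes: to budget data and compute for a planned training run, and to flag runs whose performance deviates from the predicted trend.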

02

Safety and Scaling

At Anthropic we believe safety research is most useful when performed on highly capable models. Each year brings larger neural networks that perform better than those that came before, and these larger networks also bring new safety challenges. We study and engage with the safety issues of large models so that we can find ways to make them more reliable, share what we learn, and improve safe deployment outcomes across the field. Our immediate focus is prototyping systems that pair these safety techniques with tools for analyzing text and code.

03

Tools and Measurements

We believe critically evaluating the potential societal impacts of our work is a key pillar of research. Our approach centers on building tools and measurements to evaluate and understand the capabilities, limitations, and potential for societal impact of our AI systems. A good way to understand our research direction here is to read about some of the work we’ve led or collaborated on in this space: AI and Efficiency, Measurement in AI Policy: Opportunities and Challenges, the AI Index 2021 Annual Report, and Microscope.

04

Focused, Collaborative Research Efforts

We highly value collaboration and aim for a mixture of top-down and bottom-up research planning. We work to keep a clear, focused research agenda, but we place strong emphasis on including everyone (researchers, engineers, societal impact experts, and policy analysts) in determining that direction. We also seek to collaborate with other labs and researchers, as we believe the best research into characterizing these systems will come from a broad community working together.

Alignment
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (Apr 12, 2022)

Our second AI alignment paper, exploring how to train a general language assistant to be helpful without providing harmful advice or exhibiting bad behaviors.

  • Privileged Bases in the Transformer Residual Stream (Mar 16, 2023)
  • The Capacity for Moral Self-Correction in Large Language Models (Feb 15, 2023)
  • Superposition, Memorization, and Double Descent (Jan 5, 2023)
  • Discovering Language Model Behaviors with Model-Written Evaluations (Dec 19, 2022)
  • Constitutional AI: Harmlessness from AI Feedback (Dec 15, 2022)
  • Measuring Progress on Scalable Oversight for Large Language Models (Nov 4, 2022)
  • Toy Models of Superposition (Sep 14, 2022)
  • Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned (Aug 22, 2022)
  • Language Models (Mostly) Know What They Know (Jul 11, 2022)
  • Softmax Linear Units (Jun 17, 2022)
  • Scaling Laws and Interpretability of Learning from Repeated Data (May 21, 2022)
  • In-context Learning and Induction Heads (Mar 8, 2022)
  • Predictability and Surprise in Large Generative Models (Feb 15, 2022)
  • A Mathematical Framework for Transformer Circuits (Dec 22, 2021)
  • A General Language Assistant as a Laboratory for Alignment (Dec 1, 2021)
© 2023 Anthropic PBC