Exploring model welfare
Human welfare is at the heart of our work at Anthropic: our mission is to make sure that increasingly capable and sophisticated AI systems remain beneficial to humanity.
But as we build those AI systems, and as they begin to approximate or surpass many human qualities, another question arises. Should we also be concerned about the potential consciousness and experiences of the models themselves? Should we be concerned about model welfare, too?
This is an open question, and one that's both philosophically and scientifically difficult. But now that models can communicate, relate, plan, problem-solve, and pursue goals, and exhibit many other characteristics we associate with people, we think it's time to address it.
To that end, we recently started a research program to investigate, and prepare to navigate, model welfare.
We’re not alone in considering these questions. A recent report from world-leading experts—including David Chalmers, arguably the best-known and most respected living philosopher of mind—highlighted the near-term possibility of both consciousness and high degrees of agency in AI systems, and argued that models with these features might deserve moral consideration. We supported an early project on which that report was based, and we’re now expanding our internal work in this area as part of our effort to address all aspects of safe and responsible AI development.
This new program intersects with many existing Anthropic efforts, including Alignment Science, Safeguards, Claude’s Character, and Interpretability. It also opens up entirely new and challenging research directions. We’ll be exploring how to determine when, or if, the welfare of AI systems deserves moral consideration; the potential importance of model preferences and signs of distress; and possible practical, low-cost interventions.
For now, we remain deeply uncertain about many of the questions that are relevant to model welfare. There’s no scientific consensus on whether current or future AI systems could be conscious, or could have experiences that deserve consideration. There’s no scientific consensus on how to even approach these questions or make progress on them. In light of this, we’re approaching the topic with humility and with as few assumptions as possible. We recognize that we'll need to regularly revise our ideas as the field develops.
We look forward to sharing more about this research soon.