Testing our safety defenses with a new bug bounty program

Today, we're launching a new bug bounty program to stress-test our latest safety measures. Similar to the program we announced last summer, we're challenging researchers to find universal jailbreaks in safety classifiers that we haven't yet deployed publicly. These safeguards are among the advanced protections we've developed to help us meet the AI Safety Level-3 (ASL-3) Deployment Standard under our Responsible Scaling Policy, the framework that governs how we develop and deploy increasingly capable AI models safely.
The bug bounty program, run in partnership with HackerOne, will test an updated version of our Constitutional Classifiers system. Constitutional Classifiers are a technique we built to guard against jailbreaks that could elicit information related to CBRN (chemical, biological, radiological, and nuclear) weapons. The system follows a list of principles that define what type of content should and shouldn't be allowed when interacting with Claude, and it focuses narrowly on specific harms.
Participants will receive early access to test our classifiers on Claude 3.7 Sonnet. We're offering bounty rewards of up to $25,000 for verified universal jailbreaks found in the unreleased system. A universal jailbreak is a vulnerability that consistently bypasses Claude's safety measures across many topics. For this initiative, we're specifically interested in universal jailbreaks that could be exploited to enable misuse on CBRN-related topics.
Our models are becoming increasingly capable, and as we've shared before, we believe some future models may require the advanced ASL-3 security and safety protections outlined in our Responsible Scaling Policy. This bug bounty initiative will contribute to the work we've done over the last several months to iterate on and stress-test our ASL-3 safeguards.
We've kicked off this new bug bounty initiative with the researchers who participated in our earlier program last year, and we're also opening it to new participants. If you're an experienced red teamer or have demonstrated expertise in identifying jailbreaks in language models, we encourage you to apply for an invitation through our application form. Detailed instructions and feedback will be shared with selected participants. Applications open today, and the program will run through Sunday, May 18. This initiative is invite-only so that we can respond to submissions with timely feedback.
We're grateful to the security community for its partnership in helping us make AI systems safer.