Reporting structure for vulnerabilities

As large language models become more capable, Anthropic believes that model developers need a 'responsible disclosure process': a process by which developers can disclose to one another any significant risks they discover, along with the corresponding mitigations. Cybersecurity offers a precedent: the Coordinated Vulnerability Disclosure process allows firms, governments, and researchers to safely disclose and patch software vulnerabilities.

The goal of a responsible disclosure process is to make current and future models safer and to reduce opportunities for misuse of currently deployed models. When evidence of a risk and its mitigation is disclosed, other model developers and deployers can evaluate their own models and put that mitigation in place. The importance of such a process was highlighted in the White House commitments, and we are piloting it with other AI labs.

Because AI development is still in an early period, it is not yet clear what 'best practices' for efforts like responsible disclosure should look like. Anthropic is therefore committing to experiment with and identify such processes, and to share what we learn with governments and other labs, thereby contributing to the development of relevant best practices.