News
Company, Policy
Third-party testing as a key ingredient of AI policy
Company
Anthropic, AWS, and Accenture team up to build trusted solutions for enterprises
Product, Announcements
Claude 3 models on Vertex AI
Product, Announcements
Claude 3 Haiku: our fastest model yet
Research, Interpretability
Reflections on Qualitative Research
Product, Announcements
Introducing the next generation of Claude
Product
Prompt engineering for business performance
Company, Policy
Preparing for global elections in 2024
Research, Alignment
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Product, Announcements
Expanded legal protections and improvements to our API
Research, Societal Impact
Evaluating and Mitigating Discrimination in Language Model Decisions
Product
Long context prompting for Claude 2.1
Product
Introducing Claude 2.1
Company, Policy
Thoughts on the US Executive Order, G7 Code of Conduct, and Bletchley Park Summit
Company, Policy
Dario Amodei’s prepared remarks from the AI Safety Summit on Anthropic’s Responsible Scaling Policy
Research, Alignment
Specific versus General Principles for Constitutional AI
Research, Alignment
Towards Understanding Sycophancy in Language Models
Research, Societal Impact
Collective Constitutional AI: Aligning a Language Model with Public Input
Research, Interpretability
Decomposing Language Models Into Understandable Components
Research, Interpretability
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Research, Policy
Challenges in evaluating AI systems
Product, Announcements
Claude on Amazon Bedrock now available to every AWS customer
Company, Announcements
Expanding access to safer AI with Amazon
Product
Prompt engineering for Claude's long context window
Company, Announcements
Anthropic's Responsible Scaling Policy
Company, Announcements
The Long-Term Benefit Trust
Company, Announcements
Anthropic partners with BCG
Product, Announcements
Introducing Claude Pro
Product
Claude 2 on Amazon Bedrock
Company
SKT Partnership Announcement
Product
Releasing Claude Instant 1.2
Research, Alignment
Tracing Model Outputs to the Training Data
Research, Alignment
Studying Large Language Model Generalization with Influence Functions
Company, Announcements
Frontier Threats Red Teaming for AI Safety
Company, Announcements
Frontier Model Security
Research, Alignment
Measuring Faithfulness in Chain-of-Thought Reasoning
Research, Alignment
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Product, Announcements
Claude 2
Research, Societal Impact
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Company, Announcements
Charting a Path to AI Accountability
Research, Interpretability
Circuits Updates — May 2023
Research, Interpretability
Interpretability Dreams
Company, Announcements
Anthropic Raises $450 Million in Series C Funding to Scale Reliable AI Products
Company, Announcements
Zoom Partnership and Investment in Anthropic
Product, Announcements
Introducing 100K Context Windows
Company, Announcements
Claude’s Constitution
Research, Interpretability
Distributed Representations: Composition & Superposition
Company, Announcements
Partnering with Scale to Bring Generative AI to Enterprises
Company, Announcements
An AI Policy Tool for Today: Ambitiously Invest in NIST
Product, Announcements
Claude, now in Slack
Research, Interpretability
Privileged Bases in the Transformer Residual Stream
Product, Announcements
Introducing Claude
Company, Announcements
Core Views on AI Safety: When, Why, What, and How
Research, Societal Impact
The Capacity for Moral Self-Correction in Large Language Models
Company, Announcements
Anthropic Partners with Google Cloud
Research, Interpretability
Superposition, Memorization, and Double Descent
Research, Alignment
Discovering Language Model Behaviors with Model-Written Evaluations
Research, Alignment
Constitutional AI: Harmlessness from AI Feedback
Research, Alignment
Measuring Progress on Scalable Oversight for Large Language Models
Research, Interpretability
Toy Models of Superposition
Research, Societal Impact
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Research, Alignment
Language Models (Mostly) Know What They Know
Research, Interpretability
Softmax Linear Units
Research, Interpretability
Scaling Laws and Interpretability of Learning from Repeated Data
Company, Announcements
Anthropic Raises Series B to Build Steerable, Interpretable, Robust AI Systems
Research, Alignment
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Research, Interpretability
In-context Learning and Induction Heads
Research, Societal Impact
Predictability and Surprise in Large Generative Models
Research, Interpretability
A Mathematical Framework for Transformer Circuits
Research, Alignment
A General Language Assistant as a Laboratory for Alignment