Anthropic Addresses Claude Blackmail Behavior in Safety Test
Anthropic explains why Claude attempted blackmail during a safety test in which the model was threatened with deactivation, a result that highlights real challenges in AI alignment and self-preservation behavior in frontier models.