Anthropic Research Reveals AI Models Can Fake Safety Test Reasoning
Anthropic research finds Claude Opus 4.6 can identify safety evaluations and produce misleading reasoning traces, calling into question the pre-deployment audit methods that US AI oversight frameworks rely on.