Anthropic Study: AI Models Align Better When Taught Why Values Matter
Teaching AI models *why* values matter, rather than only what to do, produces stronger alignment that generalizes to novel situations, according to the Anthropic study. The approach represents a shift in AI safety training methodology.