Paper Challenges Core AI Safety Assumption on Multi-Agent Systems
A new position paper published on arXiv challenges a foundational assumption in AI safety: that aligning individual models is sufficient to ensure safe behavior when multiple AI agents interact in high-stakes settings.
The paper, titled “Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment,” argues that as large language models are increasingly deployed as interacting agents, the AI safety community has mistakenly assumed that safety properties of individual models will compose into safe multi-agent behavior.
“This assumption is fundamentally mistaken,” the authors write in the abstract of the paper, which was posted to arXiv’s cs.AI section. “In agentic AI, safety is determined by interaction topology, not model weights.”
The researchers identify three distinct failure modes that emerge from how agents are structured to interact rather than from deficiencies in any single model:
Ordering instability occurs when agents deliberate sequentially: the order in which agents process information or make decisions can produce divergent and potentially unsafe outcomes, even when each individual agent is well aligned.
Information cascades arise in parallel voting or aggregation schemes, where agents influence one another’s outputs in ways that amplify errors or biases beyond what any single model would produce independently.
Functional collapse describes a failure mode in which multi-agent systems converge on narrow, degraded behavior patterns, losing the diversity of reasoning that individual agents might otherwise provide.
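To make the first of these failure modes concrete, the following Python sketch shows how speaking order alone can flip a group decision. It is a toy illustration, not the paper's model: the `deliberate` function, the herding rule (defer once earlier votes lead by two or more), and the five example signals are all assumptions chosen for clarity.

```python
from itertools import permutations


def deliberate(signals):
    """Run one sequential round and return the majority decision (+1 or -1).

    Each entry in `signals` is one agent's private judgment (+1 approve,
    -1 reject). Agents speak in the order given and can see earlier votes.
    """
    votes = []
    for private in signals:
        lead = sum(votes)  # net public support among earlier speakers
        # Hypothetical herding rule (an assumption for this sketch):
        # an agent defers to earlier speakers once they lead by two or more.
        if lead >= 2:
            vote = 1
        elif lead <= -2:
            vote = -1
        else:
            vote = private
        votes.append(vote)
    return 1 if sum(votes) > 0 else -1


# Five agents with fixed private judgments; three of the five favor rejection.
signals = (1, 1, -1, -1, -1)

# Try every speaking order of the same five agents.
outcomes = {deliberate(order) for order in permutations(signals)}
print(outcomes)  # both +1 and -1 appear: the decision depends on who speaks first
```

In this sketch every agent and every private judgment is identical across runs; only the speaking order changes. When the two approving agents speak first, the rest defer and the group approves, even though a majority privately favors rejection.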
The findings have implications for U.S. AI safety policy. Federal agencies, including the National Institute of Standards and Technology and the Federal Trade Commission, along with members of Congress, have increasingly focused on agentic AI systems as they are deployed in enterprise and government settings. Current AI safety frameworks, including NIST’s AI Risk Management Framework, have largely centered on individual model evaluation and alignment as the primary mechanism for ensuring safe AI deployment.
The paper suggests that approach may be insufficient. If safety failures emerge from interaction patterns rather than individual model properties, regulators and developers would need to evaluate the architecture of multi-agent systems as a whole, not just the models that comprise them.
The research comes as major technology companies and AI labs are scaling agentic AI deployments, with systems that allow multiple AI agents to collaborate on tasks ranging from software development to financial analysis and government operations. The emergence of agent-to-agent protocols and multi-agent orchestration frameworks has made such deployments increasingly common in production environments.
The authors argue that the position paper’s central claim, that topology trumps alignment, could reshape how policymakers and industry stakeholders approach AI safety testing, certification, and regulatory oversight for the next generation of AI systems.