New AI Benchmark Tests Automation in U.S. Healthcare Workflows
A new benchmark called CHI-Bench has been introduced to assess how well AI agents can automate complex healthcare workflows in the U.S. system, according to a preprint posted on arXiv. The benchmark focuses on policy-dense tasks such as prior authorization and care management, which require adherence to medical, insurance, and operational rules.
Developed using a high-fidelity simulator with 87 managed-care policy tools, CHI-Bench emphasizes three capabilities that are underrepresented in current AI benchmarks: policy density, multi-role composition, and multilateral interaction. Respectively, these involve navigating extensive rule sets, switching between roles with handoffs, and engaging in multi-turn dialogues such as peer reviews and patient consultations.
The framework is tailored to U.S. healthcare operations, including managed-care policies and clinical workflows that are critical for providers and insurers. The researchers argue that existing benchmarks fail to capture the complexity of real-world healthcare automation, which often requires simultaneous decision-making across legal, clinical, and administrative domains.
As AI adoption grows in healthcare, benchmarks like CHI-Bench could help identify gaps in systems designed to handle the U.S. healthcare ecosystem’s unique regulatory and operational demands.