OpenAI Adds WebSocket Support to Responses API for Faster Agent Performance

OpenAI on Thursday unveiled WebSocket support and connection-scoped caching for its Responses API, a technical upgrade the company said significantly reduces overhead in agentic workflows that require rapid, repeated calls to its models.

The new capabilities are designed to address a core bottleneck in AI agent systems: the cumulative latency cost of establishing new HTTP connections for each step in a multi-turn agent loop. By maintaining persistent WebSocket connections, developers can eliminate repeated handshake overhead and keep sessions alive across sequential API calls.
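The compounding effect is easy to see with a back-of-the-envelope sketch. The numbers below are illustrative only, not benchmarks from the announcement: they just show why paying a fresh connection handshake on every request scales linearly with the number of agent steps, while a persistent connection pays it once.

```python
def connection_overhead_ms(steps, handshake_ms=100.0, persistent=False):
    """Estimate cumulative connection-setup cost for an agent loop.

    Over per-request HTTP, each of the `steps` calls may pay a fresh
    TCP/TLS handshake; over a persistent WebSocket the handshake is
    paid once for the whole session. `handshake_ms` is an assumed,
    illustrative figure.
    """
    return handshake_ms * (1 if persistent else steps)

# A 30-step agent loop: 30 handshakes vs. 1.
print(connection_overhead_ms(30))                   # 3000.0
print(connection_overhead_ms(30, persistent=True))  # 100.0
```

The gap grows with the length of the workflow, which is why the optimization matters most for multi-step agents rather than single-turn queries.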

OpenAI detailed the changes in a blog post that uses the company’s Codex coding agent as a reference implementation. In the Codex agent loop, where the system may execute dozens of sequential reasoning and tool-use steps to complete a single task, the per-request connection cost compounds quickly. WebSockets allow the agent to maintain a single open channel for the duration of a workflow.

Connection-scoped caching adds a second layer of optimization. Rather than re-transmitting full context with each request, the API can cache and reference prior inputs within an active connection. OpenAI said this reduces both bandwidth consumption and processing time, particularly for long-running agent sessions that build on extensive conversation histories.
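A minimal sketch of the idea: instead of re-sending the full conversation history each turn, the client sends only the new input plus a reference to the prior response. The `previous_response_id` field is an existing Responses API pattern, but the exact wire format of connection-scoped caching is not described in the announcement, so treat the shapes below as an assumption.

```python
import json

def build_turn(new_input, previous_response_id=None, full_history=None):
    """Build a request body for one step of an agent loop.

    Without caching, the client re-sends `full_history` on every turn.
    With a connection-scoped cache, it sends only the new input and a
    reference to the previous response. The caching body shown here is
    an illustrative assumption, not the documented protocol.
    """
    if previous_response_id is not None:
        return {"input": new_input, "previous_response_id": previous_response_id}
    return {"input": (full_history or []) + [new_input]}

# A long-running session: 40 prior turns of context.
history = [{"role": "user", "content": "Refactor utils.py"}] * 40
cached = build_turn({"role": "user", "content": "Now add tests"},
                    previous_response_id="resp_abc123")
uncached = build_turn({"role": "user", "content": "Now add tests"},
                      full_history=history)

# The cached payload stays small no matter how long the session runs.
print(len(json.dumps(cached)) < len(json.dumps(uncached)))  # True
```

This is what drives the bandwidth and processing savings the company describes: the uncached payload grows with every turn, while the cached one stays roughly constant.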

The company did not disclose specific latency benchmarks in the announcement but characterized the performance gains as meaningful for production agent deployments. The features are available immediately through the Responses API.

The update reflects a broader industry push to optimize infrastructure for agentic AI applications, which place different demands on APIs than traditional single-turn query systems. Anthropic, Google and other major AI providers have similarly invested in reducing latency for multi-step agent workflows in recent months.

OpenAI’s Responses API, which launched earlier this year as a successor to the Chat Completions endpoint, was designed from the outset to support tool use, multi-turn interactions and other patterns common in agent architectures. The WebSocket addition extends that foundation with a transport-layer optimization aimed squarely at developers building autonomous coding assistants, research agents and other systems that chain multiple model calls together.

Source: OpenAI Blog
