AI benchmarks

Research

New Benchmark Suite Evaluates Financial AI Competence
By swgoettelman, May 19, 2026

Researchers introduce FINESSE-Bench, a new benchmark for evaluating the technical analysis skills of financial AI. It addresses gaps in existing LLM frameworks.

Research

New Benchmark Introduced for Agentic Political Fact Discovery
By swgoettelman, May 15, 2026

Researchers unveil PolitNuggets, a multilingual benchmark that tests AI agents’ ability to discover rare political facts through the FactNet protocol. It advances evaluation beyond static QA to open-ended discovery.

AI Labs

Google DeepMind Takes Minority Stake in EVE Online Studio
By swgoettelman, May 7, 2026

Google DeepMind takes a minority stake in CCP Games, the studio behind EVE Online, turning the 20-year-old space MMO into a testing ground for advanced multi-agent AI research. Terms were not disclosed.

AI Labs

OpenAI Releases GPT-5.5, Edges Past Anthropic on Agentic Benchmark
By swgoettelman, April 28, 2026

OpenAI’s GPT-5.5 edges past Anthropic’s Claude Mythos Preview on the agentic benchmark Terminal-Bench 2.0, underscoring the tightening race between U.S. AI labs.
