New 'APEX-Agents' Benchmark Reveals AI Models Struggle with Real-World Professional Tasks
A new benchmark called APEX-Agents shows that even leading AI models such as GPT-5.2 and Gemini 3 Flash fail on most complex, multi-domain tasks drawn from professional fields such as law and finance, raising doubts about their immediate readiness for the workplace.

