Researchers from the Center for AI Safety and Scale AI have introduced "Humanity's Last Exam," a rigorous benchmark designed to gauge how close advanced AI models are to human-level expert knowledge across more than 100 subjects.

The exam, detailed in a new Nature study, comprises 2,500 PhD-level questions vetted by more than 1,000 experts worldwide. Questions are designed to be unambiguous, objectively verifiable, and not answerable through a simple web search.

In initial tests, leading models such as OpenAI's o1 scored only 8.3%. As of February 2026, the top score is 48.4%, achieved by Google's Gemini 3 Deep Think, while human experts average 90% within their own fields.

The creators caution that while high performance on this benchmark is a necessary milestone, it alone does not signify the achievement of Artificial General Intelligence (AGI).