
For the past several years, the race toward Artificial General Intelligence (AGI) has been largely defined by a pursuit of higher scores on static, knowledge-based benchmarks. While these metrics have served their purpose in measuring the rapid evolution of large language models, they are increasingly criticized for their vulnerability to data contamination and their inability to capture the nuance of true general intelligence. Google DeepMind is now seeking to shift this paradigm, unveiling a rigorous, science-backed approach to measuring AI progress through a newly released cognitive taxonomy.
The initiative, detailed in the paper "Measuring Progress Toward AGI: A Cognitive Taxonomy," moves beyond mere knowledge retrieval. It proposes a fundamental restructuring of how we evaluate AI systems, anchoring the assessment of "general intelligence" in established principles of cognitive science, neuroscience, and psychology. To catalyze this transition, Google DeepMind has also launched a $200,000 Kaggle hackathon, inviting the global research community to help build the necessary benchmarking infrastructure.
At the heart of this new framework lies a breakdown of general intelligence into ten discrete cognitive abilities. This taxonomy is designed to provide a comprehensive view of how an AI system functions, not just what it knows. By deconstructing intelligence into these specific faculties, researchers can better pinpoint the strengths and weaknesses of different architectures.
The taxonomy treats each of these ten abilities as a distinct, independently testable faculty rather than folding them into a single aggregate score.
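To illustrate what such a granular, per-ability view could look like in practice, here is a minimal sketch of a per-ability score report. The ability names and values below are placeholders for demonstration, not the paper's actual taxonomy.

```python
from dataclasses import dataclass, field

@dataclass
class AbilityReport:
    """Per-ability scores instead of one aggregate benchmark number.

    Keys are placeholder ability names (NOT the paper's taxonomy);
    values are hypothetical human-normalized scores in [0, 1].
    """
    scores: dict[str, float] = field(default_factory=dict)

    def weakest(self, k: int = 3) -> list[tuple[str, float]]:
        """Surface the k lowest-scoring abilities for targeted analysis."""
        return sorted(self.scores.items(), key=lambda kv: kv[1])[:k]

report = AbilityReport(scores={
    "working_memory": 0.41,      # hypothetical values
    "planning": 0.58,
    "abstract_reasoning": 0.72,
})
print(report.weakest(k=2))  # pinpoints the weakest faculties
```

A structure like this is what distinguishes an "ability breakdown" from a leaderboard: weaknesses in one faculty cannot hide behind strengths in another.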
To understand the magnitude of this shift, it is helpful to contrast traditional benchmarking methods with the new cognitive-first approach proposed by the DeepMind team.
| Evaluation Focus | Traditional Benchmarks | Cognitive Taxonomy |
|---|---|---|
| Primary Objective | Static knowledge retrieval | Dynamic cognitive performance |
| Data Integrity | Highly prone to contamination | Resilient via generative testing |
| Human Alignment | Correlates with test scores | Maps to human cognitive distribution |
| System View | Unified performance score | Granular ability breakdown |
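The "resilient via generative testing" contrast is worth making concrete. Below is a minimal sketch of the idea, assuming a simple procedural item generator; the function and puzzle format are illustrative inventions, not anything from the DeepMind paper.

```python
import random

def make_transitive_item(seed: int) -> tuple[str, str]:
    """Procedurally generate a fresh transitive-reasoning test item.

    Because each item is synthesized from a seed rather than drawn
    from a fixed, published answer key, memorizing any public dataset
    confers no advantage -- the contamination resistance the table
    calls "generative testing". (Illustrative sketch only.)
    """
    rng = random.Random(seed)
    a, b, c = rng.sample(["Ada", "Ben", "Cleo", "Dev", "Eve"], 3)
    prompt = (
        f"{a} is taller than {b}. {b} is taller than {c}. "
        f"Who is tallest?"
    )
    return prompt, a  # (question, gold answer)

# Every seed yields a novel item with a known gold answer.
question, answer = make_transitive_item(seed=42)
```

Static benchmarks cannot offer this property: once their items circulate in training corpora, high scores may reflect recall rather than reasoning.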
While the publication of the framework provides the theoretical foundation, DeepMind acknowledges that a framework alone is insufficient. The challenge lies in creating evaluation protocols that are scalable, robust, and meaningful. To bridge this gap, Google DeepMind has partnered with Kaggle to launch a high-stakes hackathon titled “Measuring progress toward AGI: Cognitive abilities.”
The hackathon is specifically designed to address the "evaluation gap"—the significant scarcity of standardized tests for the more complex, abstract capabilities of modern AI. The competition is organized around five core tracks, each targeting an area where current evaluation methods are weakest.
The hackathon offers a total prize pool of $200,000, structured to reward both excellence within individual tracks and overall innovation across submissions.
Participants will leverage Kaggle’s Community Benchmarks platform, allowing them to test their evaluations against a variety of frontier AI models. The submission window is open from March 17 through April 16, 2026, with the final results slated for announcement on June 1, 2026.
The introduction of this cognitive framework represents a mature step forward for the AI research community. By standardizing the language of "intelligence" through a lens of cognitive science, DeepMind is effectively raising the bar for what constitutes meaningful progress.
One of the most critical aspects of this approach is the proposed three-stage evaluation protocol. By collecting human baselines from demographically representative samples and mapping AI performance against these distributions, researchers can create a normalized score that indicates how a model performs relative to human capabilities in specific domains. This is a significant improvement over leaderboard-chasing, which often masks fundamental flaws in model reasoning or reliability.
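As a concrete illustration of that mapping, the sketch below percentile-normalizes a model's raw score against a sampled human baseline. The data, scale, and function names are assumptions for demonstration, not the paper's actual protocol.

```python
import numpy as np

def human_normalized_score(model_score: float, human_scores: np.ndarray) -> float:
    """Return the fraction of the human baseline sample that the
    model's raw score meets or exceeds (an empirical percentile).

    0.5 means median-human performance on this ability; values near
    1.0 indicate performance above most of the sampled population.
    """
    return float(np.mean(model_score >= human_scores))

# Hypothetical human baseline for one cognitive ability (0-100 scale).
rng = np.random.default_rng(0)
humans = rng.normal(loc=62.0, scale=12.0, size=500).clip(0, 100)

print(f"Normalized score: {human_normalized_score(71.0, humans):.2f}")
```

Expressing results as a position within a human distribution, rather than as a raw benchmark score, is what makes claims like "above median-human on planning" verifiable.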
As the industry moves closer to the theoretical milestone of AGI, the ability to measure internal "cognitive" progress will become as important as the deployment of the models themselves. With this framework, Google DeepMind is not only asking "how smart is this AI?" but providing a structured, verifiable methodology to answer that question with scientific rigor. For researchers and developers, the Kaggle hackathon serves as an open invitation to help define the metrics that will shape the next era of artificial intelligence.