
For the past several years, the race toward Artificial General Intelligence (AGI) has been largely defined by a pursuit of higher scores on static, knowledge-based benchmarks. While these metrics have served their purpose in measuring the rapid evolution of large language models, they are increasingly criticized for their vulnerability to data contamination and their inability to capture the nuance of true general intelligence. Google DeepMind is now seeking to shift this paradigm, unveiling a rigorous, science-backed approach to measuring AI progress through a newly released cognitive taxonomy.
The initiative, detailed in the paper "Measuring Progress Toward AGI: A Cognitive Taxonomy," moves beyond mere knowledge retrieval. It proposes a fundamental restructuring of how we evaluate AI systems, anchoring the assessment of "general intelligence" in established principles of cognitive science, neuroscience, and psychology. To catalyze this transition, Google DeepMind has also launched a $200,000 Kaggle hackathon, inviting the global research community to help build the necessary benchmarking infrastructure.
At the heart of this new framework lies a breakdown of general intelligence into ten discrete cognitive abilities. This taxonomy is designed to provide a comprehensive view of how an AI system functions, not just what it knows. By deconstructing intelligence into these specific faculties, researchers can better pinpoint the strengths and weaknesses of different architectures.
The taxonomy treats each of these ten abilities as a distinct, independently testable faculty rather than folding them into a single aggregate score.
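To illustrate what such a granular, per-ability view could look like in practice, here is a minimal sketch of a per-ability score report. The ability names and values below are placeholders for demonstration, not the paper's actual taxonomy.

```python
from dataclasses import dataclass, field

@dataclass
class AbilityReport:
    """Per-ability scores instead of one aggregate benchmark number.

    Keys are placeholder ability names (NOT the paper's taxonomy);
    values are hypothetical human-normalized scores in [0, 1].
    """
    scores: dict[str, float] = field(default_factory=dict)

    def weakest(self, k: int = 3) -> list[tuple[str, float]]:
        """Surface the k lowest-scoring abilities for targeted analysis."""
        return sorted(self.scores.items(), key=lambda kv: kv[1])[:k]

report = AbilityReport(scores={
    "working_memory": 0.41,      # hypothetical values
    "planning": 0.58,
    "abstract_reasoning": 0.72,
})
print(report.weakest(k=2))  # pinpoints the weakest faculties
```

A structure like this is what distinguishes an "ability breakdown" from a leaderboard: weaknesses in one faculty cannot hide behind strengths in another.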
To understand the magnitude of this shift, it is helpful to contrast traditional benchmarking methods with the new cognitive-first approach proposed by the DeepMind team.
| Evaluation Focus | Traditional Benchmarks | Cognitive Taxonomy |
|---|---|---|
| Primary Objective | Static knowledge retrieval | Dynamic cognitive performance |
| Data Integrity | Highly prone to contamination | Resilient via generative testing |
| Human Alignment | Correlates with test scores | Maps to human cognitive distribution |
| System View | Unified performance score | Granular ability breakdown |
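The "resilient via generative testing" contrast is worth making concrete. Below is a minimal sketch of the idea, assuming a simple procedural item generator; the function and puzzle format are illustrative inventions, not anything from the DeepMind paper.

```python
import random

def make_transitive_item(seed: int) -> tuple[str, str]:
    """Procedurally generate a fresh transitive-reasoning test item.

    Because each item is synthesized from a seed rather than drawn
    from a fixed, published answer key, memorizing any public dataset
    confers no advantage -- the contamination resistance the table
    calls "generative testing". (Illustrative sketch only.)
    """
    rng = random.Random(seed)
    a, b, c = rng.sample(["Ada", "Ben", "Cleo", "Dev", "Eve"], 3)
    prompt = (
        f"{a} is taller than {b}. {b} is taller than {c}. "
        f"Who is tallest?"
    )
    return prompt, a  # (question, gold answer)

# Every seed yields a novel item with a known gold answer.
question, answer = make_transitive_item(seed=42)
```

Static benchmarks cannot offer this property: once their items circulate in training corpora, high scores may reflect recall rather than reasoning.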
While the publication of the framework provides the theoretical foundation, DeepMind acknowledges that a framework alone is insufficient. The challenge lies in creating evaluation protocols that are scalable, robust, and meaningful. To bridge this gap, Google DeepMind has partnered with Kaggle to launch a high-stakes hackathon titled “Measuring progress toward AGI: Cognitive abilities.”
The hackathon is specifically designed to address the "evaluation gap"—the significant scarcity of standardized tests for the more complex, abstract capabilities of modern AI. The competition is organized around five core tracks, each targeting an area where current evaluation methods are weakest.
The hackathon offers a total prize pool of $200,000, structured to reward both excellence within individual tracks and overall innovation across submissions.
Participants will leverage Kaggle’s Community Benchmarks platform, allowing them to test their evaluations against a variety of frontier AI models. The submission window is open from March 17 through April 16, 2026, with the final results slated for announcement on June 1, 2026.
The introduction of this cognitive framework represents a mature step forward for the AI research community. By standardizing the language of "intelligence" through a lens of cognitive science, DeepMind is effectively raising the bar for what constitutes meaningful progress.
One of the most critical aspects of this approach is the proposed three-stage evaluation protocol. By collecting human baselines from demographically representative samples and mapping AI performance against these distributions, researchers can create a normalized score that indicates how a model performs relative to human capabilities in specific domains. This is a significant improvement over leaderboard-chasing, which often masks fundamental flaws in model reasoning or reliability.
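As a concrete illustration of that mapping, the sketch below percentile-normalizes a model's raw score against a sampled human baseline. The data, scale, and function names are assumptions for demonstration, not the paper's actual protocol.

```python
import numpy as np

def human_normalized_score(model_score: float, human_scores: np.ndarray) -> float:
    """Return the fraction of the human baseline sample that the
    model's raw score meets or exceeds (an empirical percentile).

    0.5 means median-human performance on this ability; values near
    1.0 indicate performance above most of the sampled population.
    """
    return float(np.mean(model_score >= human_scores))

# Hypothetical human baseline for one cognitive ability (0-100 scale).
rng = np.random.default_rng(0)
humans = rng.normal(loc=62.0, scale=12.0, size=500).clip(0, 100)

print(f"Normalized score: {human_normalized_score(71.0, humans):.2f}")
```

Expressing results as a position within a human distribution, rather than as a raw benchmark score, is what makes claims like "above median-human on planning" verifiable.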
As the industry moves closer to the theoretical milestone of AGI, the ability to measure internal "cognitive" progress will become as important as the deployment of the models themselves. With this framework, Google DeepMind is not only asking "how smart is this AI?" but providing a structured, verifiable methodology to answer that question with scientific rigor. For researchers and developers, the Kaggle hackathon serves as an open invitation to help define the metrics that will shape the next era of artificial intelligence.