
As the global race for artificial intelligence supremacy accelerates, India stands at a critical juncture. While western-developed Large Language Models (LLMs) dominate the current landscape, a growing consensus among industry experts and policymakers suggests that reliance on these imported technologies poses significant risks to India's cultural integrity and strategic autonomy.
Experts at EY India have issued a compelling call to action, arguing that for India to develop truly "Sovereign AI," the government must prioritize the strategic release of public data. This move is seen as the cornerstone for building indigenous AI systems capable of understanding the subcontinent's unparalleled linguistic and cultural diversity, thereby countering the inherent biases found in global models trained primarily on western datasets.
The limitations of current global AI models when applied to the Indian context are becoming increasingly apparent. Most leading LLMs are trained on data scraped from the open web, which is heavily skewed towards English content from North America and Europe. This "data bias" results in AI systems that struggle to grasp the nuance, sentiment, and context of Indian languages and social structures.
For a nation home to 22 official languages and more than 10,000 dialects, the "one-size-fits-all" approach of western AI is inadequate. Industry leaders have pointed out that mere translation is insufficient; true understanding requires models trained on native datasets that capture local idioms, cultural references, and historical context.
Key areas where western models often fall short in the Indian context include:

- Grasping the nuance and sentiment of Indian languages beyond literal translation
- Recognising local idioms, cultural references, and historical context
- Reflecting India's social structures and regional diversity rather than defaulting to assumptions drawn from North American and European data
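One symptom of that skew is easy to reproduce: English-centric, byte-oriented vocabularies tend to split Devanagari and other Indic scripts into far more units than comparable English text, which inflates cost for vernacular users. The short Python sketch below uses raw UTF-8 byte counts as a rough proxy for that fragmentation; it is an illustration only and makes no claim about any particular model's tokenizer.

```python
# Crude proxy for tokenizer "fertility": UTF-8 byte cost per character.
# Devanagari characters occupy three bytes each in UTF-8, while ASCII
# occupies one, so byte-level vocabularies built mostly on English text
# tend to fragment Indic scripts into many more units. Illustrative only;
# real tokenizers merge frequent byte sequences, so actual ratios vary.

samples = {
    "English": "The farmer checked the weather forecast before sowing.",
    "Hindi":   "किसान ने बुवाई से पहले मौसम का पूर्वानुमान देखा।",
}

for language, text in samples.items():
    chars = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    print(f"{language:8s} chars={chars:3d} utf8_bytes={utf8_bytes:3d} "
          f"bytes/char={utf8_bytes / chars:.2f}")
```

On typical sentences the per-character byte cost of Devanagari is roughly two to three times that of Latin script, which is one reason native corpora and tokenizers matter for building frugal vernacular models.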
The concept of "Sovereign AI" has emerged as a central theme in India's technology roadmap. It refers to a nation's capacity to design, develop, and regulate AI systems using domestic infrastructure, national data, and an indigenous workforce. This is not merely a technological ambition but a matter of national security and economic resilience.
EY India’s recent analysis suggests that sovereign capabilities are essential to safeguard sensitive information and ensure that the economic value generated by AI remains within the country. Without a sovereign stack, India risks becoming a "digital colony," dependent on foreign API providers for critical infrastructure, from healthcare diagnostics to financial inclusion tools.
The primary bottleneck for developing robust Indian AI models is not talent or compute power, but data. While western corporations have had decades to harvest the open web, high-quality, structured data regarding India is often siloed within government archives.
EY India experts argue that the Indian government holds a "goldmine" of diverse datasets—ranging from census demographics and meteorological records to legal texts and public health statistics. Unlocking this data for responsible use by Indian startups and researchers could provide the fuel needed to train world-class indigenous models.
Proposed Framework for Data Release:
| Data Category | Potential AI Application | Impact |
|---|---|---|
| Linguistic Archives | Training Multilingual LLMs | Preserving dialects and enabling vernacular digital services |
| Public Health Records | Predictive Healthcare Models | Early disease detection and resource allocation in rural areas |
| Legal & Judicial Data | Legal Tech Assistants | Reducing case pendency and improving access to justice |
| Agricultural Statistics | Precision Farming AI | Optimizing crop yields and weather forecasting for farmers |
| Infrastructure Data | Smart City Planning | Improving traffic management and urban utility distribution |
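One way to operationalise the table above is as a machine-readable catalogue that startups and researchers could query. The sketch below shows what a single catalogue entry might look like, with access tiers reflecting sensitivity; every field name, tier, and dataset name here is a hypothetical assumption for illustration, not a reference to any existing government schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessTier(Enum):
    # Hypothetical tiers: open download, sandboxed access, or withheld.
    OPEN = "open"              # e.g. aggregated agricultural statistics
    SANDBOX = "sandbox"        # e.g. anonymised records accessed inside a data trust
    RESTRICTED = "restricted"  # e.g. raw records containing personal data


@dataclass
class DatasetEntry:
    """Illustrative catalogue record for a public dataset slated for release."""
    name: str
    category: str            # mirrors the "Data Category" column above
    ai_application: str      # mirrors the "Potential AI Application" column
    access_tier: AccessTier
    languages: list[str] = field(default_factory=list)


catalogue = [
    DatasetEntry("linguistic_archive_sample", "Linguistic Archives",
                 "Training Multilingual LLMs", AccessTier.OPEN,
                 ["hi", "ta", "bn", "mr"]),
    DatasetEntry("district_health_indicators", "Public Health Records",
                 "Predictive Healthcare Models", AccessTier.SANDBOX),
    DatasetEntry("crop_yield_by_district", "Agricultural Statistics",
                 "Precision Farming AI", AccessTier.OPEN),
]

for entry in catalogue:
    print(f"{entry.name}: {entry.category} -> {entry.ai_application} "
          f"[{entry.access_tier.value}]")
```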
While the release of government data is critical, it must be balanced with stringent privacy protections. The recommendation is not for an unbridled data dump, but for the creation of "Data Trusts" or secure sandboxes where anonymized data can be accessed for training purposes without compromising individual privacy.
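As a minimal sketch of what a sandbox-side preparation step might involve, the snippet below drops direct identifiers and replaces linkage keys with salted, keyed hashes before a record could leave a data trust. The field names and the choice of HMAC-SHA-256 are illustrative assumptions; a real release pipeline would layer on far stronger statistical safeguards, which this sketch does not attempt.

```python
import hashlib
import hmac
import os

# Hypothetical per-release salt held inside the data trust; never published.
RELEASE_SALT = os.urandom(32)

# Fields that identify a person directly and must never leave the sandbox.
DIRECT_IDENTIFIERS = {"name", "aadhaar", "phone"}
# Fields kept only as irreversible pseudonyms so records can still be linked.
PSEUDONYMISED = {"patient_id"}


def pseudonymise(value: str) -> str:
    """Keyed hash: the same id links across records but cannot be reversed
    without the salt held by the data trust."""
    digest = hmac.new(RELEASE_SALT, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def sanitise(record: dict) -> dict:
    """Drop direct identifiers, pseudonymise linkage keys, pass the rest through."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue
        if key in PSEUDONYMISED:
            out[key] = pseudonymise(str(value))
        else:
            out[key] = value
    return out


raw = {"patient_id": "PH-2041-88", "name": "A. Example", "aadhaar": "XXXX",
       "phone": "XXXX", "district": "Nashik", "diagnosis_code": "J45"}
print(sanitise(raw))
```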
The implementation of the Digital Personal Data Protection (DPDP) Act will play a pivotal role here, setting the ground rules for how data can be processed. Experts suggest that a clear policy framework treating anonymized government data as a "Digital Public Good" could replicate the success of the Unified Payments Interface (UPI) in the AI sector, fostering a vibrant ecosystem of innovation.
India's ambition is to transition from being the world's largest consumer of digital services to becoming a global creator of AI solutions. By grounding AI development in the reality of its own population, India can create models that are not only culturally accurate but also highly efficient and frugal—characteristics that the Global South desperately needs.
The economic stakes are massive. Projections indicate that AI could contribute nearly $1.7 trillion to India’s economy by 2035. However, capturing this value requires a shift in strategy. It demands a move away from fine-tuning western models towards building foundational models from the ground up, powered by the vast, diverse, and deep ocean of Indian data.
As 2026 unfolds, the collaboration between the public sector's data stewardship and the private sector's innovation engine will likely define the trajectory of India's AI journey. The message from experts is clear: to build AI that works for India, we must start with data that is India's own.