OPEN DATA FOR PUBLIC INTEREST AI

We call for digital public goods (DPGs) that make it easier to identify, prepare, share, and use higher-quality open training data, particularly for the following use cases:

Development of language models that address language gaps in AI development.
Solutions for public service delivery.
Research-based climate action (monitoring, mitigation, adaptation).

KEY ACTIONS

Identify, create, and fund relevant open-source technical and governance toolkits, case studies, and capacity-building efforts that can increase the availability and use of high-quality open training data, particularly for the following use cases:

Development of language models that address language gaps in AI development
Solutions for public service delivery
Research-based climate action (monitoring, mitigation, adaptation)

WHY THIS

The development of public interest AI, including AI systems as digital public goods, depends on the opportunity to train models on both existing and new high-quality, openly licensed datasets. Several challenges impede doing this at a larger scale; one of them is the resources required to produce and share open data in different geographical contexts. One way to address this challenge is to create an adaptable and reusable toolkit that can be recommended to countries and stakeholders to facilitate the collection, extraction, processing, and preparation of data.

WHY NOW

Generative AI is advancing at breakneck speed, and the term “open-source AI” is often misused to describe systems that have open weights but offer no transparency about, or sharing of, the data they were trained on. This lack of transparency poses a significant risk, as these systems are increasingly shaping our norms, values, understanding of reality, and access to information and services at the most fundamental level. It is urgent to overcome the barriers to a more transparent and open way of building AI systems that serve the public interest. This includes reducing some of the main technical barriers to making more high-quality open training data available.

HOW TO SUPPORT

Identify the main technical barriers to unlocking more open training data for the priority use cases identified.
Share existing open-source AI-development tools and suggest areas where new tools should be developed for addressing these barriers.
Fund and/or develop promising toolkit approaches and open-source tools for the priority use cases.
Identify and highlight examples of AI systems that meet the DPG Standard.
Advocate to policymakers and funders that public interest AI systems can and should be built in an open and transparent way.

Explore Open Data for Public Interest AI Toolkit
