# Mozilla Data Collective

> Mozilla Data Collective is rebuilding the AI data ecosystem with communities at the centre. Access over 600 high-quality global datasets, built by and for the community in a transparent and ethical way.

> Mozilla Data Collective is the data platform for human agency and fair value exchange. People should be able to choose where their datasets show up and to define what benefit looks like for them, whether that’s swapping data for tool access or expertise, donating it to the public, or asking for fair compensation.

> Data providers can share data openly, either using existing licenses like Creative Commons or building their own license. Datasets can be open to everyone, or just to some types of downloaders. Uploaders can set custom constraints and ask for exchange, compensation, or recognition. Downloaders who access datasets are fully authenticated and bound by legally binding contracts, and we have a number of dataset protection features.

## Datasets

- [Datasets](/datasets): Mozilla Data Collective has 600+ ethically sourced datasets shared by over 150 organizations.

> Each dataset is identified by a unique slug: `/datasets/<dataset slug>`

> Uploaders provide their set of terms and conditions for accessing the dataset alongside the license to the dataset itself. Downloaders must agree to these conditions before they are able to download the dataset.

> Some datasets require downloaders to share their email address and request access, which the uploader must grant before the dataset can be downloaded.

## Uploading
- [Uploads](/profile/submissions/create): Account holders can request to become uploaders on Mozilla Data Collective by visiting their profile page and submitting a form that explains the type of data they want to publish and their goals in sharing it.

> Once approved as an uploader on the platform, which generally takes less than a week, uploaders can create a dataset listing by sharing a `.tar.gz` file and filling out an accompanying datasheet.

- [How to Upload](https://community.mozilladatacollective.com/uploading-your-dataset-to-the-mozilla-data-collective-platform/)

## Organizations
- [Organizations](/organization/): Organizations who upload to Mozilla Data Collective can create organizational profiles that show all of their available datasets.

## Legal Documents

- [Terms of Service](/terms): Use of Mozilla Data Collective is governed by different sets of terms for uploader organizations and downloaders.

- [Privacy](/privacy): The Mozilla Data Collective privacy policy explains how we use information gathered by the platform.

## API

- [API](/api-reference): The Mozilla Data Collective API, currently in beta, allows you to download datasets programmatically. It provides a REST API and a Python SDK ([datacollective](https://pypi.org/project/datacollective/)). Authentication uses Bearer tokens via the `Authorization` header. The base URL is https://mozilladatacollective.com/api, and downloads are limited to 30 per day per organization. Users must agree to dataset terms via the web interface before downloading. Presigned URLs, valid for 12 hours, are returned for direct storage downloads.
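The two limits mentioned above (30 downloads per day per organization, and presigned URLs valid for 12 hours) can be tracked on the client side. The following is a minimal sketch, not part of the API or SDK: the helper names are hypothetical, and it assumes the 12-hour window is measured from the moment the URL was issued. How the platform resets the daily count is not stated here, so the caller supplies the number of downloads already used.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DAILY_DOWNLOAD_LIMIT = 30                 # per organization, as stated above
PRESIGNED_URL_TTL = timedelta(hours=12)   # validity window, as stated above


def presigned_url_is_fresh(issued_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True while a presigned URL is still inside its 12-hour window.

    Assumption: the window starts at `issued_at`, the time the URL was received.
    """
    now = now or datetime.now(timezone.utc)
    return now - issued_at < PRESIGNED_URL_TTL


def downloads_remaining(used_today: int) -> int:
    """Return how many downloads are left under the daily limit (never negative)."""
    return max(0, DAILY_DOWNLOAD_LIMIT - used_today)
```

Checking `presigned_url_is_fresh` before resuming a paused download avoids spending one of the day's downloads on a URL that could still be reused.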

### API Documentation

- [API Reference Overview](https://mozilladatacollective.com/api-reference): Landing page with quickstart steps for API access
- [REST API Docs](https://mozilladatacollective.com/api-reference/docs): Full endpoint reference including GET /datasets/:datasetId, POST /datasets/:datasetId/download, authentication, rate limiting, and error handling
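
As an illustration of the endpoints listed above, the sketch below builds authenticated requests using only the Python standard library. It relies on the base URL, the Bearer token scheme, and the two endpoint paths named in this document; the function names, the `Accept` header, and the assumption that responses are JSON are illustrative and should be checked against the REST API docs.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://mozilladatacollective.com/api"  # base URL stated in this document


def build_request(method: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying a Bearer token."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",  # assumption: the API responds with JSON
    }
    return urllib.request.Request(f"{BASE_URL}{path}", headers=headers, method=method)


def dataset_details_request(dataset_id: str, token: str) -> urllib.request.Request:
    """Request for GET /datasets/:datasetId."""
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return build_request("GET", f"/datasets/{safe_id}", token)


def dataset_download_request(dataset_id: str, token: str) -> urllib.request.Request:
    """Request for POST /datasets/:datasetId/download."""
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return build_request("POST", f"/datasets/{safe_id}/download", token)
```

A request built this way can be sent with `urllib.request.urlopen(req)`. The shape of the response, including the field that carries the presigned URL, is not described here, so consult the REST API docs before parsing it.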

### Python SDK

- [datacollective Python SDK Documentation](https://mozilla-data-collective.github.io/datacollective-python/): Installation, configuration, and a usage guide for `download_dataset`, `load_dataset`, and programmatic uploads
- [datacollective on PyPI](https://pypi.org/project/datacollective/): Python package installation
- [SDK Source Code](https://github.com/Mozilla-Data-Collective/datacollective-python): GitHub repository

### Optional

- [Schema-Based Loading](https://mozilla-data-collective.github.io/datacollective-python/schema_documentation/): How `schema.yaml` files drive dataset loading into DataFrames
- [ASR Loader](https://mozilla-data-collective.github.io/datacollective-python/loaders/asr/): ASR-specific dataset loading
- [TTS Loader](https://mozilla-data-collective.github.io/datacollective-python/loaders/tts/): TTS-specific dataset loading
- [Programmatic Uploads](https://mozilla-data-collective.github.io/datacollective-python/upload/): Creating submissions and uploading datasets via the SDK

## Relationship to Common Voice
> In 2025, our Founder and CEO, E.M. Lewis-Jong, was leading Common Voice (the world’s largest public participation speech dataset) at Mozilla Foundation, and was looking for a release platform that would give Common Voice communities more choice: choice of license, features that undergirded stronger control, and a radically anti-extractivist form of value exchange.

> The team couldn’t find that platform, so in September we built Mozilla Data Collective. Common Voice was Mozilla Data Collective’s first community user, piloting the platform for its own datasets.

## Relationship to Mozilla Foundation
> Mozilla Data Collective was incubated at the Mozilla Foundation. In April 2026, we spun out a UK entity specifically focused on the Mozilla Data Collective platform. Common Voice continues to be stewarded by Mozilla Foundation.

## Optional

- [More information](https://community.mozilladatacollective.com/): The Mozilla Data Collective community blog.
- [How Mozilla Data Collective came to be](https://community.mozilladatacollective.com/about/): About Mozilla Data Collective as an organization
- [Guides for using Mozilla Data Collective](https://community.mozilladatacollective.com/tag/guides/)

### Social media 

You can find Mozilla Data Collective on the following social media sites:

- [LinkedIn](https://www.linkedin.com/company/mozilla-data-collective)
- [Reddit](https://www.reddit.com/r/MozillaDataCollective/)
- [Discord](https://discord.gg/cs9tJPqQB5)