
Mozilla Data Collective is rebuilding the AI data ecosystem with communities at the centre. Access over 300 high-quality global datasets, built by and for the community in a transparent and ethical way.


Datasets

Common Voice

Common Voice Scripted Speech 24.0 - Teutila Cuicatec

A collection of scripted spoken phrases in Teutila Cuicatec.

Task: ASR

Format: MP3

License: CC0-1.0

Size: 209.52 MB

Created: 12/5/2025

Locale: cut

Common Voice

Common Voice Scripted Speech 24.0 - Norwegian Nynorsk

A collection of scripted spoken phrases in Norwegian Nynorsk.

Task: ASR

Format: MP3

License: CC0-1.0

Size: 33.55 MB

Created: 12/5/2025

Locale: nn-NO

My Cool Organization Changed Again

rm-vallader test

rm-vallader test

Task: NLP

Format: MP3

License: BSD-3-Clause

Size: 2.63 MB

Created: 12/11/2025

Locale: rm-vallader

Mozilla

checksum dataset

Task: N/A

Format: N/A

License: Apache-2.0

Size: 914.69 KB

Created: 12/11/2025

Locale: en-US

Mozilla

dawdad

wadaddwa

Task: NLP

Format: awdawd

License: Apache-2.0

Size: 34.00 MB

Created: 12/11/2025

Locale: awdad

Rotimi org

Wonderful test now

Task: NLP

Format: mp3

License: CC-SA-1.0

Size: 1.82 MB

Created: 12/8/2025

Locale: en-US

MozFam

Common Voice AZ DF

This is my small test upload. It contains the Common Voice 10 corpus for Azerbaijani (az).

Task: ASR

Format: mp3

License: CC0-1.0

Size: 3.41 MB

Created: 12/8/2025

Locale: az

Mozilla

test

test

Task: N/A

Format: WAV

License: Apache-2.0

Size: 2.63 MB

Created: 12/8/2025

Locale: en-US

Common Voice

Dataset for API & Python SDK Tests [Do not remove] - Mock Spontaneous Speech English

DO NOT DELETE. The end-to-end (E2E) tests of the Python SDK depend on this test dataset.

Task: NLP

Format: CSV

License: CC-BY-4.0

Size: 119.84 KB

Created: 12/3/2025

Locale: en-US

Community

Community Dataset

Community Dataset

Task: RAG

Format: MP3

License: CC-BY-SA-4.0

Size: 2.76 MB

Created: 12/2/2025

Locale: en-US

My Cool Organization Changed Again

Community Dataset

My Community Dataset

Task: MT

Format: MP3

License: BSD-3-Clause

Size: 2.76 MB

Created: 12/2/2025

Locale: en-US

Mozilla

Another nice dataset

This is a fairly short description of my dataset.

Task: NLP

Format: wav

License: CC-BY-ND-4.0

Size: 180.78 MB

Created: 11/25/2025

Locale: es_MX


JOIN THE MOVEMENT

Join Mozilla Data Collective


Mozilla Data Collective wants to radically reimagine our data as power. We are anti-extractivism, anti-monopoly and deeply, profoundly pro-people. We are a collective of linguists, technologists, activists, researchers and creatives who want AI to be all it promises to be, not all it threatens to be. Here, you can share your datasets on your own terms.

FAQs

Find answers quickly

What is Mozilla Data Collective?

Mozilla Data Collective is a platform in the truest sense. It’s yours to stand on and to make of what you will. We have roots in two Mozilla projects: Common Voice, a CC0 public dataset that helps technology speak your language, and the Data Futures Lab, an experimental space for instigating new approaches to data stewardship challenges. Mozilla Data Collective lets you share your data while retaining ownership of it and controlling who uses it.

How does it work?

We partner with organizations and individuals to make their data available through Mozilla Data Collective. You can share openly, using existing licenses such as Creative Commons, or you can create your own. You can open your data to everyone or only to certain types of downloaders. You can set custom constraints and ask for exchange, compensation or recognition. You can govern your data as an individual, a co-operative, a trust or something else. After all, it’s your data. The people who access your datasets are authenticated and enter into legally binding contracts, and we offer a number of dataset protection features. If you are interested in hosting data on Mozilla Data Collective, please reach out to us at mozilladatacollective@mozillafoundation.org.

Who is behind Mozilla Data Collective?

We are backed and stewarded by the Mozilla Foundation, which is the non-profit, movement-building, and philanthropy arm of Mozilla.