Datasets

Common Voice

Common Voice Scripted Speech 24.0 - Teutila Cuicatec

A collection of scripted spoken phrases in Teutila Cuicatec.

Task: ASR

Format: MP3

License: CC0-1.0

Size: 209.52 MB

Created: 12/5/2025

Locale: cut

Common Voice

Common Voice Scripted Speech 24.0 - Norwegian Nynorsk

A collection of scripted spoken phrases in Norwegian Nynorsk.

Task: ASR

Format: MP3

License: CC0-1.0

Size: 33.55 MB

Created: 12/5/2025

Locale: nn-NO

My Cool Organization Changed Again

rm-vallader test

rm-vallader test

Task: NLP

Format: MP3

License: BSD-3-Clause

Size: 2.63 MB

Created: 12/11/2025

Locale: rm-vallader

Mozilla

checksum dataset

Task: N/A

Format: N/A

License: Apache-2.0

Size: 914.69 KB

Created: 12/11/2025

Locale: en-US

Mozilla

dawdad

wadaddwa

Task: NLP

Format: awdawd

License: Apache-2.0

Size: 34.00 MB

Created: 12/11/2025

Locale: awdad

Rotimi org

Wonderful test now

Task: NLP

Format: mp3

License: CC-SA-1.0

Size: 1.82 MB

Created: 12/8/2025

Locale: en-US

MozFam

Common Voice AZ DF

That's my little test upload. It contains the cv 10 corpus for az.

Task: ASR

Format: mp3

License: CC0-1.0

Size: 3.41 MB

Created: 12/8/2025

Locale: az

Mozilla

test

test

Task: N/A

Format: WAV

License: Apache-2.0

Size: 2.63 MB

Created: 12/8/2025

Locale: en-US

Common Voice

Dataset for API & Python SDK Tests [Do not remove] - Mock Spontaneous Speech English

DO NOT DELETE. E2E tests of the Python SDK depend on this test dataset

Task: NLP

Format: CSV

License: CC-BY-4.0

Size: 119.84 KB

Created: 12/3/2025

Locale: en-US

Community

Community Dataset

Community Dataset

Task: RAG

Format: MP3

License: CC-BY-SA-4.0

Size: 2.76 MB

Created: 12/2/2025

Locale: en-US

My Cool Organization Changed Again

Community Dataset

My Community Dataset

Task: MT

Format: MP3

License: BSD-3-Clause

Size: 2.76 MB

Created: 12/2/2025

Locale: en-US

Mozilla

Otro dataset bonito

This is a fairly short description to describe my dataset

Task: NLP

Format: wav

License: CC-BY-ND-4.0

Size: 180.78 MB

Created: 11/25/2025

Locale: es_MX

Mozilla Foundation

test 3.0

testing stuff

Task: MT

Format: N/A

License: CC-BY-NC-SA-4.0

Size: 914.69 KB

Created: 11/19/2025

Locale: en

Mozilla Foundation

Test 2.0

Test 2

Task: NLP

Format: TXT

License: CC-BY-4.0

Size: 914.69 KB

Created: 10/31/2025

Locale: nhi

MoFo-BetaBugBash

JohannBetaBugBashDataset

My Beta Bug Bash Dataset

Task: CALL

Format: MP3

License: CC-0

Size: 7.57 MB

Created: 10/30/2025

Locale: en-US

MDC

Antarctic Penguin Observation

A comprehensive collection of field observations of three Antarctic penguin species (Emperor, Adelie, Gentoo) gathered between 2015 and 2023.

Task: ASR

Format: MP3

License: BSD Zero Clause License

Size: 34.00 MB

Created: 10/29/2025

Locale: en-US

Elotl

Otro bonito dataset

This dataset is for testing that I can upload datasets to MDC

Task: ASR

Format: MP3

License: CC-BY-4.0

Size: 3.15 MB

Created: 10/29/2025

Locale: es-MX

Elotl

My bonito dataset

This is a very suitable description for my dataset. TQM Elotl

Task: NLP

Format: wav

License: CC-BY-4.0

Size: 3.15 MB

Created: 10/28/2025

Locale: en-US

Common Voice

kostis-test-28oct

Task: N/A

Format: N/A

License: cc

Size: 12.06 MB

Created: 10/28/2025

Locale: en-US

Mozilla Foundation

ReRooted 1.0

A speech corpus of Syrian Armenian refugee testimonials

Task: OTH

Format: WAV, TSV

License: GPL-3.0

Size: 914.69 KB

Created: 10/28/2025

Locale: en-US

Common Voice

md test

testing markdown

Task: NLU

Format: mp3

License: cc-0

Size: 2.76 MB

Created: 10/27/2025

Locale: en-US

Common Voice

Example Dataset Upload - 2025 10 23

Example Dataset Upload - 2025 10 23

Task: NLP

Format: mp3

License: cc-0

Size: 72.21 MB

Created: 10/27/2025

Locale: en-US

Community

Test dataset - random

This is a test dataset that I will search for on my computer.

Task: NLP

Format: wav, conllu

License: CC-BY-4.0

Size: 330.70 KB

Created: 10/24/2025

Locale: nhi

Common Voice

newest test

test

Task: CV

Format: tar.gz

License: CC0-1.0

Size: 72.21 MB

Created: 10/17/2025

Locale: en-US