Datasets
Common Voice
Common Voice Scripted Speech 24.0 - Teutila Cuicatec
A collection of scripted spoken phrases in Teutila Cuicatec.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 209.52 MB
Created: 12/5/2025
Locale: cut
Common Voice
Common Voice Scripted Speech 24.0 - Norwegian Nynorsk
A collection of scripted spoken phrases in Norwegian Nynorsk.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 33.55 MB
Created: 12/5/2025
Locale: nn-NO
My Cool Organization Changed Again
rm-vallader test
rm-vallader test
Task: NLP
Format: MP3
License: BSD-3-Clause
Size: 2.63 MB
Created: 12/11/2025
Locale: rm-vallader
Mozilla
checksum dataset
Task: N/A
Format: N/A
License: Apache-2.0
Size: 914.69 KB
Created: 12/11/2025
Locale: en-US
Mozilla
dawdad
wadaddwa
Task: NLP
Format: awdawd
License: Apache-2.0
Size: 34.00 MB
Created: 12/11/2025
Locale: awdad
Rotimi org
Wonderful test now
Task: NLP
Format: MP3
License: CC-SA-1.0
Size: 1.82 MB
Created: 12/8/2025
Locale: en-US
MozFam
Common Voice AZ DF
This is my small test upload. It contains the Common Voice 10 corpus for az.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 3.41 MB
Created: 12/8/2025
Locale: az
Mozilla
test
test
Task: N/A
Format: WAV
License: Apache-2.0
Size: 2.63 MB
Created: 12/8/2025
Locale: en-US
Common Voice
Dataset for API & Python SDK Tests [Do not remove] - Mock Spontaneous Speech English
DO NOT DELETE. E2E tests of the Python SDK depend on this test dataset.
Task: NLP
Format: CSV
License: CC-BY-4.0
Size: 119.84 KB
Created: 12/3/2025
Locale: en-US
Community
Community Dataset
Community Dataset
Task: RAG
Format: MP3
License: CC-BY-SA-4.0
Size: 2.76 MB
Created: 12/2/2025
Locale: en-US
My Cool Organization Changed Again
Community Dataset
My Community Dataset
Task: MT
Format: MP3
License: BSD-3-Clause
Size: 2.76 MB
Created: 12/2/2025
Locale: en-US
Mozilla
Otro dataset bonito
This is a fairly short description of my dataset.
Task: NLP
Format: WAV
License: CC-BY-ND-4.0
Size: 180.78 MB
Created: 11/25/2025
Locale: es-MX
Mozilla Foundation
test 3.0
testing stuff
Task: MT
Format: N/A
License: CC-BY-NC-SA-4.0
Size: 914.69 KB
Created: 11/19/2025
Locale: en
Mozilla Foundation
Test 2.0
Test 2
Task: NLP
Format: TXT
License: CC-BY-4.0
Size: 914.69 KB
Created: 10/31/2025
Locale: nhi
MoFo-BetaBugBash
JohannBetaBugBashDataset
My Beta Bug Bash Dataset
Task: CALL
Format: MP3
License: CC0-1.0
Size: 7.57 MB
Created: 10/30/2025
Locale: en-US
MDC
Antarctic Penguin Observation
A comprehensive collection of field observations of three Antarctic penguin species (Emperor, Adélie, Gentoo) gathered between 2015 and 2023.
Task: ASR
Format: MP3
License: 0BSD
Size: 34.00 MB
Created: 10/29/2025
Locale: en-US
Elotl
Otro bonito dataset
This dataset is for testing that I can upload datasets to MDC.
Task: ASR
Format: MP3
License: CC-BY-4.0
Size: 3.15 MB
Created: 10/29/2025
Locale: es-MX
Elotl
My bonito dataset
This is a very fitting description for my dataset. Love you lots, Elotl.
Task: NLP
Format: WAV
License: CC-BY-4.0
Size: 3.15 MB
Created: 10/28/2025
Locale: en-US
Common Voice
kostis-test-28oct
Task: N/A
Format: N/A
License: cc
Size: 12.06 MB
Created: 10/28/2025
Locale: en-US
Mozilla Foundation
ReRooted 1.0
A speech corpus of Syrian Armenian refugee testimonials
Task: OTH
Format: WAV, TSV
License: GPL-3.0
Size: 914.69 KB
Created: 10/28/2025
Locale: en-US
Common Voice
md test
testing markdown
Task: NLU
Format: MP3
License: CC0-1.0
Size: 2.76 MB
Created: 10/27/2025
Locale: en-US
Common Voice
Example Dataset Upload - 2025 10 23
Example Dataset Upload - 2025 10 23
Task: NLP
Format: MP3
License: CC0-1.0
Size: 72.21 MB
Created: 10/27/2025
Locale: en-US
Community
Test dataset - random
This is a test dataset that I will search for on my computer.
Task: NLP
Format: WAV, CoNLL-U
License: CC-BY-4.0
Size: 330.70 KB
Created: 10/24/2025
Locale: nhi
Common Voice
newest test
test
Task: CV
Format: tar.gz
License: CC0-1.0
Size: 72.21 MB
Created: 10/17/2025
Locale: en-US
