Datasets

Community

python-sdk-e2e-16:18 - 19/03/2026

SDK live upload test
License: CC-BY-4.0

Locale: en-US

Task: ASR

Format: TSV

Size: 175 B

Community

new acc upload

new acc upload
License: CC-SA-1.0

Locale: new acc upload

Task: MT

Format: new

Size: 4.20 MB

Aaron Tello-Wharton

awdawd

wadwa
License: BSD-3-Clause

Locale: en-

Task: NLU

Format: wad

Size: 175 B

Common Voice

[Do not remove] Python SDK e2e test for PUP versioning

Test
License: CC-BY-4.0

Locale: test

Task: OTH

Format: test

Size: 175 B

Mozilla Foundation

Common Voice Spontaneous Speech 3.0 - Irish

A collection of spontaneous responses to questions in Irish.
License: CC0-1.0

Locale: ga-IE

Task: ASR

Format: MP3

Size: 3.14 MB

Common Voice

Dataset Name

A brief description of the dataset.
License: CC-BY-4.0

Locale: en-US

Task: ASR

Format: TSV

Size: 175 B

Common Voice

Kostis test pythoon

Short description
License: CC-BY

Locale: en-US

Task: ASR

Format: tar.gz

Size: 37.12 MB

MozFam

SmokeTestSubAPI

asd
License: CC0-1.0

Locale: az

Task: LID

Format: tar.gz

Size: 3.37 MB

Aaron Tello-Wharton

awdawddwa

awdadw
License: CC-BY-4.0

Locale: wadwdwawd

Task: NLP

Format: wdadaw

Size: 15.36 MB

Common Voice

uploadtest1

aASD
License: Apache-2.0

Locale: ASD

Task: NLP

Format: ASD

Size: 15.36 MB

MozFam

R2 Dataset Test

asdf
License: Apache-2.0

Locale: de

Task: NLP

Format: tar.gz

Size: 77.37 KB

Community

Bangor Talk Siarad Welsh-English corpus

Welsh-English bilingual speech corpus with 40 hours of recorded audio and 450,000 words
License: GPL-3.0

Locale: cym

Task: ASR

Format: MP3, CHA, TSV

Size: 2.13 GB

Aaron Tello-Wharton

Test

Test
License: EUPL-1.2

Locale: Test

Task: LID

Format: Test

Size: 249.04 MB

Aaron Tello-Wharton

swdad

awdad
License: Apache-2.0

Locale: dadawd

Task: NLP

Format: adwad

Size: 15.36 MB

Aaron Tello-Wharton

awdadad

wadadwwadawd
License: Apache-2.0

Locale: en-US

Task: NLP

Format: wadada

Size: 249.04 MB

Mozilla Foundation

Govtube - Kuña Rembiasa

Audio consisting of 1 hour of Spanish and Guarani (approx. 10%) collected from the US Embassy in Paraguay on the subject of human trafficking.
License: CC0-1.0

Locale: es-PY, gn-PY

Task: ASR

Format: TSV, MP3

Size: 52.52 MB

New Test Organization by Liv

A new test with the new flow v2

A basic description would go here
License: BSD-3-Clause

Locale: en-US

Task: MT

Format: Unknown

Size: 249.04 MB

Rotimi very very very very very long org name

Wonderful new dataset

Short description of the dataset
License: BSD-3-Clause

Locale: en-US

Task: LID

Format: mp3

Size: 1.82 MB

My Cool Organization Changed Again

Long Other Information Description Dataset

Long Other Information Description Dataset
License: Apache-2.0

Locale: en

Task: NLP

Format: WAV

Size: 4.20 MB

Rotimi very very very very very long org name

Dataset with long & short desc

I added a short dec
License: CC-SA-1.0

Locale: en-US

Task: CV

Format: mp3

Size: 15.36 MB

Aaron Tello-Wharton

dawdawd

adwada
License: CC-BY-ND-4.0

Locale: en-us

Task: NLP

Format: adwad

Size: 231.89 MB

Rotimi very very very very very long org name

Dataset with long desc

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit.
License: CC-BY-SA-4.0

Locale: en-US

Task: ASR

Format: mp3

Size: 15.36 MB

Rotimi very very very very very long org name

New Dev Dataset

Brief description after edits
License: CC-SA-1.0

Locale: en-US

Task: TTS

Format: mp3

Size: 15.36 MB

Rotimi very very very very very long org name

New Dev Dataset

This is a wonderful dev dataset. I have now edited the description after approval. This is a live edit after approval
License: Apache-2.0

Locale: en-US

Task: NLP

Format: mp3

Size: 15.36 MB