Base URL
All API requests should be made to the following base URL:
https://dev.datacollective.mozillafoundation.org/api
Authentication
All authenticated endpoints require an API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
You can create and manage your API keys in your profile settings.
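As a minimal sketch of the header format above, the following Python builds an authenticated request with the standard library. The helper name `build_request` and the placeholder key are illustrative assumptions, not part of the API.

```python
import urllib.request

BASE_URL = "https://dev.datacollective.mozillafoundation.org/api"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Return a Request for BASE_URL + path that carries the Bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Building the request does not send it; pass it to urllib.request.urlopen to call the API.
req = build_request("/datasets/dataset-1", "YOUR_API_KEY")
```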
API Endpoints
/datasets/:datasetId
Get Dataset Details
Retrieves the details of a specific dataset.
Authentication
Bearer token in Authorization header
Path Parameters
datasetId (string, required): The ID of the dataset
Success Response (200 OK)
{
"id": "dataset-1",
"slug": "common-voice-corpus-22",
"name": "Common Voice Corpus 22.0",
"shortDescription": "English Speech Dataset",
"longDescription": "Community-generated audio dataset featuring English speech recordings.",
"locale": "en-US",
"sizeBytes": "268435456000",
"createdAt": "2025-08-26T12:00:00Z",
"organization": {
"name": "Mozilla",
"slug": "mozilla"
},
"license": "CC0-1.0",
"task": "ASR",
"format": "MP3",
"datasetUrl": "https://dev.datacollective.mozillafoundation.org/datasets/dataset-1"
}
Error Responses
Dataset not found
Access denied. Private dataset requires organization membership
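A call to this endpoint might look like the sketch below. It assumes the endpoint is called with GET (the method is not shown above) and that the response body matches the sample. The names `get_dataset` and `size_in_gib`, and the injectable `opener` parameter, are hypothetical conveniences for illustration and testing.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://dev.datacollective.mozillafoundation.org/api"


def get_dataset(dataset_id: str, api_key: str, opener=urllib.request.urlopen) -> dict:
    """Fetch dataset details and return the parsed JSON body (GET is an assumption)."""
    url = f"{BASE_URL}/datasets/{urllib.parse.quote(dataset_id, safe='')}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with opener(req) as resp:
        return json.load(resp)


def size_in_gib(dataset: dict) -> float:
    """Convert sizeBytes, which is a string in the sample response, to GiB."""
    return int(dataset["sizeBytes"]) / 2**30
```

With the default opener, error responses such as the two listed above surface as `urllib.error.HTTPError`, whose `code` attribute holds the HTTP status.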
/datasets/:datasetId/download
Get Dataset Download URL
Creates a download session and returns the dataset's download URL for direct download from storage. The user must have previously agreed to the dataset's terms of use through the web interface.
Authentication
Bearer token in Authorization header
Path Parameters
datasetId (string, required): The ID of the dataset
Success Response (200 OK)
{
"accessRecordId": "abc123def456",
"downloadToken": "tok_xyz789",
"downloadUrl": "https://storage.example.com/datasets/common-voice-corpus-22.tar.gz?signature=...",
"expiresAt": "2025-08-27T01:00:00Z",
"sizeBytes": "268435456000",
"contentType": "application/gzip",
"filename": "common-voice-corpus-22.tar.gz",
"checksum": "sha256:abcdef123456..."
}
Error Responses
You must agree to the terms of use before downloading this dataset
Dataset not found
Authentication required
Rate limit exceeded
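The sketch below shows one way to create a download session. It assumes the endpoint accepts a POST with no request body, and that the `checksum` field always follows the `algorithm:hexdigest` shape seen in the sample. The function names and the `opener` parameter are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://dev.datacollective.mozillafoundation.org/api"


def create_download_session(dataset_id: str, api_key: str, opener=urllib.request.urlopen) -> dict:
    """POST to the download endpoint and return the parsed JSON body."""
    url = f"{BASE_URL}/datasets/{urllib.parse.quote(dataset_id, safe='')}/download"
    req = urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Bearer {api_key}"}
    )
    with opener(req) as resp:
        return json.load(resp)


def split_checksum(checksum: str) -> tuple:
    """Split 'sha256:<hex>' into (algorithm, digest); the format is assumed from the sample."""
    algorithm, _, digest = checksum.partition(":")
    return algorithm, digest
```

The returned `downloadUrl` can then be fetched directly from storage. Each successful call counts toward the daily download limit, so it is worth reusing the URL until it expires.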
/datasets/:datasetId/download/:downloadToken (REMOVED)
Download Dataset File (Deprecated)
This endpoint has been removed. The POST /download endpoint now returns presigned URLs directly. Clients should download from the storage URL returned in the POST response.
Authentication
Path Parameters
datasetId (string, required): The ID of the dataset
downloadToken (string, required): The temporary download token
Error Responses
This endpoint has been removed. Use POST /download to get a presigned URL instead.
Rate Limiting
The API employs organization-level rate limiting to ensure fair usage and stability. Rate limits apply to both API requests and bandwidth consumption.
Request Rate Limiting
When request limits are exceeded, the API responds with status code 429 and includes these headers:
X-RateLimit-Limit: Total requests allowed in current window
X-RateLimit-Remaining: Requests remaining in current window
Retry-After: Seconds until next request allowed
Download Rate Limiting
Organizations are limited to 30 dataset downloads per day. The limit resets at midnight UTC. When exceeded, the API responds with a 429 error.
{
"error": "Rate limit exceeded",
"limit": {
"period": "daily",
"remaining": 0,
"resetsAt": "2025-08-27T00:00:00Z"
}
}
Implementation Notes
Direct Storage Downloads
The API returns presigned URLs that allow direct downloads from storage (S3/R2). This reduces bandwidth costs and improves download performance. Presigned URLs are valid for 12 hours.
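Because presigned URLs expire, a client may want to check `expiresAt` before starting or resuming a transfer. The following is a sketch under that assumption. The helper names and the five-minute safety margin are illustrative choices, not API behavior.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2025-08-27T01:00:00Z'."""
    # datetime.fromisoformat rejects a trailing 'Z' before Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_expired(expires_at: str, now=None, margin=timedelta(minutes=5)) -> bool:
    """Return True if the URL has expired or will expire within the margin."""
    now = now or datetime.now(timezone.utc)
    return now >= parse_timestamp(expires_at) - margin
```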
Resumable Downloads
Storage supports range requests for resumable downloads. If a presigned URL expires during download, request a new URL and resume using Range headers.
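A resumable download along these lines could be sketched as follows. The function names are hypothetical. The code assumes storage answers a ranged request with status 206, and falls back to rewriting the file from the start if it returns 200 instead.

```python
import os
import urllib.request


def build_resume_request(download_url: str, dest_path: str) -> urllib.request.Request:
    """Build a request that asks only for the bytes not yet saved to dest_path."""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    req = urllib.request.Request(download_url)
    if offset > 0:
        # Open-ended range: everything from the current file size onward.
        req.add_header("Range", f"bytes={offset}-")
    return req


def download(download_url: str, dest_path: str, chunk_size: int = 1 << 20) -> None:
    """Stream the file to dest_path, appending when the server honors the range."""
    req = build_resume_request(download_url, dest_path)
    with urllib.request.urlopen(req) as resp:
        # 206 means a partial body was returned; any other success means a full body.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest_path, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
```

If the presigned URL has expired, request a new one and call `download` again with the same `dest_path`. The partial file on disk determines where the transfer resumes.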
Rate Limiting
Downloads are rate limited at 30 per day per organization. Each POST request to create a presigned URL counts toward this limit.
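When the daily limit is hit, the `resetsAt` field in the download rate limit error body shown earlier tells the client how long to wait. The helper below is a sketch that assumes the error body has that shape. The name `seconds_until_reset` is hypothetical.

```python
from datetime import datetime, timezone


def seconds_until_reset(error_body: dict, now=None) -> float:
    """Return the seconds remaining until limit.resetsAt, never negative."""
    # Normalize the trailing 'Z' for Python versions before 3.11.
    resets_at = datetime.fromisoformat(
        error_body["limit"]["resetsAt"].replace("Z", "+00:00")
    )
    now = now or datetime.now(timezone.utc)
    return max(0.0, (resets_at - now).total_seconds())
```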
Terms Agreement Required
Users must agree to dataset terms through the web interface before downloading. API-only terms agreement is not supported.
Error Handling
Common Error Responses
Bad Request
Malformed request or invalid parameters
{
"error": "Invalid request parameters.",
"details": {
"page": "Must be a valid integer."
}
}
Unauthorized
Missing or invalid authentication
{
"error": "Authentication required."
}
Too Many Requests
Rate limit exceeded
{
"error": "Rate limit exceeded. Please try again later.",
"type": "request_limit"
}
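One way to handle a 429 response is to wait for the number of seconds given in the Retry-After header and then retry. The sketch below assumes the header carries a number of seconds, as described under Request Rate Limiting. The function names, the 60-second fallback, and the three-attempt cap are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def retry_delay(headers, default: float = 60.0) -> float:
    """Read the wait time in seconds from Retry-After, or fall back to default."""
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default


def open_with_retry(req, max_attempts: int = 3, opener=urllib.request.urlopen, sleep=time.sleep):
    """Send req, sleeping and retrying on 429 until max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return opener(req)
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, and the final 429.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(retry_delay(err.headers))
```

This suits the short request-rate window. For the daily download limit, retrying before the reset time will keep failing, so it is better to stop and wait until the limit resets.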