Base URL
All API requests should be made to the following base URL:
https://dev.datacollective.mozillafoundation.org/api
Authentication
All authenticated endpoints require an API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
You can create and manage your API keys in your profile settings.
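As a minimal sketch of the header format above, the helper below builds the Authorization header and joins a path onto the base URL. The function names `auth_headers` and `endpoint_url` are hypothetical and chosen only for illustration; they are not part of the API.

```python
from typing import Dict

# Base URL as given in this document.
BASE_URL = "https://dev.datacollective.mozillafoundation.org/api"


def auth_headers(api_key: str) -> Dict[str, str]:
    """Return the Authorization header for an authenticated request."""
    if not api_key:
        raise ValueError("an API key is required")
    return {"Authorization": f"Bearer {api_key}"}


def endpoint_url(path: str) -> str:
    """Join an endpoint path such as /datasets/dataset-1 onto the base URL."""
    return BASE_URL + "/" + path.lstrip("/")
```

Reading the key from an environment variable rather than hard-coding it keeps it out of source control.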
API Endpoints
/datasets/:datasetId
Get Dataset Details
Retrieves the details of a specific dataset.
Authentication
Bearer token in Authorization header
Path Parameters
datasetId (string, required): The ID of the dataset
Success Response (200 OK)
{
  "id": "dataset-1",
  "slug": "common-voice-corpus-22",
  "name": "Common Voice Corpus 22.0",
  "locale": "en-US",
  "visibility": "public",
  "sizeBytes": 268435456000,
  "createdAt": "2025-08-26T12:00:00Z",
  "updatedAt": "2025-08-26T12:00:00Z",
  "organization": {
    "name": "Mozilla",
    "slug": "mozilla"
  },
  "datasetUrl": "https://dev.datacollective.mozillafoundation.org/datasets/dataset-1"
}
Error Responses
Dataset not found
Access denied. Private dataset requires organization membership
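The request for this endpoint might be sketched with Python's standard library as follows. The HTTP method is not stated in this document, so GET is an assumption here, and the function names `build_dataset_request` and `get_dataset` are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://dev.datacollective.mozillafoundation.org/api"


def build_dataset_request(dataset_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the request for /datasets/:datasetId."""
    # Percent-encode the ID so it cannot alter the path.
    path = "/datasets/" + urllib.parse.quote(dataset_id, safe="")
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",  # assumption: the method is not stated in this document
    )


def get_dataset(dataset_id: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON body."""
    request = build_dataset_request(dataset_id, api_key)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as exc:
        # Error bodies carry an "error" message, as in the examples
        # under Error Handling.
        try:
            message = json.load(exc).get("error", exc.reason)
        except (ValueError, AttributeError):
            message = exc.reason
        raise RuntimeError(f"HTTP {exc.code}: {message}") from exc
```

Calling `get_dataset("dataset-1", api_key)` would return a dictionary with the fields shown in the success response, such as `name` and `sizeBytes`.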
/datasets/:datasetId/download
Create Download Session
Creates a download session and returns a download token. The user must have previously agreed to the dataset's terms of use through the web interface.
Authentication
Bearer token in Authorization header
Path Parameters
datasetId (string, required): The ID of the dataset
Success Response (200 OK)
{
  "downloadToken": "dlt_abc123xyz",
  "downloadUrl": "https://dev.datacollective.mozillafoundation.org/api/datasets/dataset-1/download/dlt_abc123xyz",
  "expiresAt": "2025-08-26T13:00:00Z",
  "sizeBytes": 268435456000,
  "contentType": "application/zip",
  "filename": "common-voice-corpus-22.tar.gz"
}
Error Responses
You must agree to the terms of use before downloading this dataset
Dataset not found
Authentication required
Rate limit exceeded
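Because the session carries an expiry time, a client may want to parse the response into a typed object and check `expiresAt` before starting a download. The sketch below does that for the response body shown above; the class name `DownloadSession` and its attribute names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DownloadSession:
    download_token: str
    download_url: str
    expires_at: datetime
    size_bytes: int
    filename: str

    @classmethod
    def from_json(cls, body: str) -> "DownloadSession":
        data = json.loads(body)
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
        # so rewrite it as an explicit UTC offset.
        expires = datetime.fromisoformat(data["expiresAt"].replace("Z", "+00:00"))
        return cls(
            download_token=data["downloadToken"],
            download_url=data["downloadUrl"],
            expires_at=expires,
            size_bytes=data["sizeBytes"],
            filename=data["filename"],
        )

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        """Return True once the current time reaches expiresAt."""
        now = now or datetime.now(timezone.utc)
        return now >= self.expires_at
```

If `is_expired()` returns True, the client would request a new session instead of using the stale `download_url`.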
/datasets/:datasetId/download/:downloadToken
Download Dataset File
Downloads the actual dataset file.
Authentication
Bearer token in Authorization header
Path Parameters
datasetId (string, required): The ID of the dataset
downloadToken (string, required): The temporary download token
Success Response (200 OK)
Response Headers:
Binary file data
Error Responses
Invalid or expired download token
Dataset or download session not found
Bandwidth limit exceeded
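Dataset files can be very large, so a client would normally stream the response to disk rather than hold it in memory. The sketch below copies the body in chunks and optionally compares the byte count with the `sizeBytes` value from the download session. That `sizeBytes` equals the number of bytes transferred is an assumption, and the names `save_stream`, `download_dataset`, and `CHUNK_SIZE` are hypothetical.

```python
import urllib.request
from typing import BinaryIO, Optional

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def save_stream(source: BinaryIO, dest_path: str,
                expected_size: Optional[int] = None) -> int:
    """Copy a binary stream to dest_path in chunks and return the bytes written."""
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    # A connection cut short (for example by a bandwidth limit) shows up
    # as a byte count below the expected size.
    if expected_size is not None and written != expected_size:
        raise IOError(f"incomplete download: got {written} of {expected_size} bytes")
    return written


def download_dataset(download_url: str, api_key: str, dest_path: str,
                     expected_size: Optional[int] = None) -> int:
    """Fetch the downloadUrl from a download session and save it to disk."""
    request = urllib.request.Request(
        download_url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return save_stream(response, dest_path, expected_size)
```

Keeping the copy loop separate from the network call makes it possible to exercise the size check with an in-memory stream.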
Rate Limiting
The API employs organization-level rate limiting to ensure fair usage and stability. Rate limits apply to both API requests and bandwidth consumption.
Request Rate Limiting
When request limits are exceeded, the API responds with status code 429 and includes these headers:
X-RateLimit-Limit: Total requests allowed in current window
X-RateLimit-Remaining: Requests remaining in current window
Retry-After: Seconds until next request allowed
Bandwidth Rate Limiting
Download endpoints enforce bandwidth limits at the organization level. When exceeded, connections are terminated with a 429 error.
{
  "error": "Bandwidth limit exceeded",
  "type": "bandwidth_limit",
  "retryAfter": 3600,
  "limit": {
    "bytesPerHour": 10737418240,
    "bytesRemaining": 0,
    "resetsAt": "2025-08-26T14:00:00Z"
  }
}
Implementation Notes
Single Use Downloads
Each download token can only be used for one complete download session. Once a file is fully downloaded, the token is invalidated.
Proxied Downloads
All downloads are proxied through the API server for real-time rate limiting, access control, and analytics tracking.
Terms Agreement Required
Users must agree to dataset terms through the web interface before downloading. API-only terms agreement is not supported.
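One consequence of single-use tokens is that a retry should not count on an old token still being valid. This document does not say whether a token survives an interrupted transfer, so the sketch below takes the conservative route of requesting a new session for every attempt. The function name and both callables are hypothetical: `create_session` stands for a call to the Create Download Session endpoint that returns a `downloadUrl`, and `download` for whatever fetches that URL.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def download_with_fresh_tokens(
    create_session: Callable[[], str],
    download: Callable[[str], T],
    max_attempts: int = 3,
) -> T:
    """Try the download up to max_attempts times, with a new session each time."""
    last_error: Optional[BaseException] = None
    for _ in range(max_attempts):
        download_url = create_session()  # new token for every attempt
        try:
            return download(download_url)
        except OSError as exc:  # network errors and truncated transfers
            last_error = exc
    raise RuntimeError(f"download failed after {max_attempts} attempts") from last_error
```

Each new session also counts as an API request, so `max_attempts` is best kept small to stay within the rate limits.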
Error Handling
Common Error Responses
Bad Request
Malformed request or invalid parameters
{
  "error": "Invalid request parameters.",
  "details": {
    "page": "Must be a valid integer."
  }
}
Unauthorized
Missing or invalid authentication
{
  "error": "Authentication required."
}
Too Many Requests
Rate limit exceeded
{
  "error": "Rate limit exceeded. Please try again later.",
  "type": "request_limit"
}
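A client that receives a 429 needs to decide how long to wait. As described under Rate Limiting, request limits report the delay in the Retry-After header, while the bandwidth limit example reports it in a `retryAfter` field of the JSON body. The sketch below checks both. The function name `retry_delay_seconds` and the 60-second fallback are assumptions made for illustration, not values defined by the API.

```python
import json
from typing import Mapping

DEFAULT_DELAY = 60.0  # fallback chosen for this sketch, not an API value


def retry_delay_seconds(headers: Mapping[str, str], body: bytes = b"") -> float:
    """Return how many seconds to wait after a 429 response."""
    # Request limits: Retry-After header, in seconds.
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                break  # not a plain number; fall through to the body
    # Bandwidth limits: "retryAfter" field in the JSON body.
    try:
        payload = json.loads(body or b"{}")
    except ValueError:
        payload = {}
    if isinstance(payload, dict) and isinstance(payload.get("retryAfter"), (int, float)):
        return max(0.0, float(payload["retryAfter"]))
    return DEFAULT_DELAY
```

The returned value can be passed to `time.sleep()` before the request is retried.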