Base URL
All API requests should be made to the following base URL:
https://datacollective.mozilla.org/api
Authentication
All authenticated endpoints require an API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
You can create and manage your API keys in your profile settings.
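As a minimal sketch, the header can be attached with Python's standard library. The helper name build_request and the dataset-1 ID are illustrative assumptions, and the request is only constructed here, not sent.

```python
import urllib.request

BASE_URL = "https://datacollective.mozilla.org/api"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated request for a path under the base URL.

    Hypothetical helper for illustration; it is not part of the API.
    """
    url = BASE_URL + "/" + path.lstrip("/")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )


req = build_request("/datasets/dataset-1", "YOUR_API_KEY")
print(req.full_url)
print(req.get_header("Authorization"))
```

Keeping the key out of source code (for example, reading it from an environment variable) is usually a safer choice than hard-coding it.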
API Endpoints
/datasets/:datasetId
Get Dataset Details
Retrieves the details of a specific dataset.
Authentication
Bearer token in Authorization header
Path Parameters
datasetId
string
Required. The ID of the dataset
Success Response (200 OK)
{
  "id": "dataset-1",
  "slug": "common-voice-corpus-22",
  "name": "Common Voice Corpus 22.0",
  "locale": "en-US",
  "visibility": "public",
  "sizeBytes": 268435456000,
  "createdAt": "2025-08-26T12:00:00Z",
  "updatedAt": "2025-08-26T12:00:00Z",
  "organization": {
    "name": "Mozilla",
    "slug": "mozilla"
  },
  "datasetUrl": "https://datacollective.mozilla.org/datasets/dataset-1"
}
Error Responses
Dataset not found
Access denied. Private dataset requires organization membership
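The following sketch shows one way a client might fetch and summarize the dataset details. It assumes the endpoint is served over GET, which this page does not state explicitly; the function names get_dataset and summarize_dataset are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://datacollective.mozilla.org/api"


def summarize_dataset(payload: dict) -> str:
    """Return a one-line summary built from a dataset details payload."""
    size_gib = payload["sizeBytes"] / 2**30  # bytes to GiB
    return (
        f'{payload["name"]} '
        f'({payload["locale"]}, {size_gib:.1f} GiB, {payload["visibility"]})'
    )


def get_dataset(dataset_id: str, api_key: str) -> dict:
    """Fetch dataset details. Assumption: the endpoint accepts GET."""
    req = urllib.request.Request(
        f"{BASE_URL}/datasets/{dataset_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    # urlopen raises urllib.error.HTTPError for error status codes.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Offline example using fields from the sample response above.
sample = {
    "name": "Common Voice Corpus 22.0",
    "locale": "en-US",
    "visibility": "public",
    "sizeBytes": 268435456000,
}
print(summarize_dataset(sample))
```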
/datasets/:datasetId/download
Create Download Session
Creates a download session and returns a download token. The user must have previously agreed to the dataset's terms of use through the web interface.
Authentication
Bearer token in Authorization header
Path Parameters
datasetId
string
Required. The ID of the dataset
Success Response (200 OK)
{
  "downloadToken": "dlt_abc123xyz",
  "downloadUrl": "https://datacollective.mozilla.org/api/datasets/dataset-1/download/dlt_abc123xyz",
  "expiresAt": "2025-08-26T13:00:00Z",
  "sizeBytes": 268435456000,
  "contentType": "application/zip",
  "filename": "common-voice-corpus-22.zip"
}
Error Responses
You must agree to the terms of use before downloading this dataset
Dataset not found
Authentication required
Rate limit exceeded
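Because the session carries an expiresAt timestamp, a client may want to check how much time is left before starting a large transfer. The sketch below works only on the returned JSON; it does not create the session, since the HTTP method for this endpoint is not stated on this page. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2025-08-26T13:00:00Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_expiry(session: dict, now: datetime) -> float:
    """Seconds left before the download session expires (negative if past)."""
    return (parse_timestamp(session["expiresAt"]) - now).total_seconds()


# Fields copied from the sample response above.
session = {
    "downloadToken": "dlt_abc123xyz",
    "expiresAt": "2025-08-26T13:00:00Z",
}
now = datetime(2025, 8, 26, 12, 30, tzinfo=timezone.utc)
print(seconds_until_expiry(session, now))
```

In real use, now would typically be datetime.now(timezone.utc).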
/datasets/:datasetId/download/:downloadToken
Download Dataset File
Downloads the actual dataset file. Supports resumable downloads via HTTP Range headers.
Authentication
Bearer token in Authorization header
Path Parameters
datasetId
string
Required. The ID of the dataset
downloadToken
string
Required. The temporary download token
Request Headers
Range
For resumable downloads, e.g., bytes=1024-2047
Success Response (200 OK)
Response Headers:
Binary file data
Error Responses
Invalid or expired download token
Dataset or download session not found
Bandwidth limit exceeded
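To illustrate the Range syntax used above, here is a small sketch that builds the header value. Byte positions in HTTP ranges are inclusive, so bytes=1024-2047 covers 1024 bytes. The helper names are hypothetical. A server that honors a range normally answers with 206 Partial Content; this page lists only 200 OK, so a client should probably accept either.

```python
from typing import Optional


def range_header(start: int, end: Optional[int] = None) -> dict:
    """Build a Range header for bytes start..end (inclusive).

    With end omitted, the range runs to the end of the file.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    value = f"bytes={start}-" if end is None else f"bytes={start}-{end}"
    return {"Range": value}


def range_length(start: int, end: int) -> int:
    """Number of bytes covered by an inclusive range."""
    return end - start + 1


print(range_header(1024, 2047))
print(range_header(1024))
print(range_length(1024, 2047))
```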
Rate Limiting
The API employs organization-level rate limiting to ensure fair usage and stability. Rate limits apply to both API requests and bandwidth consumption.
Request Rate Limiting
When request limits are exceeded, the API responds with status code 429 and includes these headers:
X-RateLimit-Limit
Total requests allowed in current window
X-RateLimit-Remaining
Requests remaining in current window
Retry-After
Seconds until next request allowed
Bandwidth Rate Limiting
Download endpoints enforce bandwidth limits at the organization level. When a limit is exceeded, connections are terminated with a 429 error:
{
  "error": "Bandwidth limit exceeded",
  "type": "bandwidth_limit",
  "retryAfter": 3600,
  "limit": {
    "bytesPerHour": 10737418240,
    "bytesRemaining": 0,
    "resetsAt": "2025-08-26T14:00:00Z"
  }
}
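A client can use either signal to decide how long to back off after a 429. The sketch below prefers the Retry-After header and falls back to the retryAfter field of the JSON body. The function name and the 60-second default are assumptions, and the headers argument is a plain dict with exact-case keys.

```python
import json


def retry_delay_seconds(headers: dict, body: bytes, default: float = 60.0) -> float:
    """Pick a wait time, in seconds, from a 429 response."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After can also be an HTTP date; not handled here.
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed or empty bodies
        return default
    if isinstance(payload, dict):
        value = payload.get("retryAfter")
        if isinstance(value, (int, float)):
            return float(value)
    return default


print(retry_delay_seconds({"Retry-After": "30"}, b""))
print(retry_delay_seconds({}, b'{"type": "bandwidth_limit", "retryAfter": 3600}'))
```

Real HTTP client libraries often expose headers through case-insensitive mappings, which would make the lookup more robust than the plain dict used here.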
Implementation Notes
Single Use Downloads
Each download token can only be used for one complete download session. Once a file is fully downloaded, the token is invalidated.
Resumable Downloads
The download endpoint supports HTTP Range requests for resuming interrupted downloads. Use the ETag header to validate file integrity between resume attempts.
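One standard HTTP way to combine these two ideas is the If-Range request header: the client sends the ETag it saw earlier, and a compliant server returns the requested range only if the file is unchanged, otherwise the full file. Whether this API honors If-Range is not stated here, so treat the sketch below as an assumption. The function name and the sample ETag value are hypothetical.

```python
from typing import Optional


def resume_headers(bytes_on_disk: int, etag: Optional[str]) -> dict:
    """Headers for resuming a download, or {} to start from the beginning.

    If-Range requires a strong validator, so weak ETags (prefixed with W/)
    are not used for resuming.
    """
    if bytes_on_disk <= 0 or not etag or etag.startswith("W/"):
        return {}
    return {"Range": f"bytes={bytes_on_disk}-", "If-Range": etag}


print(resume_headers(1024, '"abc123"'))
print(resume_headers(1024, 'W/"abc123"'))
```

If the response to such a request carries the whole file rather than a partial range, the client should discard the bytes it already has and write the file from the start.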
Proxied Downloads
All downloads are proxied through the API server for real-time rate limiting, access control, and analytics tracking.
Terms Agreement Required
Users must agree to dataset terms through the web interface before downloading. API-only terms agreement is not supported.
Error Handling
Common Error Responses
Bad Request
Malformed request or invalid parameters
{
  "error": "Invalid request parameters.",
  "details": {
    "page": "Must be a valid integer."
  }
}
Unauthorized
Missing or invalid authentication
{
  "error": "Authentication required."
}
Too Many Requests
Rate limit exceeded
{
  "error": "Rate limit exceeded. Please try again later.",
  "type": "request_limit"
}
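The error bodies above share an error field, with optional type and details fields, so a client can fold them into a single exception type. This is a minimal sketch: the ApiError class and parse_error function are hypothetical and not part of the API, and the status codes 400, 401, and 429 are the standard HTTP codes for the names listed above.

```python
import json
from typing import Optional


class ApiError(Exception):
    """Error built from an API error response (illustrative wrapper)."""

    def __init__(self, status: int, message: str,
                 error_type: Optional[str] = None,
                 details: Optional[dict] = None):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.error_type = error_type
        self.details = details or {}

    @property
    def retryable(self) -> bool:
        # Only 429 responses are described as temporary in this document.
        return self.status == 429


def parse_error(status: int, body: bytes) -> ApiError:
    """Turn a status code and raw response body into an ApiError."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return ApiError(
        status,
        payload.get("error", "Unknown error"),
        payload.get("type"),
        payload.get("details"),
    )


err = parse_error(
    429,
    b'{"error": "Rate limit exceeded. Please try again later.", "type": "request_limit"}',
)
print(err.status, err.error_type, err.retryable)
```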