API Documentation

Reference for the Mozilla Data Collective REST API, which lets you access community-driven datasets programmatically from any programming language.

Base URL

All API requests should be made to the following base URL:

https://datacollective.mozilla.org/api

Authentication

All authenticated endpoints require an API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

You can create and manage your API keys in your profile settings.
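The minimal Python sketch below shows how to attach the API key, using only the standard library's urllib.request. The helper name build_request is hypothetical. Only the base URL and the header format come from this page.

```python
import urllib.request

# Base URL as documented above.
BASE_URL = "https://datacollective.mozilla.org/api"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for an API path such as "/datasets/dataset-1".

    The key is sent as a Bearer token in the Authorization header.
    (build_request is a hypothetical helper, not part of the API.)
    """
    if not path.startswith("/"):
        path = "/" + path
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_request("/datasets/dataset-1", "YOUR_API_KEY")) as resp:
#     print(resp.status)
```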

API Endpoints

GET /datasets/:datasetId
Get Dataset Details

Retrieves the details of a specific dataset.

Authentication: Required (Bearer token in the Authorization header)

Path Parameters
datasetId (string, required): The ID of the dataset.

Success Response (200 OK)
{
  "id": "dataset-1",
  "slug": "common-voice-corpus-22",
  "name": "Common Voice Corpus 22.0",
  "locale": "en-US",
  "visibility": "public",
  "sizeBytes": 268435456000,
  "createdAt": "2025-08-26T12:00:00Z",
  "updatedAt": "2025-08-26T12:00:00Z",
  "organization": {
    "name": "Mozilla",
    "slug": "mozilla"
  },
  "datasetUrl": "https://datacollective.mozilla.org/datasets/dataset-1"
}
Error Responses
404: Dataset not found.
403: Access denied. Private datasets require organization membership.

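The following sketch fetches dataset details and maps the two documented error codes onto Python exceptions. The Dataset class, its snake_case field names, and the choice of LookupError and PermissionError are illustrative assumptions, not part of the API. The JSON keys are the ones shown in the sample response.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass

BASE_URL = "https://datacollective.mozilla.org/api"


@dataclass
class Dataset:
    # Hypothetical client-side type holding a subset of the sample 200 response.
    id: str
    slug: str
    name: str
    visibility: str
    size_bytes: int
    organization_slug: str


def parse_dataset(body: str) -> Dataset:
    """Convert a Get Dataset Details JSON body into a Dataset."""
    data = json.loads(body)
    return Dataset(
        id=data["id"],
        slug=data["slug"],
        name=data["name"],
        visibility=data["visibility"],
        size_bytes=int(data["sizeBytes"]),
        organization_slug=data["organization"]["slug"],
    )


def get_dataset(dataset_id: str, api_key: str) -> Dataset:
    """Fetch one dataset, mapping the documented 404/403 errors to exceptions."""
    url = f"{BASE_URL}/datasets/{urllib.parse.quote(dataset_id, safe='')}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    try:
        with urllib.request.urlopen(req) as resp:
            return parse_dataset(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            raise LookupError(f"dataset {dataset_id!r} not found") from err
        if err.code == 403:
            raise PermissionError("private dataset requires organization membership") from err
        raise
```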
POST /datasets/:datasetId/download
Create Download Session

Creates a download session and returns a download token. The user must have previously agreed to the dataset's terms of use through the web interface.

Authentication: Required (Bearer token in the Authorization header)

Path Parameters
datasetId (string, required): The ID of the dataset.

Success Response (200 OK)
{
  "downloadToken": "dlt_abc123xyz",
  "downloadUrl": "https://datacollective.mozilla.org/api/datasets/dataset-1/download/dlt_abc123xyz",
  "expiresAt": "2025-08-26T13:00:00Z",
  "sizeBytes": 268435456000,
  "contentType": "application/zip",
  "filename": "common-voice-corpus-22.zip"
}
Error Responses
403: You must agree to the terms of use before downloading this dataset.
404: Dataset not found.
401: Authentication required.
429: Rate limit exceeded.

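Here is a sketch of creating a download session and checking whether its token has expired. It assumes the response shape shown in the sample above. DownloadSession, parse_session, and create_download_session are hypothetical names. A 403 from this call corresponds to the documented terms-of-use error, which can only be resolved in the web interface.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

BASE_URL = "https://datacollective.mozilla.org/api"


@dataclass
class DownloadSession:
    # Hypothetical client-side view of the Create Download Session response.
    token: str
    url: str
    expires_at: datetime
    size_bytes: int
    filename: str

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        """True once the current UTC time reaches expiresAt."""
        now = now or datetime.now(timezone.utc)
        return now >= self.expires_at


def parse_session(body: str) -> DownloadSession:
    data = json.loads(body)
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z",
    # so it is rewritten as an explicit UTC offset.
    expires = datetime.fromisoformat(data["expiresAt"].replace("Z", "+00:00"))
    return DownloadSession(
        token=data["downloadToken"],
        url=data["downloadUrl"],
        expires_at=expires,
        size_bytes=int(data["sizeBytes"]),
        filename=data["filename"],
    )


def create_download_session(dataset_id: str, api_key: str) -> DownloadSession:
    """POST to /datasets/:datasetId/download with an empty body."""
    req = urllib.request.Request(
        f"{BASE_URL}/datasets/{urllib.parse.quote(dataset_id, safe='')}/download",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_session(resp.read().decode("utf-8"))
```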
GET /datasets/:datasetId/download/:downloadToken
Download Dataset File

Downloads the dataset file itself. Supports resumable downloads via the HTTP Range header.

Authentication: Required (Bearer token in the Authorization header)

Path Parameters
datasetId (string, required): The ID of the dataset.
downloadToken (string, required): The temporary download token returned by the Create Download Session endpoint.

Request Headers
Range (optional): For resumable downloads, e.g., bytes=1024-2047.

Success Response (200 OK)

Response Headers:

Content-Length: 268435456000
Content-Type: application/zip
Content-Disposition: attachment; filename="common-voice-corpus-22.zip"
Accept-Ranges: bytes
ETag: "9bb58f26192e4ba00f01e2e7b136bbd8"

Body: binary file data
Error Responses
401: Invalid or expired download token.
404: Dataset or download session not found.
429: Bandwidth limit exceeded.

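To resume an interrupted download, a client can send an open-ended Range header (bytes=N-, standard HTTP syntax) that starts at the size of the partial file on disk. This sketch assumes the endpoint answers a satisfied Range request with 206 Partial Content, as standard HTTP servers do. This page documents only the 200 response, so treat that behavior as an assumption. The function names are hypothetical.

```python
import os
import shutil
import urllib.request


def range_header(offset: int) -> dict:
    """Range header requesting everything from `offset` to the end of the file."""
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def download_file(download_url: str, api_key: str, dest: str) -> None:
    """Stream the dataset to `dest`, resuming from any bytes already on disk."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Authorization": f"Bearer {api_key}", **range_header(offset)}
    req = urllib.request.Request(download_url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        # Assumption: 206 means the server honored the Range header, so append.
        # A 200 means the full file is being sent, so overwrite instead.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as out:
            # Copy in 1 MiB chunks rather than loading the whole file into memory.
            shutil.copyfileobj(resp, out, length=1024 * 1024)
```

In practice, download_url would be the downloadUrl value from the Create Download Session response. If the local file is already complete, a standards-compliant server rejects bytes=N- with 416 Range Not Satisfiable. Comparing the local size with sizeBytes before calling download_file avoids that error.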
Rate Limiting

The API employs organization-level rate limiting to ensure fair usage and stability. Rate limits apply to both API requests and bandwidth consumption.

Request Rate Limiting

When request limits are exceeded, the API responds with status code 429 and includes these headers:

X-RateLimit-Limit: Total requests allowed in the current window
X-RateLimit-Remaining: Requests remaining in the current window
Retry-After: Seconds until the next request is allowed

Bandwidth Rate Limiting

Download endpoints enforce bandwidth limits at the organization level. When the limit is exceeded, the connection is terminated with a 429 error and a JSON body like the following:

{
  "error": "Bandwidth limit exceeded",
  "type": "bandwidth_limit", 
  "retryAfter": 3600,
  "limit": {
    "bytesPerHour": 10737418240,
    "bytesRemaining": 0,
    "resetsAt": "2025-08-26T14:00:00Z"
  }
}
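A client can derive its back-off delay from either form of 429 shown above: the Retry-After header for request-rate limits, or the retryAfter field in the bandwidth-limit body. This sketch assumes Retry-After is given in seconds, as this page states. The helper name retry_delay and its 60-second fallback are arbitrary, hypothetical choices.

```python
import json
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str], body: Optional[str], default: float = 60.0) -> float:
    """Seconds to wait after a 429 response.

    `headers` may be a plain dict (case-sensitive keys) or the headers object
    from urllib, whose .get() is case-insensitive.
    """
    # Request-rate limits: Retry-After header, in seconds.
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return float(value)
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    # Bandwidth limits: "retryAfter" field in the JSON body.
    if body:
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict) and "retryAfter" in data:
            return float(data["retryAfter"])
    return default
```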

Implementation Notes

Single Use Downloads

Each download token can only be used for one complete download session. Once a file is fully downloaded, the token is invalidated.

Resumable Downloads

The download endpoint supports HTTP Range requests for resuming interrupted downloads. Use the ETag header to validate file integrity between resume attempts.
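One way to apply this ETag advice is to save the ETag from the first response alongside the partial file, then compare it with the ETag of the resumed response before appending. The helper below is hypothetical, not part of the API. It assumes resumed responses use status 206 and carry the same ETag header as the full response.

```python
from typing import Optional


def resume_action(saved_etag: Optional[str], response_etag: Optional[str], status: int) -> str:
    """Decide what to do with bytes already on disk once a resumed request responds.

    Returns "append" when the new bytes continue the same file version,
    or "restart" when the partial file should be discarded.
    """
    # Only a partial response (assumed to be 206) continues from the requested offset.
    if status != 206:
        return "restart"
    # Both ETags must be present and identical; a different ETag means the
    # file changed after the partial data was written.
    if saved_etag is None or response_etag is None or saved_etag != response_etag:
        return "restart"
    return "append"
```

Storing the ETag in a sidecar file (for example dest + ".etag") is one illustrative convention. Standard HTTP also defines an If-Range request header for this check, but this page does not say whether the API honors it.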

Proxied Downloads

All downloads are proxied through the API server for real-time rate limiting, access control, and analytics tracking.

Terms Agreement Required

Users must agree to a dataset's terms through the web interface before downloading. Agreeing to terms through the API alone is not supported.

Error Handling

Common Error Responses

400 Bad Request: Malformed request or invalid parameters.

{
  "error": "Invalid request parameters.",
  "details": {
    "page": "Must be a valid integer."
  }
}

401 Unauthorized: Missing or invalid authentication.

{
  "error": "Authentication required."
}

429 Too Many Requests: Rate limit exceeded.

{
  "error": "Rate limit exceeded. Please try again later.",
  "type": "request_limit"
}
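The error bodies above all share an "error" message field, and some add optional "type" and "details" fields, so a client can normalize them into a single exception type. ApiError and parse_error are hypothetical names. The fallback for non-JSON bodies is an assumption, since this page documents only JSON errors.

```python
import json
from typing import Optional


class ApiError(Exception):
    """Hypothetical client exception for non-2xx API responses."""

    def __init__(self, status: int, message: str, error_type: Optional[str] = None,
                 details: Optional[dict] = None):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.error_type = error_type  # e.g. "request_limit" or "bandwidth_limit"
        self.details = details or {}  # per-field messages, as in the 400 example


def parse_error(status: int, body: str) -> ApiError:
    """Turn an error response body into an ApiError, tolerating non-JSON bodies."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return ApiError(status, body.strip() or "Unknown error")
    if not isinstance(data, dict):
        return ApiError(status, str(data))
    return ApiError(
        status,
        str(data.get("error", "Unknown error")),
        error_type=data.get("type"),
        details=data.get("details"),
    )
```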