API Documentation

Reference for the Mozilla Data Collective REST API, which provides programmatic access to community-driven datasets from any programming language.

Base URL

All API requests should be made to the following base URL:

https://dev.datacollective.mozillafoundation.org/api

Authentication

All authenticated endpoints require an API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

You can create and manage your API keys in your profile settings.
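The following is a minimal sketch of attaching the API key using only Python's standard library. `build_request` is a hypothetical helper, and the `DATACOLLECTIVE_API_KEY` environment variable name is an assumption. Only the base URL and the `Authorization: Bearer` scheme come from this reference.

```python
import os
import urllib.request

# Base URL from the "Base URL" section above.
API_BASE = "https://dev.datacollective.mozillafoundation.org/api"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Return a request for API_BASE + path with the Bearer token attached (hypothetical helper)."""
    return urllib.request.Request(
        API_BASE + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


if __name__ == "__main__":
    # Hypothetical variable name: store the key however suits your environment.
    api_key = os.environ.get("DATACOLLECTIVE_API_KEY", "YOUR_API_KEY")
    req = build_request("/datasets/dataset-1", api_key)
    print(req.get_method(), req.full_url)
```

To send the request, pass the returned object to urllib.request.urlopen().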

API Endpoints

GET /datasets/:datasetId
Get Dataset Details

Retrieves the details of a specific dataset.

Authentication: Required (Bearer token in the Authorization header)

Path Parameters

datasetId (string, required): The ID of the dataset.

Success Response (200 OK)
{
  "id": "dataset-1",
  "slug": "common-voice-corpus-22",
  "name": "Common Voice Corpus 22.0",
  "shortDescription": "English Speech Dataset",
  "longDescription": "Community-generated audio dataset featuring English speech recordings.",
  "locale": "en-US",
  "sizeBytes": "268435456000",
  "createdAt": "2025-08-26T12:00:00Z",
  "organization": {
    "name": "Mozilla",
    "slug": "mozilla"
  },
  "license": "CC0-1.0",
  "task": "ASR",
  "format": "MP3",
  "datasetUrl": "https://dev.datacollective.mozillafoundation.org/datasets/dataset-1"
}
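In the sample response, `sizeBytes` is a JSON string rather than a number, so clients should convert it before doing arithmetic. The sketch below parses a trimmed copy of that sample. `summarize_dataset` is a hypothetical helper.

```python
import json

# Trimmed copy of the 200 OK sample above.
SAMPLE_BODY = """{
  "id": "dataset-1",
  "name": "Common Voice Corpus 22.0",
  "locale": "en-US",
  "sizeBytes": "268435456000",
  "license": "CC0-1.0"
}"""


def summarize_dataset(body: str) -> str:
    details = json.loads(body)
    # sizeBytes arrives as a JSON string, so convert it to int before doing arithmetic.
    size_gib = int(details["sizeBytes"]) / 2**30
    return f'{details["name"]} ({details["locale"]}, {details["license"]}): {size_gib:.1f} GiB'


print(summarize_dataset(SAMPLE_BODY))
```

With the sample body, the script prints `Common Voice Corpus 22.0 (en-US, CC0-1.0): 250.0 GiB`.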
Error Responses

404: Dataset not found.
403: Access denied. Private datasets require organization membership.

POST /datasets/:datasetId/download
Get Dataset Download URL

Creates a download session and returns a presigned URL for downloading the dataset directly from storage. The user must already have agreed to the dataset's terms of use through the web interface.

Authentication: Required (Bearer token in the Authorization header)

Path Parameters

datasetId (string, required): The ID of the dataset.

Success Response (200 OK)
{
  "accessRecordId": "abc123def456",
  "downloadToken": "tok_xyz789",
  "downloadUrl": "https://storage.example.com/datasets/common-voice-corpus-22.tar.gz?signature=...",
  "expiresAt": "2025-08-27T01:00:00Z",
  "sizeBytes": "268435456000",
  "contentType": "application/gzip",
  "filename": "common-voice-corpus-22.tar.gz",
  "checksum": "sha256:abcdef123456..."
}
Error Responses

403: You must agree to the terms of use before downloading this dataset.
404: Dataset not found.
401: Authentication required.
429: Rate limit exceeded.
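The next sketch requests a download session and parses the response using the standard library. `DownloadSession`, `request_download`, and `is_expired` are hypothetical names. The field names come from the sample response above. The code assumes that `sizeBytes` is always a string and that `expiresAt` is an ISO 8601 UTC timestamp, as in the sample.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

API_BASE = "https://dev.datacollective.mozillafoundation.org/api"


@dataclass
class DownloadSession:
    url: str
    filename: str
    size_bytes: int
    expires_at: datetime
    checksum: str  # e.g. "sha256:<hex digest>"


def parse_download_session(body: str) -> DownloadSession:
    data = json.loads(body)
    return DownloadSession(
        url=data["downloadUrl"],
        filename=data["filename"],
        size_bytes=int(data["sizeBytes"]),  # sent as a string in the sample
        # Replace the trailing "Z" so fromisoformat() also works before Python 3.11.
        expires_at=datetime.fromisoformat(data["expiresAt"].replace("Z", "+00:00")),
        checksum=data["checksum"],
    )


def request_download(dataset_id: str, api_key: str) -> DownloadSession:
    """POST /datasets/:datasetId/download; urlopen raises urllib.error.HTTPError on 4xx/5xx."""
    req = urllib.request.Request(
        f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id, safe='')}/download",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_download_session(resp.read().decode("utf-8"))


def is_expired(session: DownloadSession, now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    return now >= session.expires_at
```

Each call to this endpoint counts toward the daily download limit. Before requesting a new session, check `is_expired()` on the existing one.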

GET /datasets/:datasetId/download/:downloadToken
Download Dataset File (Removed)

This endpoint has been removed. The POST /download endpoint now returns presigned URLs directly. Clients should download from the storage URL returned in the POST response.

Authentication: Not required

Path Parameters

datasetId (string, required): The ID of the dataset.
downloadToken (string, required): The temporary download token.

Error Responses

410: This endpoint has been removed. Use POST /download to get a presigned URL instead.

Rate Limiting

The API employs organization-level rate limiting to ensure fair usage and stability. Rate limits apply to both API requests and bandwidth consumption.

Request Rate Limiting

When the request limit is exceeded, the API responds with status code 429 and includes these headers:

X-RateLimit-Limit: total requests allowed in the current window
X-RateLimit-Remaining: requests remaining in the current window
Retry-After: seconds until the next request is allowed
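The following sketch honors Retry-After when a request is rejected with 429. It assumes Retry-After is given in seconds, as described above. The 60-second fallback and the three-attempt cap are arbitrary assumptions, not documented values. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumption: fallback wait when Retry-After is missing or not a number.
DEFAULT_WAIT_SECONDS = 60.0


def retry_after_seconds(headers, default: float = DEFAULT_WAIT_SECONDS) -> float:
    """Read Retry-After (in seconds) from response headers, falling back to a default."""
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        # Header absent (None) or not a plain number of seconds.
        return default


def open_with_retry(req: urllib.request.Request, max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep):
    """urlopen() that waits and retries on 429, re-raising any other HTTP error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts:
                raise
            sleep(retry_after_seconds(err.headers))
```

This retry loop is meant for the per-window request limit. The daily download limit resets only at midnight UTC, so retrying a download request earlier in the same day will keep failing.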
Download Rate Limiting

Organizations are limited to 30 dataset downloads per day, and the limit resets at midnight UTC. When the limit is exceeded, the API responds with status code 429 and a body such as:

{
  "error": "Rate limit exceeded",
  "limit": {
    "period": "daily",
    "remaining": 0,
    "resetsAt": "2025-08-27T00:00:00Z"
  }
}
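When the daily download limit is exhausted, the `resetsAt` field indicates when downloads become available again. The sketch below computes the remaining wait from the response body. The function name is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def seconds_until_reset(body: str, now: Optional[datetime] = None) -> float:
    """Return how many seconds remain until the daily download quota resets."""
    limit = json.loads(body)["limit"]
    # Normalize the trailing "Z" so fromisoformat() works on older Python versions.
    resets_at = datetime.fromisoformat(limit["resetsAt"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (resets_at - now).total_seconds())
```

For the sample body above, calling this at 2025-08-26T12:00:00Z gives 43200 seconds (12 hours).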

Implementation Notes

Direct Storage Downloads

The API returns presigned URLs that allow direct downloads from storage (S3/R2). This reduces bandwidth costs and improves download performance. Presigned URLs are valid for 12 hours.
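The next sketch streams a file from a presigned URL and verifies it against the `checksum` field of the download response. It makes two assumptions. First, the checksum is always written as `<algorithm>:<hex digest>`, using an algorithm name that `hashlib.new()` accepts (such as `sha256` in the sample). Second, the presigned URL carries its own signature, so no API key is sent to the storage host. Both function names are hypothetical.

```python
import hashlib
import shutil
import urllib.request


def download(presigned_url: str, dest: str) -> None:
    """Stream the object at a presigned URL to a local file in 1 MiB chunks."""
    # Assumption: the signature in the URL's query string authorizes the request,
    # so no Authorization header is attached here.
    with urllib.request.urlopen(presigned_url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, length=1 << 20)


def verify_checksum(path: str, checksum: str, chunk_size: int = 1 << 20) -> bool:
    """Check a file against a checksum string such as "sha256:<hex digest>"."""
    algorithm, _, expected = checksum.partition(":")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Hash in chunks so multi-gigabyte archives are never held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()
```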

Resumable Downloads

Storage supports HTTP range requests, so downloads can be resumed. If a presigned URL expires during a download, request a new URL and resume from the last byte received by sending a Range header.
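The following is a minimal sketch of resuming a partial download with an HTTP Range request. It assumes the storage host answers a satisfiable range with 206 Partial Content. If a server ignores the header and returns 200 instead, the code starts the file over rather than appending duplicate bytes. The function names are hypothetical.

```python
import os
import urllib.request


def range_header(dest: str) -> dict:
    """Build a Range header that resumes after the bytes already on disk."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}


def resume_download(presigned_url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Continue (or start) downloading to dest, typically with a freshly requested URL."""
    req = urllib.request.Request(presigned_url, headers=range_header(dest))
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the range was honored, so append to the partial file.
        # 200 OK: the full body was returned, so overwrite from the start.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

If the file on disk is already complete, the storage host will typically answer with 416 Range Not Satisfiable, which urlopen() raises as an HTTPError.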

Rate Limiting

Downloads are rate limited at 30 per day per organization. Each POST request to create a presigned URL counts toward this limit.
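Every POST to the download endpoint consumes one of the 30 daily downloads. A client that retries or resumes often may therefore want to reuse a presigned URL while it is still valid. The sketch below assumes a presigned URL can be used repeatedly until its `expiresAt`. `PresignedUrlCache`, the `fetch` callback, and the 10-minute safety margin are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional, Tuple

# (downloadUrl, expiresAt) as returned by POST /datasets/:datasetId/download
Entry = Tuple[str, datetime]


class PresignedUrlCache:
    """Reuse a dataset's presigned URL until shortly before it expires."""

    def __init__(self, fetch: Callable[[str], Entry],
                 margin: timedelta = timedelta(minutes=10)):
        self._fetch = fetch  # hypothetical hook that performs the POST request
        self._margin = margin
        self._cache: Dict[str, Entry] = {}

    def get(self, dataset_id: str, now: Optional[datetime] = None) -> str:
        now = now or datetime.now(timezone.utc)
        entry = self._cache.get(dataset_id)
        if entry is None or now >= entry[1] - self._margin:
            entry = self._fetch(dataset_id)  # each call counts toward the daily quota
            self._cache[dataset_id] = entry
        return entry[0]
```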

Terms Agreement Required

Users must agree to a dataset's terms through the web interface before downloading it. Agreeing to terms through the API alone is not supported.

Error Handling

Common Error Responses

400 Bad Request: Malformed request or invalid parameters.

{
  "error": "Invalid request parameters.",
  "details": {
    "page": "Must be a valid integer."
  }
}

401 Unauthorized: Missing or invalid authentication.

{
  "error": "Authentication required."
}

429 Too Many Requests: Rate limit exceeded.

{
  "error": "Rate limit exceeded. Please try again later.",
  "type": "request_limit"
}
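The sketch below turns the error bodies above into a Python exception. `DataCollectiveError` and `to_api_error` are hypothetical names. The code reads the `error`, `details`, and `type` fields shown in the examples and tolerates bodies that are not JSON.

```python
import json
from typing import Optional


class DataCollectiveError(Exception):
    """Hypothetical exception type wrapping an API error response."""

    def __init__(self, status: int, message: str, details: Optional[dict] = None,
                 error_type: Optional[str] = None):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.details = details or {}
        self.error_type = error_type  # e.g. "request_limit" on some 429 responses


def to_api_error(status: int, body: str) -> DataCollectiveError:
    """Build an exception from a non-2xx response body."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return DataCollectiveError(
        status,
        payload.get("error", "Unknown error"),
        details=payload.get("details"),
        error_type=payload.get("type"),
    )
```

In a client, wrap urlopen() with `except urllib.error.HTTPError as err: raise to_api_error(err.code, err.read().decode("utf-8", "replace")) from err`.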