Wikimedia Enterprise API Documentation

Welcome to the Wikimedia Enterprise API Documentation and Reference Guide. Wikimedia Enterprise provides a suite of APIs designed for high-volume, read-only access to Wikipedia and other supported Wikimedia projects. These APIs are developed and maintained by the Wikimedia Foundation. All endpoints return data in a consistent schema, so you can combine articles from Snapshot, On-demand, and Realtime without writing separate parsers. The Snapshot and Realtime Batch endpoints return tar.gz files and an NDJSON header. All other API endpoints return JSON by default.

Read our Data Primer to understand how data from Wikimedia projects is created and maintained. With Wikimedia Enterprise APIs, you can:

Download an entire Wikimedia project (HTML database dump):
- Call the Snapshot API to receive a compressed .tar.gz file containing every article in a specific project, e.g. all of English Wikipedia or German Wiktionary.
Request the latest public version of a single article:
- Query the On-demand API by article name, like “Albert_Einstein” or “SpaceX”, to receive the most recent version of that article across all projects. You can also filter by language and/or project.
Stream real-time updates (firehose) from supported projects:
- Connect to the Realtime API stream to have all article revisions (updates) pushed directly to your system as they happen.
- Use Realtime Batch to download new updates as a tarball (.tar.gz). The batches are refreshed every hour, with the last day of updates available.

First step: Sign Up

Note: if you are a Wikimedia community member, you can get exclusive access to Wikimedia Enterprise APIs. Find out more here.

To get started, sign up for a free account (no credit card required), verify your email, then follow the next steps on this page.

On this page:

Getting your API Keys
Example API Calls
SDKs
Next Steps

Getting your API Keys

Now that you’ve set up and verified your account, send the username (all lowercase) and password you created to the /login endpoint to receive your tokens.

Wikimedia Enterprise APIs use JWT authentication for access verification. All API requests must be made over HTTPS and must pass your access token as a Bearer token in the Authorization header; requests without it will fail.

curl -L https://auth.enterprise.wikimedia.com/v1/login -H "Content-Type: application/json" -d '{"username":"yourusername", "password":"secret"}'
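
The same login call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names build_login_request and login are hypothetical, and only the URL, the JSON field names, and the Content-Type header come from the cURL command above.

```python
import json
import urllib.request

# URL taken from the cURL example above.
LOGIN_URL = "https://auth.enterprise.wikimedia.com/v1/login"


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the /login endpoint."""
    # The documentation asks for the username in all lowercase.
    body = json.dumps({"username": username.lower(), "password": password})
    return urllib.request.Request(
        LOGIN_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(username: str, password: str) -> dict:
    """Send the login request and return the decoded JSON response."""
    request = build_login_request(username, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling login("yourusername", "secret") would perform the network request. Read the credentials from the environment or a secrets manager instead of hard-coding them.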

Refresh tokens expire in 90 days.
Access and ID tokens expire in 24 hours.
Use your Refresh token to obtain a new Access token before it expires.

An important reminder: Your credentials carry many privileges related to your account, so be careful to keep them secure. Do not share your credentials in publicly accessible areas such as GitHub, client-side code, etc.

A successful /login request returns a JSON body of the following shape:

{
  "id_token": "string",
  "access_token": "string",
  "refresh_token": "string",
  "expires_in": 86400
}
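
Because the access token expires, it can help to record when it was received and check it before each request. The sketch below assumes expires_in is a lifetime in seconds (86400 seconds matches the 24-hour expiry stated above). The TokenSet class, the parse_login_response helper, and the five-minute safety margin are illustrative choices, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class TokenSet:
    """Tokens from a /login response plus a locally computed expiry time."""

    access_token: str
    refresh_token: str
    id_token: str
    expires_at: float  # Unix timestamp derived from expires_in

    def needs_refresh(self, now: float, margin_seconds: float = 300.0) -> bool:
        # Refresh slightly early so no request goes out with an expired token.
        return now >= self.expires_at - margin_seconds


def parse_login_response(payload: dict, received_at: float) -> TokenSet:
    """Convert the JSON body shown above into a TokenSet."""
    return TokenSet(
        access_token=payload["access_token"],
        refresh_token=payload["refresh_token"],
        id_token=payload["id_token"],
        # Assumption: expires_in counts seconds from the moment of login.
        expires_at=received_at + payload["expires_in"],
    )
```

In practice, received_at would be time.time() captured right after the login call, and needs_refresh(time.time()) would be checked before each API request.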

Example API Calls

Now you are ready to make your first call. Start by running this cURL command using your valid access_token. You’ll receive a list of all supported projects available to you in our APIs, including the project name, identifier, language, and more.

Note: the project identifier (e.g. “enwiki” for English Wikipedia) is the value you will use to specify the project and language in requests.

curl -H "Authorization: Bearer ACCESS_TOKEN" -L https://api.enterprise.wikimedia.com/v2/projects
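
For reference, here is a rough Python equivalent of the projects call above. The helper names build_authorized_request and list_projects are hypothetical; the base URL and the Bearer header format are taken from the cURL command. The shape of the decoded response is not assumed here, so the function simply returns whatever JSON the endpoint sends.

```python
import json
import urllib.request

# Base URL taken from the cURL examples in this guide.
API_BASE = "https://api.enterprise.wikimedia.com/v2"


def build_authorized_request(path: str, access_token: str) -> urllib.request.Request:
    """Build a GET request that carries the access token as a Bearer token."""
    return urllib.request.Request(
        f"{API_BASE}/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def list_projects(access_token: str):
    """Fetch the supported projects and return the decoded JSON as-is."""
    request = build_authorized_request("projects", access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request builder separate from the network call makes the header logic easy to check without contacting the API.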

Next, try using the Snapshot API. Run this cURL command to download a compressed file (HTML dump) containing every article in English Wikipedia (it’s large).

Note: the “Snapshot identifier” is constructed with 3 items from our metadata endpoints: <language><project_name>_namespace_<number>
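
The identifier pattern in the note can be expressed as a small helper. This sketch assumes that “enwiki” decomposes into the language “en” and the project name “wiki”, which is inferred from the example identifier and not stated explicitly. The function names are hypothetical, and the download path mirrors the Snapshot cURL command in this section.

```python
API_BASE = "https://api.enterprise.wikimedia.com/v2"


def snapshot_identifier(language: str, project_name: str, namespace: int) -> str:
    """Compose <language><project_name>_namespace_<number>."""
    if namespace < 0:
        raise ValueError("namespace number must be non-negative")
    return f"{language}{project_name}_namespace_{namespace}"


def snapshot_download_url(identifier: str) -> str:
    """Return the download URL for a snapshot identifier."""
    return f"{API_BASE}/snapshots/{identifier}/download"
```

For example, snapshot_identifier("en", "wiki", 0) yields the identifier used for English Wikipedia in this guide, enwiki_namespace_0. Take the actual language, project, and namespace values from the metadata endpoints; do not guess them.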

Uncompress that file to see NDJSON with each line representing a single article in the project. An example of an article’s payload can be seen on our API page, and to learn more about each field in the payload, see our data dictionary.

curl -H "Authorization: Bearer ACCESS_TOKEN" -L https://api.enterprise.wikimedia.com/v2/snapshots/enwiki_namespace_0/download --output enwiki.tar.gz
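
Since a full snapshot is large, you may prefer to read it as a stream and avoid extracting it to disk first. The following sketch assumes only what is described above: a .tar.gz archive whose files contain NDJSON, with one article per line. The function name iter_articles is illustrative.

```python
import json
import tarfile
from typing import BinaryIO, Iterator


def iter_articles(fileobj: BinaryIO) -> Iterator[dict]:
    """Yield one decoded article per NDJSON line from a .tar.gz stream."""
    # Mode "r|gz" reads the archive sequentially, so the full dump is
    # never loaded into memory or unpacked to disk.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            for line in extracted:
                line = line.strip()
                if line:  # ignore blank lines between records
                    yield json.loads(line)
```

To use it on the file downloaded above, open enwiki.tar.gz in binary mode ("rb") and loop over iter_articles(f), handling each article dictionary as it arrives.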

Next, try calling the On-demand API to get a single article (page) of interest. Let’s use NASA in English Wikipedia as an example, with this cURL command.

You’ll receive a JSON response containing the same article data, in the same format, as the Snapshot API’s file. The main difference (besides being able to query pages individually) is that this response returns the version of the article that is currently live on the project, whereas the Snapshot updates daily (for paid users) or twice-monthly (for free users).

curl -X 'POST' \
  'https://api.enterprise.wikimedia.com/v2/articles/NASA' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer ACCESS_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
  "filters": [
    {
      "field": "is_part_of.identifier",
      "value": "enwiki"
    }
  ],
  "limit": 1
}'
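
The On-demand request above could be assembled in Python as follows. This is a sketch under stated assumptions: build_article_request is a hypothetical helper, replacing spaces with underscores follows the “Albert_Einstein” naming style shown earlier, and percent-encoding the title is a precaution that is not documented here. The filter field, headers, and limit are copied from the cURL command.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.enterprise.wikimedia.com/v2"


def build_article_request(
    name: str, project: str, access_token: str, limit: int = 1
) -> urllib.request.Request:
    """Build the POST request for a single article, filtered by project."""
    # Assumption: titles use underscores in place of spaces.
    title = urllib.parse.quote(name.replace(" ", "_"), safe="")
    body = {
        "filters": [{"field": "is_part_of.identifier", "value": project}],
        "limit": limit,
    }
    return urllib.request.Request(
        f"{API_BASE}/articles/{title}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the resulting request to urllib.request.urlopen and decoding the body with json.load would return the article data described above.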

Software Development Kits (SDKs)

We have built software development kits (SDKs) in Go and Python to help you get started:

Go SDK

Python SDK

Next Steps

There you have it! You’ve created your account and made your first calls to Wikimedia Enterprise.

Choose which Enterprise APIs fit your use case and start making some requests.
The Data Dictionary explains our schema and response fields.
Import the OpenAPI definition YAML file into your favorite API discovery platform.
The Best Practices page describes optimal ways of using the Enterprise APIs.

If you have additional questions, look through the FAQs or use your account dashboard to contact support or provide feedback.

Wikimedia Enterprise operates a status page at status.enterprise.wikimedia.com displaying uptime history and current status.

If you are interested in daily Snapshot dumps, Realtime firehose, or additional egress, contact us.