Wikimedia Enterprise API Documentation

Introduction

Welcome to the Wikimedia Enterprise API Documentation and Reference guide. Wikimedia Enterprise provides a suite of APIs designed for high-volume access to Wikimedia project data, including Wikipedia and many others. Regardless of which API you obtain data from, the format is the same, so you don’t need to build multiple parsers. Our binary API endpoints (Snapshot and Realtime Batch) return NDJSON, and our HTTP API endpoints default to JSON but can also return NDJSON with a simple header.
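Because NDJSON is just one JSON object per line, a single small parser can serve every endpoint. The following is a minimal sketch, not an official client: the function name iter_ndjson is hypothetical, and the "name" field in the sample data is an assumption made for illustration.

```python
import json
from typing import Any, Dict, Iterator


def iter_ndjson(text: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line of an NDJSON payload."""
    # Split on "\n" only: str.splitlines() would also break on characters
    # such as U+2028, which may legally appear inside JSON strings.
    for line in text.split("\n"):
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-article payload; the "name" field is assumed for illustration.
SAMPLE = '{"name": "NASA"}\n{"name": "SpaceX"}\n'
ARTICLE_NAMES = [article["name"] for article in iter_ndjson(SAMPLE)]
```

Yielding objects one at a time keeps memory use flat, which matters once the same function is pointed at a full project download.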

Here are a few examples of what you can do with Wikimedia Enterprise APIs:

Download an entire Wikimedia project:

  • Call the Snapshot API to receive a packaged tarball file containing every single article in a specific project, for example English Wikipedia or German Wiktionary.

Request the latest public version of a single article:

  • Query the On-demand API by article name, like “Albert_Einstein” or “SpaceX”, to receive the most recent article version in all projects. You can also filter by language and/or project.

Stream real-time updates (firehose) from supported projects into your system:

  • Connect to the Realtime API stream to have all article revisions (updates) pushed directly to your system as they happen.
  • Use Realtime Batch to download new updates as a packaged tarball file. Batches are refreshed on the hour, with the last day of updates available.

Getting Started – First Steps

Accessing Wikimedia Enterprise APIs requires authentication credentials, known as JWT tokens, that you must pass with each request. If you do not already have an account, sign up for free and get started immediately. If you’re already signed up, let’s jump into it.

Getting your API Keys

Wikimedia Enterprise APIs use JWT authentication passed via the header for access control. Now that you’ve set up and verified your account, send the username (all lowercase) and password you created to the /login endpoint to receive your tokens.

All API requests must be made over HTTPS. Calls without authentication in the header will fail.

curl -L https://auth.enterprise.wikimedia.com/v1/login -H "Content-Type: application/json" -d '{"username":"yourusername", "password":"secret"}'
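For readers who prefer Python over cURL, the same login call can be sketched with the standard library alone. This is an illustrative sketch rather than the official SDK: the function names build_login_request and login are hypothetical, and lowercasing the username in code is an assumption based on the "all lowercase" instruction above.

```python
import json
import urllib.request

LOGIN_URL = "https://auth.enterprise.wikimedia.com/v1/login"


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build the same POST request that the curl command above sends."""
    # Assumption: usernames must be sent in lowercase, so normalise here.
    body = json.dumps({"username": username.lower(), "password": password})
    return urllib.request.Request(
        LOGIN_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(username: str, password: str) -> dict:
    """Send the login request over HTTPS and return the decoded token response."""
    with urllib.request.urlopen(build_login_request(username, password)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect or unit-test the request without touching the network. In real use, read the password from an environment variable or secret store rather than hard-coding it.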

The Refresh token expires in 90 days; the Access and ID tokens expire in 1 day. Use your Refresh token to obtain a new Access token after the current one expires.

An important reminder: Your credentials carry many privileges related to your account, so be careful to keep them secure. Do not share your credentials in publicly accessible areas such as GitHub, client-side code, etc.

{
  "id_token": "string",
  "access_token": "string",
  "refresh_token": "string",
  "expires_in": 86400
}
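Since the response reports its lifetime in expires_in (86400 seconds, i.e. one day), a client can record when the Access token will lapse and refresh shortly before that. The sketch below shows one way to track this; the TokenSet class, its method names, and the five-minute safety margin are all assumptions for illustration, and the refresh call itself is left out because its request format is not shown in this guide.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp after which access_token is invalid

    @classmethod
    def from_login_response(cls, payload: dict, now: Optional[float] = None) -> "TokenSet":
        """Convert the relative expires_in value into an absolute timestamp."""
        issued = time.time() if now is None else now
        return cls(
            access_token=payload["access_token"],
            refresh_token=payload["refresh_token"],
            expires_at=issued + payload["expires_in"],
        )

    def needs_refresh(self, now: Optional[float] = None, margin: float = 300.0) -> bool:
        """Return True once the token is within `margin` seconds of expiring."""
        current = time.time() if now is None else now
        return current >= self.expires_at - margin
```

Checking needs_refresh() before each call avoids a failed request at the moment of expiry; the margin absorbs clock drift and request latency.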

Example API Calls

Now you are ready to make your first call. Start by running this cURL command using your valid access_token to fetch the list of supported projects in Wikimedia Enterprise APIs.

You should receive a list of all supported Wikimedia projects, including the project name, identifier, language, and more. If not, check your credentials and try again. Note the project identifier (e.g. “enwiki” for English Wikipedia); you will use it to refer to that project in other calls and responses.

curl -H "Authorization: Bearer ACCESS_TOKEN" -L https://api.enterprise.wikimedia.com/v2/projects
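The same projects call can be sketched in Python. This is a hedged example, not the official SDK: the function names are hypothetical, and the assumption that the response decodes to a list of objects each carrying an "identifier" key is inferred from the description above rather than from a documented schema.

```python
import json
import urllib.request
from typing import Dict, List

API_BASE = "https://api.enterprise.wikimedia.com/v2"


def build_projects_request(access_token: str) -> urllib.request.Request:
    """Build the authenticated GET request for the list of supported projects."""
    return urllib.request.Request(
        f"{API_BASE}/projects",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def fetch_projects(access_token: str) -> List[Dict]:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_projects_request(access_token)) as resp:
        return json.load(resp)


def project_identifiers(projects: List[Dict]) -> List[str]:
    """Collect identifiers such as "enwiki" (assumes an "identifier" key per project)."""
    return [p["identifier"] for p in projects if "identifier" in p]
```

A request without a valid Authorization header fails, in which case urllib raises urllib.error.HTTPError, so wrapping fetch_projects in a try/except is a reasonable first debugging step.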

Next, try using the Snapshot API. Run this cURL command to download a compressed file containing every article in English Wikipedia (it’s large).

Note: the “Snapshot identifier” is constructed with 3 items from our metadata endpoints: <language><project_name>_namespace_<number>
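The naming pattern in the note can be expressed as a small helper. This is only a sketch: the function names are hypothetical, and splitting “enwiki” into the language "en" and the project name "wiki" is an assumption read off the pattern. The download URL follows the shape of the Snapshot cURL command in this guide.

```python
API_BASE = "https://api.enterprise.wikimedia.com/v2"


def snapshot_identifier(language: str, project_name: str, namespace: int) -> str:
    """Combine the three items into <language><project_name>_namespace_<number>."""
    if namespace < 0:
        raise ValueError("namespace number must be non-negative")
    return f"{language}{project_name}_namespace_{namespace}"


def snapshot_download_url(identifier: str) -> str:
    """Return the download URL for a given Snapshot identifier."""
    return f"{API_BASE}/snapshots/{identifier}/download"


# English Wikipedia, namespace 0, as used in this guide.
ENWIKI_ARTICLES = snapshot_identifier("en", "wiki", 0)
```

Building the identifier from parts rather than hard-coding it makes it easier to loop over several projects returned by the metadata endpoints.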

Uncompress that file to see NDJSON with each line representing an article in the project. To learn about the data returned in each article object, see our data dictionary.

curl -H "Authorization: Bearer ACCESS_TOKEN" -L https://api.enterprise.wikimedia.com/v2/snapshots/enwiki_namespace_0/download --output enwiki.tar.gz
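Rather than uncompressing the whole download to disk, the articles can be read straight out of the tarball. The sketch below assumes the archive is gzip-compressed (as the .tar.gz name suggests) and that every regular file inside it is NDJSON; the function name iter_snapshot_articles is hypothetical.

```python
import json
import tarfile
from typing import Any, BinaryIO, Dict, Iterator


def iter_snapshot_articles(fileobj: BinaryIO) -> Iterator[Dict[str, Any]]:
    """Yield article objects one at a time from a gzip-compressed snapshot tarball."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            # Each non-empty line is one complete JSON object (one article).
            for raw_line in extracted:
                if raw_line.strip():
                    yield json.loads(raw_line)
```

A possible usage, assuming the file name from the cURL command above: open it with open("enwiki.tar.gz", "rb") and loop over iter_snapshot_articles(f). Because the function is a generator, only one article is held in memory at a time.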

Next, try calling the On-demand API to get a single article (page) of interest. Let’s use NASA in English Wikipedia as an example, with this cURL command.

You should have received a JSON response containing the same data as an article represented in the Snapshot API’s file. The main difference here (besides being able to query individually) is that this response returns the version of the article that is currently live on the project, whereas the Snapshot updates daily (for paid users) or monthly (for free users). If you are interested in weekly or daily Snapshot dumps, contact us.

curl -X 'POST' \
  'https://api.enterprise.wikimedia.com/v2/articles/NASA' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer ACCESS_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
  "filters": [
    {
      "field": "is_part_of.identifier",
      "value": "enwiki"
    }
  ],
  "limit": 1
}'
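The On-demand request above translates to Python as follows. This is a sketch under stated assumptions: the function name build_article_request is hypothetical, and percent-encoding the article name in the URL path is an assumption about how names with special characters should be sent. The filter field, headers, and body mirror the cURL command.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.enterprise.wikimedia.com/v2"


def build_article_request(
    name: str, project_identifier: str, access_token: str, limit: int = 1
) -> urllib.request.Request:
    """Build the POST request for one article, filtered to a single project."""
    body = {
        "filters": [
            {"field": "is_part_of.identifier", "value": project_identifier}
        ],
        "limit": limit,
    }
    # Assumption: names are percent-encoded; plain names like "NASA" are unchanged.
    url = f"{API_BASE}/articles/{urllib.parse.quote(name, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the resulting request to urllib.request.urlopen and decoding the body with json.load yields the same data the cURL command prints.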

Software development kits (SDKs)

Software development kits (SDKs) in Go and Python are available to help you get started.

There you have it! You’ve created your account and made your first calls with Wikimedia Enterprise. Please explore the APIs and the data dictionary, which explains the fields and schema. An OpenAPI spec (YAML) is available for you to ingest. If you have additional questions, look through the FAQs or use your dashboard to contact support or provide feedback.