API Data Dictionary

This document breaks down the data structure (schema) of the Wikimedia Enterprise APIs. It covers the fields and objects in response payloads, their data types, whether each is always returned or only optionally, and an example of each. It also adds context for metadata such as namespace IDs and their descriptions. All of our APIs are built around the same article response format, so this reference should help you understand the data you have access to.

Response payload tag reference:

  • required – means that this field is always present in the response (unless you explicitly filter it out)
  • optional – means that the field might or might not be present in the payload
  • omitempty – means that the field will be omitted from the encoded JSON output in two cases: 1) when the field is not applicable or the value of the field is not present for the article; 2) if its value is the “zero” value for its type:
    • 0 – for numeric types
    • false – for the boolean type
    • “” (the empty string) – for strings
    • null – for objects
    • [ ] – zero length entries for arrays
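
Because omitempty drops zero values from the JSON, a consumer cannot tell a missing key from a zero value. The following is a minimal Python sketch of one way to read such fields, assuming the payload has already been decoded into a dict. The sample payload and the helper name `read_field` are illustrative and not part of the API.

```python
import json

# Illustrative payload: "watchers_count" and "abstract" were omitted by the encoder.
raw = '{"name": "Squirrel", "identifier": 28492}'
article = json.loads(raw)

# Zero values to fall back on, one per JSON type listed above.
ZERO_VALUES = {"number": 0, "boolean": False, "string": "", "object": None, "array": []}

def read_field(payload, key, json_type):
    """Return payload[key], or the zero value for json_type when the key is absent."""
    if key in payload:
        return payload[key]
    zero = ZERO_VALUES[json_type]
    # Copy lists so callers never share the module-level default.
    return list(zero) if isinstance(zero, list) else zero

watchers = read_field(article, "watchers_count", "number")
abstract = read_field(article, "abstract", "string")
```

With this approach, downstream code can treat every optional field as present without checking for the key first.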

credibility – This badge designates that the field is a “Credibility Signal” which provides additional qualitative metadata in each article revision, empowering reusers to make better-informed decisions in real-time about how they might handle returned data.

(BETA) items listed below are part of our experimental Structured Contents endpoint in On-Demand.

Namespace IDs

Identifier – Name: Description

  • 0 – (Main): The Article namespace, also known as the main or mainspace, is where a project’s primary content resides.
  • 6 – File: The Media namespace, identified as the File namespace, hosts all of a project’s multimedia content. This includes images, videos, audio clips, and other media files, each starting with the prefix ‘File:’.
  • 10 – Template: The Template namespace is dedicated to storing Wikitext templates for inclusion on multiple pages, primarily through transclusion. While most templates are in this namespace, some may reside in others, like the User namespace.
  • 14 – Category: The Category namespace groups pages on similar topics, using a MediaWiki feature that automatically lists pages marked with [[Category:XYZ]] in their markup. This helps in navigating and discovering related articles and topics.
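
The four identifiers above can be captured as constants so that client code does not rely on bare numbers. This is a sketch only: the enum and the helper `is_main_article` are hypothetical names, and the payload shape follows the `namespace` field described later in this dictionary.

```python
from enum import IntEnum

class Namespace(IntEnum):
    """Namespace identifiers listed in the table above."""
    MAIN = 0
    FILE = 6
    TEMPLATE = 10
    CATEGORY = 14

def is_main_article(article):
    """True when the payload's namespace.identifier is the main (article) namespace."""
    return article["namespace"]["identifier"] == Namespace.MAIN
```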

API Response Schema

Field Name

Type

Description

name

required

string

Name displayed at the top of the article page

Example: "Squirrel"

identifier

required

number

Unique identifier of the article.

Note: This is different from the revision identifier and maps directly to an article’s MediaWiki ID (primary key).

Example: 28492

abstract

optional

omitempty

string

Short summary of the article.

Example: "Squirrels are members of the family Sciuridae, a family that includes small or medium-size rodents. The squirrel family includes tree squirrels, ground squirrels, and flying squirrels. Squirrels are indigenous to the Americas, Eurasia, and Africa, and were introduced by humans to Australia. The earliest known fossilized squirrels date from the Eocene epoch, and among other living rodent families, the squirrels are most closely related to the mountain beaver and to the dormice."

watchers_count

credibility

optional

omitempty

number

Number of unique editors watching the page.

Example: 12

date_modified

required

string

Timestamp of the last revision of the article in RFC3339 format.

Example: "2021-08-31T04:51:39Z"

date_created

required

string

Timestamp of the article creation event, or the article’s first revision, in RFC3339 format.

Example: "2021-08-31T04:51:39Z"

date_previously_modified

optional

omitempty

string

Timestamp of the second-to-last revision in RFC3339 format.

Example: "2021-08-31T04:51:39Z"
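
The three date fields share the same RFC3339 string format. A minimal Python sketch for turning them into timezone-aware datetimes follows. The helper name `parse_rfc3339` is an assumption, and the sketch only targets the timestamp shapes shown in this dictionary's examples rather than every form RFC 3339 allows.

```python
from datetime import datetime, timezone

def parse_rfc3339(value):
    """Parse a timestamp such as "2021-08-31T04:51:39Z" into an aware datetime.

    datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    so the suffix is rewritten to an explicit UTC offset first.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)

modified = parse_rfc3339("2021-08-31T04:51:39Z")
```

Parsed values can then be compared directly, for example to check that `date_modified` is not earlier than `date_created`.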

protection

credibility

optional

omitempty

array

List of community-specific protections and restrictions on the article. This is how you can tell which editor permissions are needed to work on this article, where:

  • Type (required): the type of event that the protection is applied to
  • Level (required): editor status needed to operate in the type of protection
  • Expiry (required): timestamp (number, seconds), length of time that this protection or restriction is active, can be infinite, indefinite, infinity, or never, for a never-expiring protection

To find the available protection levels for English Wikipedia, for example, use this API call. For more information, see the “Restrictions” section on this page.

[
  {
    "type": "edit",
    "level": "autoconfirmed",
    "expiry": "infinity"
  },
  {
    "type": "move",
    "level": "autoconfirmed",
    "expiry": "infinity"
  }
]
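
The sketch below shows one way to read the protection array: look up the level required for a given action and check whether an entry never expires. The function names are hypothetical, and the set of never-expiring markers is taken from the Expiry description above.

```python
# Markers listed above for a protection that never expires.
NEVER_EXPIRES = {"infinite", "indefinite", "infinity", "never"}

def required_level(protections, action):
    """Return the editor level needed for `action` (e.g. "edit"), or None if unprotected."""
    for entry in protections:
        if entry["type"] == action:
            return entry["level"]
    return None

def is_permanent(entry):
    """True when the entry's expiry is one of the never-expiring markers."""
    return str(entry["expiry"]).lower() in NEVER_EXPIRES

protection = [
    {"type": "edit", "level": "autoconfirmed", "expiry": "infinity"},
    {"type": "move", "level": "autoconfirmed", "expiry": "infinity"},
]
```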

version

credibility

required

object

Metadata related to the latest specific revision of the article.

version.identifier

required

number

Revision ID. Unique identifier of the revision, different from the article identifier.

Example: 1041549311

version.comment

credibility

optional

omitempty

string

Comment attached by the editor to the latest version. This returns a parsed response.

Note: When a revision is saved, the editor has the ability to leave a comment describing why they created this revision.

Example: "Sample comment describing reason for revision"

version.tags

credibility

optional

omitempty

array

Tags attached to the version. You can find a list of core change tags here. To find out which tags are used on English Wikipedia, for example, you can use Tags:API.

Note: Tags can be added to an article to signify that this revision was part of a campaign, specific project-run initiative, tracked edit type, or other Wikimedia event.

[
  "mobile edit",
  "mobile app edit",
  "android app edit"
]

version.has_tag_needs_citation

credibility

optional

omitempty

boolean

When an editor deems an assertion made in an article needs to be supported with a reference, it will carry this tag. For more information refer to this Wikipedia article.

Example: true

version.is_minor_edit

credibility

optional

omitempty

boolean

Whether the editor considered this change minor. For more information, see this Help page.

Note: When a revision is saved, the editor has the ability to mark whether this is a minor revision. Although this option is set by editors themselves and can lack consistency, minor revisions are typically grammar corrections and are less critical to review.

Example: false

version.is_flagged_stable

credibility

optional

omitempty

boolean

Whether the project community marked this revision as stable.

Note: Depending on the project, there are different community approaches to having administrative approval on revisions as they happen. This will reflect whether or not the revision has been approved. While false does not mean this is a vandalized article, true is typically a good indicator that this is a good revision.

Example: false

version.scores

credibility

optional

omitempty

object

This object contains version scores calculated as part of Wikimedia’s LiftWing project.

There is one model included:

  • Revertrisk (optional, omitempty): predicts whether a revision may be reverted

Each model includes three data points:

  • Prediction (optional, omitempty): wrapper object for the prediction
  • Probability true (required): probability percentage of the prediction being True
  • Probability false (required): probability percentage of the prediction being False
{
  "revertrisk": {
    "prediction": true,
    "probability": {
      "true": 0.959002615965355,
      "false": 0.040997384034645014
    }
  }
}
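
Since the scores object and everything inside it can be omitted, reading the revert-risk probability needs a guard at each level. A minimal Python sketch follows. The helper names and the 0.9 threshold are assumptions for illustration, not recommended values.

```python
def revert_probability(version):
    """Return the revertrisk "true" probability, or None when scores are omitted."""
    scores = version.get("scores") or {}
    revertrisk = scores.get("revertrisk") or {}
    probability = revertrisk.get("probability") or {}
    return probability.get("true")

def looks_risky(version, threshold=0.9):
    """Flag a revision whose revert probability meets the (assumed) threshold."""
    p = revert_probability(version)
    return p is not None and p >= threshold

version = {
    "scores": {
        "revertrisk": {
            "prediction": True,
            "probability": {"true": 0.959002615965355, "false": 0.040997384034645014},
        }
    }
}
```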

version.editor

credibility

optional

omitempty

object

Editor-specific signals that can help contextualize the revision:

  • Identifier (optional): unique MediaWiki ID for the editor
  • Name (required): username of the editor
  • Edit Count (optional, omitempty): number of edits this editor has made so far
  • Groups (optional, omitempty): set of groups this editor belongs to
  • Is Bot (optional, omitempty): signals if editor is a bot or not
  • Is Anonymous (optional, omitempty): signals if editor is anonymous
  • Date Started (optional, omitempty): displays start date of the editor in RFC3339 format
  • Is Admin (optional, omitempty): signals if editor is admin or not
  • Is Patroller (optional, omitempty): signals if editor is patroller or not
  • Has Advanced Rights (optional, omitempty): checks if user has advanced rights
{
  "identifier": 4904587,
  "name": "USERNAME",
  "groups": [
    "extendedconfirmed",
    "*",
    "user",
    "autoconfirmed"
  ],
  "date_started": "2010-10-20T05:41:16Z",
  "edit_count": 25123,
  "is_anonymous": true,
  "is_admin": true,
  "is_patroller": true,
  "is_bot": true,
  "has_advanced_rights": true
}
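
Most editor signals are optional, so a consumer may want to normalize them before use. The sketch below collapses the object into explicit values. The function name and the combined `is_trusted` flag are illustrative assumptions and not part of the API.

```python
def editor_summary(editor):
    """Collapse optional editor signals into explicit booleans and counts.

    Keys dropped by omitempty are read as their zero value.
    """
    return {
        "name": editor["name"],  # the only field marked required
        "edit_count": editor.get("edit_count", 0),
        "is_bot": editor.get("is_bot", False),
        "is_anonymous": editor.get("is_anonymous", False),
        # Hypothetical convenience flag combining three of the signals above.
        "is_trusted": (
            editor.get("is_admin", False)
            or editor.get("is_patroller", False)
            or editor.get("has_advanced_rights", False)
        ),
    }
```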

version.size

credibility

optional

omitempty

object

Size information of the whole article at its current revision:

  • Value (optional, omitempty): actual size in bytes
  • Unit Text (optional, omitempty): unit text for the Value, by default it’s B (bytes)
{
  "value": 12,
  "unit_text": "B"
}

version.is_breaking_news

credibility

optional

omitempty

boolean

Is this version of the article considered breaking news?

Example: true

version.noindex

credibility

omitempty

boolean

Is this version of the article non-indexable to search engines?

Example: false

version.number_of_characters

credibility

optional

omitempty

number

Number of characters calculated from the Wikitext.

Example: 305917

version.maintenance_tags

optional

omitempty

object

Counts of occurrences of certain templates in the article body (wikitext).

  • Citation Needed Count (optional, omitempty): Count of occurrences of Citation needed template in the wikitext
  • PoV Count (optional, omitempty): Count of occurrences of point of view (PoV) template in the wikitext
  • Clarification Needed Count (optional, omitempty): Count of occurrences of Clarify template in the wikitext
  • Update Count (optional, omitempty): Count of occurrences of Update template in the wikitext
{
  "citation_needed_count": 8,
  "pov_count": 1,
  "clarification_needed_count": 1,
  "update_count": 100
}
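
A simple use of these counts is to add them up as a rough indicator of how much maintenance an article needs. The following is a sketch with a hypothetical helper name. Keys that omitempty has dropped are counted as zero.

```python
MAINTENANCE_KEYS = (
    "citation_needed_count",
    "pov_count",
    "clarification_needed_count",
    "update_count",
)

def total_maintenance_tags(version):
    """Sum the four template counts; absent keys count as zero."""
    tags = version.get("maintenance_tags") or {}
    return sum(tags.get(key, 0) for key in MAINTENANCE_KEYS)
```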

previous_version

optional

omitempty

object

Metadata related to the second-to-last revision of the article.

  • Identifier (required): identifier of the previous revision
  • Number Of Characters (optional, omitempty): number of characters calculated from the Wikitext
{
  "number_of_characters": 123,
  "identifier": 17380
}

url

required

string

URL of the article

Example: "https://en.wikipedia.org/wiki/Squirrel"

namespace

required

object

Namespace that this article belongs to:

  • Identifier (required): namespace identifier for the article

Note: Within Wikimedia, namespaces are used to define the type of article that you are looking at. This indicates the difference between articles, discussion pages, category pages, and other article types. Use the namespaces API to get the list of supported namespaces and additional metadata.

{
  "identifier": 0
}

in_language

required

object

Human language in which the article is written:

  • Identifier (required): the language code for the article

Note: Use the languages APIs for a list of supported languages and additional metadata.

{
  "identifier": "fr"
}

main_entity

optional

omitempty

object

Wikidata QID that this article is related to:

  • Identifier (required): Wikidata QID article is related to
  • URL (required): link to the Wikidata QID

Note: Quick Help on QIDs

{
  "identifier": "Q9482",
  "url": "https://www.wikidata.org/entity/Q9482"
}

additional_entities

optional

omitempty

array

Array of Wikidata entities used in this article page and how they are used, where:

  • Identifier (required): Wikidata QID / PID of the property or entity
  • URL (required): Wikidata QID / PID URL for the property or entity
  • Aspects (required): What aspects of the property or entity were used on this Article:
    • S – the entity’s sitelinks are used
    • L – the entity’s label is used
    • D – the entity’s description is used
    • T – the title of the local page corresponding to the entity is used
    • C – statements from the entity are used
    • X – all aspects of an entity are or may be used
    • O – something else about the entity is used
[
  {
    "identifier": "P1992",
    "url": "https://www.wikidata.org/entity/P1992",
    "aspects": [
      "C.P1630"
    ]
  },
  {
    "identifier": "P3031",
    "url": "https://www.wikidata.org/entity/P3031",
    "aspects": [
      "C.P1630"
    ]
  },
  ...
]
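
The aspect codes listed above can be decoded with a small lookup table. This sketch assumes that an aspect string such as "C.P1630" is a one-letter code, optionally followed by a dot and a modifier. That reading is inferred from the example and is not stated in this dictionary. The names `ASPECT_MEANINGS` and `parse_aspect` are hypothetical.

```python
ASPECT_MEANINGS = {
    "S": "sitelinks",
    "L": "label",
    "D": "description",
    "T": "title",
    "C": "statements",
    "X": "all aspects",
    "O": "other",
}

def parse_aspect(aspect):
    """Split an aspect string into (meaning, modifier); modifier is None when absent."""
    code, _, modifier = aspect.partition(".")
    return ASPECT_MEANINGS.get(code, "unknown"), modifier or None
```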

categories

optional

omitempty

array

Project categories that this article belongs to, where:

  • Name (required): MediaWiki category name
  • URL (required): MediaWiki category URL

Note: Within Wikimedia, the Category namespace is used to aggregate articles around specific topics. At the article level, this helps conceptualize the landscape of topics an article belongs to.

[
  {
    "name": "Category:All articles to be expanded",
    "url": "https://en.wikipedia.org/wiki/Category:All_articles_to_be_expanded"
  }
]

templates

optional

omitempty

array

Wikitext templates used in this article, where:

  • Name (required): MediaWiki template name
  • URL (required): MediaWiki template URL

Note: You can reference these if parsing the wikitext content directly.

[
  {
    "name": "Squirrel",
    "url": "https://en.wikipedia.org/wiki/Squirrel"
  },
  {
    "name": "Template:POV",
    "url": "https://en.wikipedia.org/wiki/Template:POV"
  },
  {
    "name": "Template:About",
    "url": "https://en.wikipedia.org/wiki/Template:About"
  },
  {
    "name": "Template:Anglicise rank",
    "url": "https://en.wikipedia.org/wiki/Template:Anglicise_rank"
  },
  ...
]

redirects

optional

omitempty

array

Wikimedia articles that redirect to this article.

  • Name (required): MediaWiki redirect name
  • URL (required): MediaWiki redirect URL

Note: Each name value is likely an alternate language spelling, similar topic item, or general consolidation of the content around this article.

[
  {
    "url": "https://en.wikipedia.org/wiki/Sciuridae",
    "name": "Sciuridae"
  },
  {
    "url": "https://en.wikipedia.org/wiki/Sciurid",
    "name": "Sciurid"
  },
  {
    "url": "https://en.wikipedia.org/wiki/Squirrels",
    "name": "Squirrels"
  },
  {
    "url": "https://en.wikipedia.org/wiki/Bushy_tailed_tree_rat",
    "name": "Bushy tailed tree rat"
  },
  ...
]

is_part_of

required

array

Wikimedia project this article belongs to, where:

  • Identifier (required): unique identifier of the project

Note: You can get a full list of projects using the projects API.

{
  "identifier": "enwiki"
}

article_body

required

object

Article content in HTML and wikitext. The HTML is optimized for parsing out content; visit here for DOM Specs.

  • HTML (optional, omitempty): parsed HTML of the article
  • Wikitext (optional, omitempty): markup content of the article
{
  "html": "...html goes here...",
  "wikitext": "...wikitext goes here..."
}

license

required

array

List of relevant licenses that affect this article and content reuse, where:

  • Name (required): name of the license
  • Identifier (required): unique identifier of the license
  • URL (required): URL to the license description
[
  {
    "name": "Creative Commons Attribution Share Alike 3.0 Unported",
    "identifier": "CC-BY-SA-3.0",
    "url": "https://creativecommons.org/licenses/by-sa/3.0/"
  }
]

visibility

credibility

optional

omitempty

object

If the editing community has flagged a particular, often older, revision as containing potentially damaging information, they will change its visibility. The three booleans offer insight into whether an article’s body, the revision’s editor, or an edit comment may contain harmful data. When these return “false” it indicates where the potentially harmful data is.

  • Text (required): indicates if the text of this particular revision is visible
  • Editor (required): indicates if the editor name of this particular revision is visible
  • Comment (required): indicates if the comment attached to this particular revision is visible

Present only in the visibility-change event type; see the event field in this dictionary.

{
  "text": true,
  "editor": false,
  "comment": false
}
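
Since a false value marks the part that may contain harmful data, a consumer can list the flagged parts before deciding what to suppress. A minimal sketch with a hypothetical helper name:

```python
def hidden_parts(visibility):
    """List the revision parts ("text", "editor", "comment") flagged as not visible."""
    return [
        part
        for part in ("text", "editor", "comment")
        if visibility.get(part) is False
    ]
```

For the example above, this returns the editor and comment parts, which are the ones a reuser would likely want to withhold.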

image

optional

omitempty

object

The main image for the article, where:

  • Content URL (required): link to the image
  • Width (optional, omitempty): width of the image in pixels
  • Height (optional, omitempty): height of the image in pixels
{
  "content_url": "https://upload.wikimedia.org/wikipedia/commons/6/68/Sciuridae.jpg",
  "width": 600,
  "height": 600
}

event

required

object

This object is important for the Realtime and Realtime Batch APIs; it helps identify how to interpret and handle responses from those APIs:

  • Identifier (required): UUID of the event to track it through the WME system
  • Date Created (required): date when this particular event entered WME system
  • Type (required): visibility-change, update or delete
  • Date Published (optional): timestamp (in RFC3339 format) when this event was published to the partition it belongs to
  • Partition (optional): the partition this event belongs to
  • Offset (optional): this event’s offset in the partition it belongs to
{
  "identifier": "e69c5020-5b60-4a03-98c2-9c572fe0a0f6",
  "type": "update",
  "date_created": "2023-04-10T16:05:39.751737Z",
  "date_published": "2023-04-10T16:31:57.033Z",
  "partition": 4,
  "offset": 3593806
}
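
One way to act on the event type is a small dispatcher that keeps a local copy of articles in sync. The sketch below is illustrative only: the `store` dict and the function name are hypothetical, and overwriting the stored copy on a visibility-change event is an assumption about how a reuser might respond, not documented behaviour.

```python
def apply_event(store, article):
    """Apply one payload to `store`, a dict keyed by article identifier.

    Returns the event type that was handled.
    """
    event_type = article["event"]["type"]
    key = article["identifier"]
    if event_type == "delete":
        # Drop the local copy; ignore deletes for articles never seen.
        store.pop(key, None)
    elif event_type in ("update", "visibility-change"):
        # Assumption: the newest payload replaces whatever was stored.
        store[key] = article
    else:
        raise ValueError(f"unexpected event type: {event_type}")
    return event_type
```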

NOTE: Fields listed below are items that are only in our BETA endpoint. These are experimental and not covered by SLA. You can read more about the Structured Contents beta endpoint in our article. More will be added over time and these will eventually graduate into production endpoints when they’re ready.

description

(BETA)

optional

omitempty

string

Short description of what the article is about, shorter and more concise than the ‘abstract’ field.

Read about the structured contents beta release with parsed Wikipedia infobox.

Example: "Family of rodents"

infobox

(BETA)

optional

object

Array of parsed infobox parts. This is a recursive, tree-like data structure that contains the parsed parts of the page infobox:

  • Name (optional): name of the part
  • Type (required): type of the page part (field, infobox, section, image, list)
  • Value (optional): value of the page part, specific to the field type, will contain a string value
  • Values (optional): list of values for the page part, specific to the list type, will contain array of strings
  • HasParts (optional): will contain a list of sub “parts” for the current object
  • Images (optional): will contain a list of images for the part, see top level image object
  • Links (optional): list of links with URL, Text, and Images (if there are any in the link itself)

Read about the structured contents beta release with parsed Wikipedia infobox.

[
  {
    "name": "Automatic taxobox",
    "type": "infobox",
    "has_parts": [
      {
        "name": "Kingdom:",
        "type": "field",
        "value": "Animalia",
        "links": [
          {
            "url": "https://en.wikipedia.org/wiki/Animal",
            "text": "Animalia"
          }
        ]
      },
      {
        "name": "Phylum:",
        "type": "field",
        "value": "Chordata",
        "links": [
          {
            "url": "https://en.wikipedia.org/wiki/Chordate",
            "text": "Chordata"
          }
        ]
      }
    ]
  }
]
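
Because the infobox is a recursive tree, a short recursive walk is enough to pull out its field parts. This is a minimal sketch, assuming the decoded JSON uses the `has_parts` key shown in the example. The generator name `infobox_fields` is hypothetical.

```python
def infobox_fields(parts):
    """Walk infobox parts recursively and yield (name, value) for every "field" part."""
    for part in parts:
        if part.get("type") == "field" and "value" in part:
            yield part.get("name"), part["value"]
        # Nested parts live under "has_parts", as in the example payload.
        yield from infobox_fields(part.get("has_parts") or [])
```

Calling `list(infobox_fields(infobox))` on the example above yields the Kingdom and Phylum pairs in document order.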

article_sections

(BETA)

optional

object

An article is made up of many HTML sections. The parser extracts raw text from <section> tags and outputs their children: paragraphs <p>, lists <ul> <ol>, and links <a href...>.

Note: A section can have descendant sections; these are output in JSON within the has_parts array. The article_sections object is therefore an array of sections, with each top-level section containing its children:

  • Type (required): type of the page part (section, list, list_item, paragraph)
  • Name (optional): this is the first HTML header (h2-h6) in this section, if the first section has no header then we set the name to “Abstract”
  • Value (optional): is the plain text in a section or paragraph, it ignores: table, input, script, style, link, sub, sup, .reflist, #External_links, mw-reflink-text
  • Links (optional): list of links with URL, Text, and Images (if there are any in the link itself)
  • HasParts (optional): will contain a list of sub “parts” for the current <section>
[
  {
    "name": "Abstract",
    "type": "section",
    "has_parts": [
      {
        "type": "paragraph",
        "value": "Squirrels are members of the family Sciuridae...",
        "links": [
          {
            "url": "https://en.wikipedia.org/wiki/Family_(biology)",
            "text": "family"
          },
          {
            "url": "https://en.wikipedia.org/wiki/Rodent",
            "text": "rodents"
          },...
        ]
      }
    ]
  },
  {
    "name": "Taxonomy",
    "type": "section",
    "has_parts": [
      {
        "type": "paragraph",
        "value": "The living squirrels are divided into five subfamilies, with about 58 genera and some 285 species . The oldest squirrel fossil, Hesperopetes, dates back to the Chadronian (late Eocene, about 40–35 million years ago) and is similar to modern flying squirrels.",
        "links": [...]
      }...
    ]
  },...
]
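
The nested has_parts structure can be flattened into plain text per section with a recursive helper. The sketch below assumes the key names shown in the example. Both function names are hypothetical.

```python
def section_text(part):
    """Join the "value" strings of a part and all of its descendants."""
    pieces = []
    if part.get("value"):
        pieces.append(part["value"])
    for child in part.get("has_parts") or []:
        text = section_text(child)
        if text:
            pieces.append(text)
    return "\n".join(pieces)

def sections_by_name(article_sections):
    """Map each top-level section name to its concatenated plain text."""
    return {s.get("name"): section_text(s) for s in article_sections}
```
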
lastmod: 2024-01-18T21:45:35+00:00