API Data Dictionary
In this document we break down the data structure (schema) used by the Wikimedia Enterprise APIs. We cover the fields and objects in response payloads, their data types, whether each one is always returned or only optionally, and an example. We also add context for metadata such as namespace IDs and their descriptions. All of our APIs are built around the same article response format, so this reference guide will help you understand the data you have access to.
Response payload tag reference:
required – this field is always present in the response (unless you explicitly filter it out)
optional – this field may or may not be present in the payload
omitempty – this field is omitted from the encoded JSON output in two cases: 1) when the field is not applicable or has no value for the article; 2) when its value is the “zero” value for its type:
- 0 for numeric types
- false for the boolean type
- “” (the empty string) for strings
- null for objects
- [ ] (a zero-length array) for arrays
credibility – This badge designates that the field is a “Credibility Signal” which provides additional qualitative metadata in each article revision, empowering reusers to make better-informed decisions in real-time about how they might handle returned data.
(BETA) items listed below are part of our experimental Structured Contents endpoints in On-Demand API and Snapshot API.
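Because omitempty fields disappear from the payload instead of arriving with their zero value, client code generally has to treat a missing key as that type’s zero value. The following is a minimal Python sketch under that assumption; the helper `field_or_zero` and the `ZERO_VALUES` table are hypothetical names, not part of any Wikimedia Enterprise SDK.

```python
import json

# Zero values that omitempty strips from the JSON, keyed by the dictionary's type names.
ZERO_VALUES = {"number": 0, "boolean": False, "string": "", "object": None, "array": []}


def field_or_zero(obj: dict, key: str, json_type: str):
    """Return obj[key], or the zero value for json_type if the key was omitted."""
    if key in obj:
        return obj[key]
    zero = ZERO_VALUES[json_type]
    # Hand out a fresh list so callers cannot mutate the shared default.
    return list(zero) if isinstance(zero, list) else zero


# A trimmed payload: abstract and categories were omitted from the response.
payload = json.loads('{"name": "Squirrel", "identifier": 28492}')
abstract = field_or_zero(payload, "abstract", "string")
categories = field_or_zero(payload, "categories", "array")
```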
Namespace IDs
Identifier | Name | Description
0 | (Main) | The Article namespace, also known as the main namespace or mainspace, is where a project’s primary content resides.
6 | File | The File namespace hosts all of a project’s multimedia content, including images, videos, audio clips, and other media files. Page titles in this namespace start with the prefix ‘File:’.
10 | Template | The Template namespace stores Wikitext templates for inclusion on multiple pages, primarily through transclusion. While most templates are in this namespace, some may reside in others, such as the User namespace.
14 | Category | The Category namespace groups pages on similar topics, using a MediaWiki feature that automatically lists pages marked with [[Category:XYZ]] in their markup. This helps in navigating and discovering related articles and topics.
For a deeper understanding of namespaces, see Wikipedia:Namespace.
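As an illustration, a consumer that only wants encyclopedic content could filter on namespace.identifier. This Python sketch assumes articles have already been decoded into dictionaries; the function names and sample pages are hypothetical, and the mapping covers only the namespaces in the table above.

```python
# Namespace identifiers from the table above; other namespaces exist.
NAMESPACE_NAMES = {0: "(Main)", 6: "File", 10: "Template", 14: "Category"}


def namespace_name(article: dict) -> str:
    """Return the name for an article's namespace.identifier, if it is in the table."""
    ns_id = article.get("namespace", {}).get("identifier")
    return NAMESPACE_NAMES.get(ns_id, f"unknown ({ns_id})")


def is_main_article(article: dict) -> bool:
    """True for pages in the Article (Main) namespace, identifier 0."""
    return article.get("namespace", {}).get("identifier") == 0


# Hypothetical decoded payloads.
articles = [
    {"name": "Squirrel", "namespace": {"identifier": 0}},
    {"name": "Category:Squirrels", "namespace": {"identifier": 14}},
]
main_articles = [a["name"] for a in articles if is_main_article(a)]
```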
API Response Schema
Field Name
Type
Description
name
required
string
Name displayed at the top of the article page
Example: "Squirrel"
identifier
required
number
Unique identifier of the article.
Note: This is different from the revision identifier and maps directly to an article’s MediaWiki ID (primary key).
Example: 28492
abstract
optional
omitempty
string
Short summary of the article.
Example: "Squirrels are members of the family Sciuridae, a family that includes small or medium-size rodents. The squirrel family includes tree squirrels, ground squirrels, and flying squirrels. Squirrels are indigenous to the Americas, Eurasia, and Africa, and were introduced by humans to Australia. The earliest known fossilized squirrels date from the Eocene epoch, and among other living rodent families, the squirrels are most closely related to the mountain beaver and to the dormice."
number
Number of unique editors watching the page.
Example: 12
date_modified
required
string
Timestamp of the last revision of the article in RFC3339 format.
Example: "2021-08-31T04:51:39Z"
date_created
required
string
Timestamp of the article creation event, or the article’s first revision, in RFC3339 format.
Example: "2021-08-31T04:51:39Z"
date_previously_modified
optional
omitempty
string
Timestamp of the previous (second-to-last) revision in RFC3339 format.
Example: "2021-08-31T04:51:39Z"
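The timestamps above are RFC3339 strings ending in Z (UTC). Below is a minimal Python sketch for parsing them and measuring the gap between the last two revisions; the helper names are hypothetical, and the date_previously_modified value in the sample is made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_rfc3339(ts: str) -> datetime:
    """Parse a timestamp such as "2021-08-31T04:51:39Z" into an aware datetime."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset for older interpreters.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)


def time_since_previous_edit(article: dict) -> Optional[timedelta]:
    """Gap between the last two revisions, or None if there is no previous one."""
    previous = article.get("date_previously_modified")  # optional, omitempty
    if not previous:
        return None
    return parse_rfc3339(article["date_modified"]) - parse_rfc3339(previous)


sample = {
    "date_modified": "2021-08-31T04:51:39Z",
    "date_previously_modified": "2021-08-30T22:10:00Z",  # hypothetical value
}
gap = time_since_previous_edit(sample)
```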
array
List of community-specific protections and restrictions on the article. This is how you can tell which editor permissions are needed to work on this article, where:
- Type (required): the type of action the protection applies to, such as edit or move
- Level (required): the editor status needed to perform that action
- Expiry (required): when this protection or restriction expires; for a never-expiring protection the value can be infinite, indefinite, infinity, or never
To find the available protection levels on English Wikipedia, for example, use this API call. For more information, see the “Restrictions” section on this page.
[
{
"type": "edit",
"level": "autoconfirmed",
"expiry": "infinity"
},
{
"type": "move",
"level": "autoconfirmed",
"expiry": "infinity"
}
]
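To check which level an action requires, a consumer could scan this list by type. A minimal Python sketch, assuming the list has been decoded into dictionaries as in the example above; required_level and is_permanent are hypothetical helpers, and treating an action with no matching rule as unrestricted is an assumption.

```python
from typing import List, Optional

# Expiry values the dictionary lists for a never-expiring protection.
NEVER_EXPIRES = {"infinite", "indefinite", "infinity", "never"}


def required_level(protection: List[dict], action: str = "edit") -> Optional[str]:
    """Return the level needed to perform `action`, or None if no rule matches."""
    for rule in protection:
        if rule.get("type") == action:
            return rule.get("level")
    return None


def is_permanent(rule: dict) -> bool:
    """True when a protection rule never expires."""
    return str(rule.get("expiry")) in NEVER_EXPIRES


# The protections example from above.
protection = [
    {"type": "edit", "level": "autoconfirmed", "expiry": "infinity"},
    {"type": "move", "level": "autoconfirmed", "expiry": "infinity"},
]
```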
object
Metadata about the latest revision of the article.
version.identifier
required
number
Revision ID. Unique identifier of the revision, different from the article identifier.
Example: 1041549311
string
Comment attached by the editor to the latest version. This returns a parsed response.
Note: When a revision is saved, the editor has the ability to leave a comment describing why they created this revision.
Example: "Sample comment describing reason for revision"
array
Tags attached to the revision. You can find the list of core change tags here. To see which tags are used on English Wikipedia, for example, use API:Tags.
Note: Tags can be added to a revision to signify that it was part of a campaign, a specific project-run initiative, a tracked edit type, or another Wikimedia event.
[
"mobile edit",
"mobile app edit",
"android app edit"
]
boolean
Whether the article carries a “citation needed” tag, which editors add when they judge that an assertion in the article needs to be supported by a reference. For more information, refer to this Wikipedia article.
Example: true
boolean
Whether the editor marked this change as minor. For more information, see this Help page.
Note: When a revision is saved, the editor can mark it as minor. Although editors set this option themselves and it can be applied inconsistently, minor revisions are typically grammar corrections and are less critical to review.
Example: false
boolean
Whether the project community has marked this revision as stable.
Note: Depending on the project, communities take different approaches to approving revisions as they happen. This field reflects whether or not the revision has been approved. While false does not mean the article has been vandalized, true is typically a good indicator that this is a good revision.
Example: false
object
This object contains revision scores calculated as part of Wikimedia’s LiftWing project.
There is one model included:
- Revertrisk (optional, omitempty): predicts whether a revision may be reverted
Each model includes the following data points:
- Prediction (optional, omitempty): the model’s prediction; false values are omitted
- Probability (optional, omitempty): wrapper object for the probability values
- Probability property true (optional, omitempty): probability (from 0 to 1) that the prediction is true; 0.0 values are omitted
- Probability property false (optional, omitempty): probability (from 0 to 1) that the prediction is false; 0.0 values are omitted
{
"revertrisk": {
"prediction": true,
"probability": {
"true": 0.959002615965355,
"false": 0.040997384034645014
}
}
}
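Since false predictions and 0.0 probabilities are omitted, a missing value should be read as False or 0.0 rather than as “unknown”. A minimal Python sketch under that reading; revert_risk is a hypothetical helper, and the 0.9 review threshold is an arbitrary example, not a Wikimedia recommendation.

```python
from typing import Tuple


def revert_risk(scores: dict) -> Tuple[bool, float]:
    """Return (prediction, probability that the revision is reverted).

    Because of omitempty, a missing "prediction" is read as False and a
    missing probability value as 0.0.
    """
    model = scores.get("revertrisk") or {}
    prediction = bool(model.get("prediction", False))
    probability = model.get("probability") or {}
    return prediction, float(probability.get("true", 0.0))


scores = {
    "revertrisk": {
        "prediction": True,
        "probability": {"true": 0.959002615965355, "false": 0.040997384034645014},
    }
}
prediction, p_revert = revert_risk(scores)
needs_review = p_revert >= 0.9  # arbitrary example threshold
```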
object
Editor-specific signals that can help contextualize the revision:
- Identifier (optional, omitempty): unique MediaWiki ID for the editor (anonymous users have no identifier value)
- Name (optional, omitempty): username of the editor; it is optional because the name can be removed for vandalism. For anonymous users, the name is their IP address
- Edit Count (optional, omitempty): number of edits this editor has made so far
- Groups (optional, omitempty): set of groups this editor belongs to
- Is Bot (optional, omitempty): whether the editor is a bot
- Is Anonymous (optional, omitempty): whether the editor is anonymous
- Date Started (optional, omitempty): the editor’s start date in RFC3339 format
- Is Admin (optional, omitempty): whether the editor is an admin
- Is Patroller (optional, omitempty): whether the editor is a patroller
- Has Advanced Rights (optional, omitempty): whether the editor has advanced rights
{
"identifier": 4904587,
"name": "USERNAME",
"groups": [
"extendedconfirmed",
"*",
"user",
"autoconfirmed"
],
"date_started": "2010-10-20T05:41:16Z",
"edit_count": 25123,
"is_anonymous": true,
"is_admin": true,
"is_patroller": true,
"is_bot": true,
"has_advanced_rights": true
}
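Because the boolean flags are omitempty, an editor object usually contains only the flags that are true. A minimal Python sketch that normalizes them; the helper names and the “registered, non-bot” filter are hypothetical policy choices, not part of the API.

```python
from typing import Dict

FLAGS = ("is_bot", "is_anonymous", "is_admin", "is_patroller", "has_advanced_rights")


def editor_flags(editor: dict) -> Dict[str, bool]:
    """Normalize the editor booleans; flags omitted via omitempty are False."""
    return {flag: bool(editor.get(flag, False)) for flag in FLAGS}


def is_registered_human(editor: dict) -> bool:
    """Hypothetical filter: keep edits made by registered, non-bot editors."""
    flags = editor_flags(editor)
    return not flags["is_bot"] and not flags["is_anonymous"]


# A registered editor: the false flags were omitted from the payload.
editor = {
    "identifier": 4904587,
    "name": "USERNAME",
    "edit_count": 25123,
    "groups": ["extendedconfirmed", "*", "user", "autoconfirmed"],
}
```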
object
Size information of the whole article at its current revision:
- Value (optional, omitempty): actual size in bytes
- Unit Text (optional, omitempty): unit text for the Value; by default it’s B (bytes)
{
"value": 12,
"unit_text": "B"
}
boolean
Whether this version of the article is considered breaking news.
Example: true
boolean
Whether this version of the article should not be indexed by search engines.
Example: false
number
Number of characters in the article’s wikitext.
Example: 305917
omitempty
object
Counts of occurrences of certain templates in the article body (wikitext).
- Citation Needed Count (optional, omitempty): number of occurrences of the Citation needed template in the wikitext
- PoV Count (optional, omitempty): number of occurrences of the point of view (PoV) template in the wikitext
- Clarification Needed Count (optional, omitempty): number of occurrences of the Clarify template in the wikitext
- Update Count (optional, omitempty): number of occurrences of the Update template in the wikitext
{
"citation_needed_count": 8,
"pov_count": 1,
"clarification_needed_count": 1,
"update_count": 100
}
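These counts can be combined into a rough “needs attention” signal. A minimal Python sketch, assuming the object has been decoded into a dictionary (pass {} when the whole object is omitted); maintenance_total is a hypothetical helper, and a plain sum is only one possible weighting.

```python
MAINTENANCE_KEYS = (
    "citation_needed_count",
    "pov_count",
    "clarification_needed_count",
    "update_count",
)


def maintenance_total(counts: dict) -> int:
    """Sum the template counts; keys omitted via omitempty count as 0."""
    return sum(counts.get(key, 0) for key in MAINTENANCE_KEYS)


counts = {
    "citation_needed_count": 8,
    "pov_count": 1,
    "clarification_needed_count": 1,
    "update_count": 100,
}
```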
previous_version
optional
omitempty
object
Metadata about the previous (second-to-last) revision of the article:
- Identifier (required): identifier of the previous revision
- Number Of Characters (optional, omitempty): number of characters in the previous revision’s wikitext
{
"number_of_characters": 123,
"identifier": 17380
}
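One use of this object is to estimate how much an edit changed the article. A minimal Python sketch, assuming you already have the current revision’s character count; character_delta is a hypothetical helper. Because number_of_characters is omitempty, a missing value is treated here as “no baseline” rather than as 0.

```python
from typing import Optional


def character_delta(current_chars: int, previous_version: Optional[dict]) -> Optional[int]:
    """Change in wikitext length from the previous revision to the current one."""
    if not previous_version or "number_of_characters" not in previous_version:
        # Object or field omitted (omitempty): no baseline to compare against.
        return None
    return current_chars - previous_version["number_of_characters"]


previous_version = {"number_of_characters": 123, "identifier": 17380}
delta = character_delta(305917, previous_version)
```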
url
required
string
URL of the article
Example: "https://en.wikipedia.org/wiki/Squirrel"
namespace
required
object
Namespace that this article belongs to:
- Identifier (required): namespace identifier for the article
Note: Within Wikimedia, namespaces define the type of page you are looking at. They distinguish articles from discussion pages, category pages, and other page types. Use the Namespaces API to get the list of supported namespaces and additional metadata.
{
"identifier": 0
}
in_language
required
object
Human language in which the article is written:
- Identifier (required): the language code for the article
Note: Use the Languages API for a list of supported languages and additional metadata.
{
"identifier": "fr"
}
main_entity
optional
omitempty
object
Wikidata item (QID) that this article is related to:
- Identifier (required): Wikidata QID the article is related to
- URL (required): link to the Wikidata item
Note: see Quick Help on QIDs.
{
"identifier": "Q9482",
"url": "https://www.wikidata.org/entity/Q9482"
}
additional_entities
optional
omitempty
array
Array of Wikidata entities used in this article page and how they are used, where:
- Identifier (required): Wikidata QID / PID of the property or entity
- URL (required): Wikidata URL for the property or entity
- Aspects (required): which aspects of the property or entity are used in this article:
  S – the entity’s sitelinks are used
  L – the entity’s label is used
  D – the entity’s description is used
  T – the title of the local page corresponding to the entity is used
  C – statements from the entity are used
  X – all aspects of an entity are or may be used
  O – something else about the entity is used
[
{
"identifier": "P1992",
"url": "https://www.wikidata.org/entity/P1992",
"aspects": [
"C.P1630"
]
},
{
"identifier": "P3031",
"url": "https://www.wikidata.org/entity/P3031",
"aspects": [
"C.P1630"
]
},
...
]
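Aspects can carry a modifier after a dot, as in C.P1630. Reading the modifier as the property whose statements are used follows Wikibase’s usage-tracking convention, but treat that interpretation as an assumption. A minimal Python sketch with hypothetical helper names:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def parse_aspect(aspect: str) -> Tuple[str, Optional[str]]:
    """Split "C.P1630" into ("C", "P1630"); a plain "S" becomes ("S", None)."""
    letter, _, modifier = aspect.partition(".")
    return letter, (modifier or None)


def statement_usage(entities: List[dict]) -> Dict[str, Set[str]]:
    """Map each entity ID to the modifiers of its "C" (statements) aspects."""
    used: Dict[str, Set[str]] = defaultdict(set)
    for entity in entities:
        for aspect in entity.get("aspects", []):
            letter, modifier = parse_aspect(aspect)
            if letter == "C" and modifier:
                used[entity["identifier"]].add(modifier)
    return dict(used)


entities = [
    {"identifier": "P1992", "url": "https://www.wikidata.org/entity/P1992", "aspects": ["C.P1630"]},
    {"identifier": "P3031", "url": "https://www.wikidata.org/entity/P3031", "aspects": ["C.P1630"]},
]
```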
categories
optional
omitempty
array
Project categories that this article belongs to, where:
- Name (required): MediaWiki category name
- URL (required): MediaWiki category URL
Note: Within Wikimedia, the Category namespace is used to aggregate articles around specific topics. At the article level, this helps conceptualize the landscape of topics an article belongs to.
[
{
"name": "Category:All articles to be expanded",
"url": "https://en.wikipedia.org/wiki/Category:All_articles_to_be_expanded"
}
]
templates
optional
omitempty
array
Wikitext templates used in this article, where:
- Name (required): MediaWiki template name
- URL (required): MediaWiki template URL
Note: You can reference these if parsing the wikitext content directly.
[
{
"name": "Squirrel",
"url": "https://en.wikipedia.org/wiki/Squirrel"
},
{
"name": "Template:POV",
"url": "https://en.wikipedia.org/wiki/Template:POV"
},
{
"name": "Template:About",
"url": "https://en.wikipedia.org/wiki/Template:About"
},
{
"name": "Template:Anglicise rank",
"url": "https://en.wikipedia.org/wiki/Template:Anglicise_rank"
},
...
]
redirects
optional
omitempty
array
Pages that redirect to this article, where:
- Name (required): MediaWiki redirect name
- URL (required): MediaWiki redirect URL
Note: Each name value is likely an alternate language spelling, similar topic item, or general consolidation of the content around this article.
[
{
"url": "https://en.wikipedia.org/wiki/Sciuridae",
"name": "Sciuridae"
},
{
"url": "https://en.wikipedia.org/wiki/Sciurid",
"name": "Sciurid"
},
{
"url": "https://en.wikipedia.org/wiki/Squirrels",
"name": "Squirrels"
},
{
"url": "https://en.wikipedia.org/wiki/Bushy_tailed_tree_rat",
"name": "Bushy tailed tree rat"
},
...
]
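Redirect names are useful as aliases when matching user input to an article. A minimal Python sketch that builds a lowercase lookup table from the article name and its redirects; alias_index is a hypothetical helper, and case-folding with lower() is a simplifying assumption.

```python
def alias_index(article: dict) -> dict:
    """Map lowercased article and redirect names to the article's URL."""
    index = {article["name"].lower(): article["url"]}
    for redirect in article.get("redirects", []):  # optional, omitempty
        index[redirect["name"].lower()] = article["url"]
    return index


article = {
    "name": "Squirrel",
    "url": "https://en.wikipedia.org/wiki/Squirrel",
    "redirects": [
        {"url": "https://en.wikipedia.org/wiki/Sciuridae", "name": "Sciuridae"},
        {"url": "https://en.wikipedia.org/wiki/Squirrels", "name": "Squirrels"},
    ],
}
index = alias_index(article)
```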
is_part_of
required
object
Wikimedia project this article belongs to, where:
- Identifier (required): unique identifier of the project
Note: You can get a full list of projects using the Projects API.
{
"identifier": "enwiki"
}
article_body
optional
omitempty
object
Article content in HTML and wikitext. The HTML is optimized for parsing out content; visit here for the DOM spec.
- HTML (optional, omitempty): parsed HTML of the article
- Wikitext (optional, omitempty): markup content of the article
This field may be empty or omitted for delete and visibility-change events.
{
"html": "...html goes here...",
"wikitext": "...wikitext goes here..."
}
license
required
array
List of relevant licenses that affect this article and content reuse, where:
- Name (required): name of the license
- Identifier (required): unique identifier of the license
- URL (required): URL to the license description
[
{
"name":"Creative Commons Attribution Share Alike 3.0 Unported",
"identifier":"CC-BY-SA-3.0",
"url":"https://creativecommons.org/licenses/by-sa/3.0/"
}
]
object
If the editing community has flagged a particular, often older, revision as containing potentially damaging information, it will change that revision’s visibility. The three booleans indicate whether the article’s body, the revision’s editor, or the edit comment may contain harmful data: a value of false means that part has been hidden, pointing to where the potentially harmful data is.
- Text (required): indicates whether the text of this revision is visible
- Editor (required): indicates whether the editor name of this revision is visible
- Comment (required): indicates whether the comment attached to this revision is visible
Present only for the visibility-change event type; see the event field in this dictionary.
{
"text": true,
"editor": false,
"comment": false
}
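A consumer that stores revisions might use these flags to decide what to redact. A minimal Python sketch that lists the hidden parts; hidden_parts is a hypothetical helper.

```python
from typing import List

PARTS = ("text", "editor", "comment")


def hidden_parts(visibility: dict) -> List[str]:
    """Return the parts of a revision whose visibility flag is false."""
    return [part for part in PARTS if visibility.get(part) is False]


example = {"text": True, "editor": False, "comment": False}
```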
image
optional
omitempty
object
The main image for the article, where:
- Content URL (required): link to the image
- Width (optional, omitempty): width of the image in pixels
- Height (optional, omitempty): height of the image in pixels
{
"content_url": "https://upload.wikimedia.org/wikipedia/commons/6/68/Sciuridae.jpg",
"width": 600,
"height": 600
}
event
required
object
This object is important for the Realtime and Realtime Batch APIs; it helps identify how to interpret and handle responses from those APIs:
- Identifier (required): UUID of the event, used to track it through the Wikimedia Enterprise system
- Date Created (required): date when this event entered the Wikimedia Enterprise system
- Type (required): visibility-change, update, or delete (note: visibility-change is in the Realtime API only)
- Date Published (optional): timestamp (in RFC3339 format) when this event was published to its partition (Realtime API only)
- Partition (optional): the partition this event belongs to (Realtime API only)
- Offset (optional): this event’s offset within its partition (Realtime API only)
{
"identifier":"e69c5020-5b60-4a03-98c2-9c572fe0a0f6",
"type":"update",
"date_created":"2023-04-10T16:05:39.751737Z",
"date_published": "2023-04-10T16:31:57.033Z",
"partition": 4,
"offset": 3593806
}
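A Realtime consumer typically branches on event.type. The sketch below is a minimal, hypothetical Python consumer: ConsumerState and handle_message are illustrative names, keying articles by (is_part_of.identifier, identifier) is an assumption based on article identifiers being unique only within a project, and tracking the highest offset per partition is one possible way to support resuming, not a documented API feature.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Key = Tuple[str, int]  # (project identifier, article identifier)


@dataclass
class ConsumerState:
    """Hypothetical local state for a Realtime API consumer (not part of the API)."""
    articles: Dict[Key, dict] = field(default_factory=dict)
    to_recheck: Set[Key] = field(default_factory=set)
    offsets: Dict[int, int] = field(default_factory=dict)


def handle_message(state: ConsumerState, article: dict) -> None:
    """Apply one message according to its event.type."""
    event = article["event"]
    # Article identifiers are per project, so include is_part_of in the key.
    key = (article["is_part_of"]["identifier"], article["identifier"])
    if event["type"] == "update":
        state.articles[key] = article
    elif event["type"] == "delete":
        state.articles.pop(key, None)
    elif event["type"] == "visibility-change":
        # Text, editor, or comment data was hidden; re-check the stored copy.
        state.to_recheck.add(key)
    # partition and offset are optional and sent by the Realtime API only;
    # keep the highest offset seen per partition, e.g. for resuming a stream.
    if "partition" in event and "offset" in event:
        partition = event["partition"]
        state.offsets[partition] = max(state.offsets.get(partition, -1), event["offset"])


state = ConsumerState()
handle_message(state, {
    "identifier": 28492,
    "is_part_of": {"identifier": "enwiki"},
    "event": {
        "identifier": "e69c5020-5b60-4a03-98c2-9c572fe0a0f6",
        "type": "update",
        "date_created": "2023-04-10T16:05:39.751737Z",
        "partition": 4,
        "offset": 3593806,
    },
})
```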
NOTE: Fields listed below are only available in our BETA endpoint. These are experimental and not covered by SLA; we do not recommend them for use in a production environment. You can read more about the Structured Contents beta endpoint in our article. These fields and data will change over time and will eventually graduate into production endpoints when they’re ready.
description
(BETA)
optional
omitempty
string
Short description of what the article is about, shorter and more concise than the abstract field.
Read about the structured contents beta release with parsed Wikipedia infobox.
Example: "Family of rodents"
infobox
(BETA)
optional
omitempty
object
Array of parsed infobox parts. This is a recursive, tree-like data structure that contains the parsed parts of the page’s infobox:
- Name (optional): name of the part
- Type (required): type of the page part (field, infobox, section, image, list)
- Value (optional): value of the page part, specific to the field type; contains a string value
- Values (optional): list of values for the page part, specific to the list type; contains an array of strings
- HasParts (optional): list of sub-parts of the current object
- Images (optional): list of images for the part; see the top-level image object
- Links (optional): list of links with URL, Text, and Images (if the link itself contains any)
Read about the structured contents beta release with parsed Wikipedia infobox.
[
{
"name": "Automatic taxobox",
"type": "infobox",
"has_parts": [
{
"name": "Kingdom:",
"type": "field",
"value": "Animalia",
"links": [
{
"url": "https://en.wikipedia.org/wiki/Animal",
"text": "Animalia"
}
]
},
{
"name": "Phylum:",
"type": "field",
"value": "Chordata",
"links": [
{
"url": "https://en.wikipedia.org/wiki/Chordate",
"text": "Chordata"
}
]
}
]
}
]
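Because infobox parts nest through has_parts, a short recursive walk can flatten them into name/value pairs. A minimal Python sketch using the example above; infobox_fields is a hypothetical helper, and stripping the trailing colon from names such as “Kingdom:” is a cosmetic choice.

```python
from typing import Dict, List, Union


def infobox_fields(parts: List[dict]) -> Dict[str, Union[str, List[str]]]:
    """Flatten a parsed infobox tree into {field name: value} pairs."""
    fields: Dict[str, Union[str, List[str]]] = {}
    for part in parts:
        name = part.get("name", "").rstrip(":")  # e.g. "Kingdom:" -> "Kingdom"
        if part.get("type") == "field" and name:
            fields[name] = part.get("value", "")
        elif part.get("type") == "list" and name:
            fields[name] = part.get("values", [])
        # Recurse into nested parts; later duplicates overwrite earlier names.
        fields.update(infobox_fields(part.get("has_parts", [])))
    return fields


infobox = [{
    "name": "Automatic taxobox",
    "type": "infobox",
    "has_parts": [
        {"name": "Kingdom:", "type": "field", "value": "Animalia",
         "links": [{"url": "https://en.wikipedia.org/wiki/Animal", "text": "Animalia"}]},
        {"name": "Phylum:", "type": "field", "value": "Chordata",
         "links": [{"url": "https://en.wikipedia.org/wiki/Chordate", "text": "Chordata"}]},
    ],
}]
```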
sections
(BETA)
optional
omitempty
object
An article is made up of many HTML sections. The parser extracts plain text from <section> tags and outputs their child paragraphs (<p>) and links (<a href...>).
Note: A section can have descendant sections; these are output in JSON within the has_parts array. The sections object is therefore an array of sections, where each top-level section can have children:
- Type (required): type of the page part (section, paragraph)
- Name (optional): the first HTML heading (h2-h6) in the section; if the first section has no heading, the name is set to “Abstract”
- Value (optional): the plain text in a section or paragraph; it ignores table, input, script, style, link, sub, sup, .reflist, #External_links, and mw-reflink-text
- Links (optional): list of links with URL, Text, and Images (if the link itself contains any)
- HasParts (optional): list of sub-parts for the current <section>
[
{
"name": "Abstract",
"type": "section",
"has_parts": [
{
"type": "paragraph",
"value": "Squirrels are members of the family Sciuridae...",
"links": [
{
"url": "https://en.wikipedia.org/wiki/Family_(biology)",
"text": "family"
},
{
"url": "https://en.wikipedia.org/wiki/Rodent",
"text": "rodents"
},...
]
}
]
},
{
"name": "Taxonomy",
"type": "section",
"has_parts": [
{
"type": "paragraph",
"value": "The living squirrels are divided into five subfamilies, with about 58 genera and some 285 species . The oldest squirrel fossil, Hesperopetes, dates back to the Chadronian (late Eocene, about 40–35 million years ago) and is similar to modern flying squirrels.",
"links": [...]
}...
]
},...
]
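Sections nest the same way, so a recursive walk can pair each paragraph with the name of its innermost enclosing section. A minimal Python sketch on a shortened version of the example above; paragraphs is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def paragraphs(parts: List[dict], section: Optional[str] = None) -> List[Tuple[Optional[str], str]]:
    """Flatten the sections tree into (innermost section name, paragraph text) pairs."""
    out: List[Tuple[Optional[str], str]] = []
    for part in parts:
        if part.get("type") == "section":
            # Nested sections live in has_parts; unnamed ones inherit the parent name.
            out.extend(paragraphs(part.get("has_parts", []), part.get("name", section)))
        elif part.get("type") == "paragraph" and part.get("value"):
            out.append((section, part["value"]))
    return out


sections = [
    {"name": "Abstract", "type": "section", "has_parts": [
        {"type": "paragraph", "value": "Squirrels are members of the family Sciuridae..."},
    ]},
    {"name": "Taxonomy", "type": "section", "has_parts": [
        {"type": "paragraph", "value": "The living squirrels are divided into five subfamilies..."},
    ]},
]
```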