A 15th-century drawing of a medieval scribe at work in his scriptorium

Article Images and Lists now in Structured Contents payloads

Structured Contents endpoints now parse and return article images and lists as structured JSON, directly within each section of a Wikimedia project article. Both features are available today in the On-demand API and Snapshot API.

Since the Structured Contents Initiative launched, the goal has been straightforward: deliver Wikimedia project data in clean, ready-to-use JSON so your pipeline doesn’t have to build its own parser. Image and list parsing are the next major step toward that goal. Every image attached to an article section, and every list, whether ordered, unordered, or definition-style, now comes through in full structured detail.

In this article: Article Images | Article Lists

All Article Images now included

Wikipedia articles in Structured Contents have always had a main image representing the article as a whole; this field remains unchanged. What’s new is that images appearing throughout the article body are now parsed and included in the has_parts array of the section they belong to, returned alongside the text content they illustrate.

This matters because context is everything. An image of Josephine Baker in military uniform means something different sitting next to a caption and a paragraph about her World War II intelligence work than it does in isolation. By attaching each image to its section, Structured Contents preserves that relationship: the image, the surrounding text, and the caption all arrive together in a single coherent object. For search indexing, AI pipelines, and content applications, that co-located context is what makes the image actually useful rather than just present.

The ‘World War II’ section of the Josephine Baker English Wikipedia article body, with a section image shown on the right, with the caption ‘Baker in uniform, 1948’.

Each section image also includes encoding_format and media_type alongside its URL (content_url), dimensions (height and width), identifier, name, and caption, so your application knows exactly what it’s receiving before it fetches anything. Decorative icons and images smaller than 16px are excluded, keeping payloads focused on content that carries meaning.

"name": "World War II",
"type": "section",
"has_parts": [
  {
    "type": "image",
    "images": [
      {
        "content_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Baker_Harcourt_1948.jpg/250px-Baker_Harcourt_1948.jpg",
        "identifier": "5a03b69c25031ef6fdfee65e4b378020468f0d637ffafc0eb1018f03c4030ca3",
        "name": "File:Baker_Harcourt_1948.jpg",
        "caption": "Baker in uniform, 1948",
        "height": 344,
        "width": 250,
        "encoding_format": "image/jpeg",
        "media_type": "bitmap"
      }
    ]
  },[...]
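
As a rough illustration of how a consumer might read these section images, here is a minimal Python sketch. It assumes the section has already been parsed into a dict shaped like the fragment above. The helper name section_images and the sample_section variable are illustrative only and are not part of the API.

```python
from typing import Any, Iterator


def section_images(section: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every image object found directly in a section's has_parts."""
    for part in section.get("has_parts", []):
        if part.get("type") == "image":
            # Each image part wraps one or more image objects in "images".
            yield from part.get("images", [])


# Sample data copied from the fragment above (only the image part is shown).
sample_section = {
    "name": "World War II",
    "type": "section",
    "has_parts": [
        {
            "type": "image",
            "images": [
                {
                    "content_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Baker_Harcourt_1948.jpg/250px-Baker_Harcourt_1948.jpg",
                    "identifier": "5a03b69c25031ef6fdfee65e4b378020468f0d637ffafc0eb1018f03c4030ca3",
                    "name": "File:Baker_Harcourt_1948.jpg",
                    "caption": "Baker in uniform, 1948",
                    "height": 344,
                    "width": 250,
                    "encoding_format": "image/jpeg",
                    "media_type": "bitmap",
                }
            ],
        }
    ],
}

for image in section_images(sample_section):
    # Format and dimensions are known before content_url is ever fetched.
    size = f'{image["width"]}x{image["height"]}'
    print(image["name"], image["encoding_format"], size, image["caption"])
```

Because the image travels inside its section, the section name and the caption are available in the same pass, with no second lookup needed to recover context.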

Article Lists content is now available

Lists are one of the most information-dense structures in any Wikipedia article. Competition results, glossary terms, chronological timelines, participant rosters: information that is precise, ordered, and often critical to the subject of the article. Structured Contents now parses all of them, preserving the full hierarchy, nesting, and structure of the original article content and delivering it as clean JSON in the has_parts array.

Three list types are supported, each mapped to the structure Wikipedia uses to present them.

Unordered lists capture collections where the items belong together but have no required order. Think named participants in a conflict, award nominees, or feature sets. Ordered lists are for sequences where position carries meaning, and Structured Contents guarantees that order is preserved exactly as it appears in the article. Definition lists handle the term-and-description pattern common in glossaries, lexicons, and disambiguation-style content, keeping each term paired with its definition in the structured output.

Inline links within list items are preserved and returned with their URL and anchor text, so relationships between list content and other Wikipedia articles are not lost in parsing. Nested lists, lists inside infoboxes, and flat lists are all handled. Empty lists are omitted rather than returned as empty objects, keeping payloads clean.

Unordered lists

Unordered list in the ‘History’ section of the Paralympic Powerlifting English Wikipedia page
"name": "History",
"type": "section",
"has_parts": [
  {
    "type": "list",
    "has_parts": [
      {
        "type": "list_item",
        "value": "1964–1984: Wheelchair Powerlifting"
      },
      {
        "type": "list_item",
        "value": "1984–2016: Paralympic Powerlifting / IPC Powerlifting"
      },
      {
        "type": "list_item",
        "value": "2017–present: Para Powerlifting"
      }
    ]
  }
]
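
A list like the one above could be flattened with a short recursive walk. The Python sketch below is one possible approach, not a reference implementation. The function name list_item_values is hypothetical, and the handling of nested lists rests on an assumption: that a nested list appears inside the has_parts of its parent list_item. That shape is not shown in the examples here, so check it against real payloads.

```python
from typing import Any


def list_item_values(node: dict[str, Any], depth: int = 0) -> list[tuple[int, str]]:
    """Collect (depth, value) pairs for every list_item under node, in payload order."""
    found: list[tuple[int, str]] = []
    for part in node.get("has_parts", []):
        if part.get("type") == "list_item":
            found.append((depth, part.get("value", "")))
            # ASSUMPTION: a nested list lives in the item's own has_parts.
            found.extend(list_item_values(part, depth + 1))
        else:
            # Sections and list containers: descend without adding depth.
            found.extend(list_item_values(part, depth))
    return found


# Sample data copied from the 'History' fragment above.
history_section = {
    "name": "History",
    "type": "section",
    "has_parts": [
        {
            "type": "list",
            "has_parts": [
                {"type": "list_item", "value": "1964–1984: Wheelchair Powerlifting"},
                {"type": "list_item", "value": "1984–2016: Paralympic Powerlifting / IPC Powerlifting"},
                {"type": "list_item", "value": "2017–present: Para Powerlifting"},
            ],
        }
    ],
}

for depth, value in list_item_values(history_section):
    # Indent by depth so any nesting stays visible in plain-text output.
    print("  " * depth + "- " + value)
```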

Ordered lists

Ordered lists will always retain their order in Structured Contents JSON.

An ordered list showing the different definitions of ‘game’ on the Glossary of card game terms English Wikipedia page
{
  "type": "ordered_list",
  "has_parts": [
    {
      "type": "list_item",
      "value": "A pastime in general, usually involving some form of competing.",
      "links": [
        {
          "url": "https://en.wikipedia.org/wiki/Glossary_of_card_game_terms#cite_note-FOOTNOTEPhillips1957401-64"
        }
      ]
    },
    {
      "type": "list_item",
      "value": "A variant of a basic game e.g. Gin Rummy or Wendish Schafkopf.",
      "links": [
        {
          "url": "https://en.wikipedia.org/wiki/Gin_Rummy",
          "text": "Gin Rummy"
        },
        [...]
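
To show how position and inline links might be read together, here is a small Python sketch. The helper name numbered_items is made up for this example. It relies on the items arriving in article order and on the observation, taken from the fragment above, that a link can arrive with a url but no text (the footnote link), so the label falls back to the URL. The sample data includes only the parts of the fragment that are shown; the elided remainder is left out.

```python
from typing import Any


def numbered_items(ordered_list: dict[str, Any]) -> list[tuple[int, str, list[tuple[str, str]]]]:
    """Return (position, value, [(label, url), ...]) per item, in payload order."""
    items = [
        part
        for part in ordered_list.get("has_parts", [])
        if part.get("type") == "list_item"
    ]
    rows = []
    for position, item in enumerate(items, start=1):
        links = [
            # Fall back to the URL when a link carries no anchor text.
            (link.get("text", link["url"]), link["url"])
            for link in item.get("links", [])
            if "url" in link
        ]
        rows.append((position, item.get("value", ""), links))
    return rows


CITE_URL = "https://en.wikipedia.org/wiki/Glossary_of_card_game_terms#cite_note-FOOTNOTEPhillips1957401-64"

# Sample data copied from the visible part of the fragment above.
sample_ordered_list = {
    "type": "ordered_list",
    "has_parts": [
        {
            "type": "list_item",
            "value": "A pastime in general, usually involving some form of competing.",
            "links": [{"url": CITE_URL}],
        },
        {
            "type": "list_item",
            "value": "A variant of a basic game e.g. Gin Rummy or Wendish Schafkopf.",
            "links": [{"url": "https://en.wikipedia.org/wiki/Gin_Rummy", "text": "Gin Rummy"}],
        },
    ],
}

for position, value, links in numbered_items(sample_ordered_list):
    print(f"{position}. {value} ({len(links)} link(s))")
```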

Definition lists

A definition list starting with a definition for the term ‘game points’ on the Glossary of card game terms English Wikipedia page
{
  "type": "definition_list",
  "has_parts": [{
    "type": "definition_term",
    "value": "game points",
    "has_parts": [{
      "type": "definition",
      "value": "In point-trick games, the score awarded to the players based on the outcome of a hand, the game value of a contract and any bonuses earned. Game points are accumulated (or deducted) to decide the overall winner. Not to be confused with card points.",
      "links": [{
        "url": "https://en.wikipedia.org/wiki/Point-trick_game",
        "text": "point-trick games"
      },
      {
        "url": "https://en.wikipedia.org/wiki/Glossary_of_card_game_terms#hand",
        "text": "hand"
      },
      {
        "url": "https://en.wikipedia.org/wiki/Glossary_of_card_game_terms#contract",
        "text": "contract"
      },
      {
        "url": "https://en.wikipedia.org/wiki/Glossary_of_card_game_terms#bonus",
        "text": "bonuses"
      },
      {
        "url": "https://en.wikipedia.org/wiki/Glossary_of_card_game_terms#card_points",
        "text": "card points"
      }]
    }]
  }]
}
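
Since each definition sits inside the has_parts of its term, a definition list can be turned into a simple lookup table. The Python sketch below is one way to do that under the structure shown above. The function name glossary is illustrative. The sample definition text is shortened to its first sentence and the links are omitted, purely to keep the example compact.

```python
from typing import Any


def glossary(definition_list: dict[str, Any]) -> dict[str, list[str]]:
    """Map each definition_term value to the values of its definitions."""
    terms: dict[str, list[str]] = {}
    for term in definition_list.get("has_parts", []):
        if term.get("type") != "definition_term":
            continue
        definitions = [
            part.get("value", "")
            for part in term.get("has_parts", [])
            if part.get("type") == "definition"
        ]
        # A term may carry more than one definition, so keep a list.
        terms.setdefault(term.get("value", ""), []).extend(definitions)
    return terms


# Sample data based on the fragment above; value shortened, links omitted.
sample_definition_list = {
    "type": "definition_list",
    "has_parts": [
        {
            "type": "definition_term",
            "value": "game points",
            "has_parts": [
                {
                    "type": "definition",
                    "value": (
                        "In point-trick games, the score awarded to the players "
                        "based on the outcome of a hand, the game value of a "
                        "contract and any bonuses earned."
                    ),
                }
            ],
        }
    ],
}

for term, definitions in glossary(sample_definition_list).items():
    print(f"{term}: {definitions[0]}")
```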

Structured Contents is a living initiative. As it evolves, the features added reflect what teams building on Wikimedia project data actually need. Image and list parsing both grew directly from feedback from developers and organizations using the On-demand and Snapshot APIs. If you’re working with Structured Contents endpoints and have opinions on what we should prioritize next, we’d like to hear from you.

For a full overview of the initiative and what’s currently available and on the roadmap, visit the Structured Contents Initiative page.

Get Started

Structured Contents payloads are available for free in the On-demand API today. The same image and list parsing is also available across Snapshot API files for teams that need bulk access. Sign up for a free account to get started, or contact our sales team to discuss your use case.

— The Wikimedia Enterprise Team

Photo Credits

Portrait of Jean Miélot, by Jean le Tavernier, Public Domain Mark, via Wikimedia Commons