Structured Contents Initiative

More Structured Data, Fewer BLOBs

Wikipedia uses wikitext, a markup language designed for formatting page content. While it has proven useful for editors authoring wiki articles, it creates complexity for developers parsing articles at scale.

When Wikimedia Enterprise launched in 2021, we built it to serve high-volume, high-frequency users of Wikimedia data. As part of that effort, we also made parsing easier by providing HTML blobs, a format that developers are more familiar with and for which many parsing libraries already exist.

The Structured Contents Initiative is the next step in serving easy-to-parse Wikimedia data. Currently in beta, it extracts infoboxes, sections, tables, references, and more from raw wikitext and HTML and delivers them as structured, machine-readable JSON.

What’s Available Now

Structured Contents currently extracts the following article pieces into JSON:

  • abstract
  • description
  • infobox
  • sections
  • citations & references
  • tables

For a full breakdown and explanation of the Structured Contents data schema in responses, see the Data Dictionary: Beta section.

Wikitext blob compared with Structured Contents JSON

Showcase: BLOBs vs Structured Contents

Below are examples using the English Wikipedia article on Josephine Baker. Under each feature heading, the raw HTML BLOB comes first, followed by the raw wikitext BLOB and then the clean JSON output from Structured Contents. Seen together, they show how the data is transformed and why the result is easier for developers to use. Some payloads in these examples have been truncated (marked with [...]).

Article Infoboxes

data-mw-deduplicate=\"TemplateStyles:r1295905060\" typeof=\"mw:Extension/templatestyles mw:Transclusion\" about=\"#mwt6\" data-mw='{\"name\":\"templatestyles\",\"attrs\":{\"src\":\"Module:Infobox/styles.css\"},\"body\":{\"extsrc\":\"\"},\"parts\":[{\"template\":{\"target\":{\"wt\":\"Infobox person\\n\",\"href\":\"./Template:Infobox_person\"},\"params\":{\"name\":{\"wt\":\"Josephine Baker\"},\"image\":{\"wt\":\"File:Baker Harcourt 1940 2.jpg\"},\"caption\":{\"wt\":\"Baker in 1940\"},\"birth_name\":{\"wt\":\"Freda Josephine McDonald\"},\"birth_date\":{\"wt\":\"{{birth date|mf=yes|1906|06|03}}\"},\"birth_place\":{\"wt\":\"[[St. Louis]], Missouri, U.S.\"}
[...]
{{Infobox person\n| name               = Josephine Baker\n| image              = File:Baker Harcourt 1940 2.jpg\n| caption            = Baker in 1940\n| birth_name         = Freda Josephine McDonald\n| birth_date         = {{birth date|mf=yes|1906|06|03}}\n| birth_place        = [[St. Louis]], Missouri, U.S.\n| [...]
"infoboxes": [{
  "name": "Infobox person",
  "type": "infobox",
  "has_parts": [
    {
      "name": "Josephine Baker",
      "type": "section",
      "has_parts": [
        {
          "type": "image",
          "value": "Baker in 1940",
          "images": [
            {
              "content_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Baker_Harcourt_1940_2.jpg/250px-Baker_Harcourt_1940_2.jpg",
              "caption": "Baker in 1940",
              "height": 250,
              "width": 250
            }
          ]
        },
        {
          "name": "Born",
          "type": "field",
          "value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, U.S.",
          "links": [
            {
              "url": "https://en.wikipedia.org/wiki/St._Louis",
              "text": "St. Louis"
            }
          ]
        },
        [...]
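Because infobox parts nest (an infobox contains sections, which contain fields), consumers usually flatten them before use. The following is a minimal Python sketch, assuming only the `type`, `name`, `value`, and `has_parts` keys shown above; `iter_fields` is a hypothetical helper name, and the sample data is abridged from the infobox in this section and in the full payload example further down.

```python
from collections.abc import Iterator
from typing import Any


def iter_fields(node: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Yield (name, value) for every part whose type is "field".

    Walks the nested "has_parts" tree depth-first, so fields come out in
    document order. Image parts and section parts are traversed but not
    yielded themselves.
    """
    if node.get("type") == "field" and "name" in node:
        yield node["name"], node.get("value", "")
    for child in node.get("has_parts", []):
        yield from iter_fields(child)


# Abridged infobox based on the JSON in this document.
infobox = {
    "name": "Infobox person",
    "type": "infobox",
    "has_parts": [
        {
            "name": "Josephine Baker",
            "type": "section",
            "has_parts": [
                {"type": "image", "value": "Baker in 1940"},
                {
                    "name": "Born",
                    "type": "field",
                    "value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, U.S.",
                },
                {
                    "name": "Died",
                    "type": "field",
                    "value": "April 12, 1975 (aged 68) Paris, France",
                },
            ],
        }
    ],
}

fields = dict(iter_fields(infobox))
print(fields["Died"])  # April 12, 1975 (aged 68) Paris, France
```

If a field name repeats within one infobox, `dict()` keeps the last value; collect the pairs into a list instead if duplicates matter.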

Article Tables

[...]<table class=\"wikitable sortable\" id=\"mwBpA\">\n<caption id=\"mwBpE\">Film credits for Josephine Baker</caption>\n<tbody id=\"mwBpI\"><tr id=\"mwBpM\">\n<th scope=\"col\" id=\"mwBpQ\">Year</th>\n<th scope=\"col\" id=\"mwBpU\">Title</th>\n<th scope=\"col\" id=\"mwBpY\">Role</th>\n<th scope=\"col\" class=\"unsortable\" id=\"mwBpc\">Notes</th>\n<th scope=\"col\" class=\"unsortable\" id=\"mwBpg\"><abbr title=\"Reference\" about=\"#mwt748\" typeof=\"mw:Transclusion mw:ExpandedAttrs\" data-mw='{\"attribs\":[[{\"txt\":\"title\"},{\"html\":\"&lt;span typeof=\\\"mw:Nowiki\\\" data-parsoid=\\\"{}\\\">Reference&lt;/span>\"}]],\"parts\":[{\"template\":{\"target\":{\"wt\":\"abbr\",\"href\":\"./Template:Abbr\"},\"params\":{\"1\":{\"wt\":\"Ref.\"},\"2\":{\"wt\":\"Reference\"}},\"i\":0}}]}' id=\"mwBpk\">Ref.</abbr></th></tr>\n<tr[...]
[...]\n\n== Film credits ==\n{| class=\"wikitable sortable\"\n|+Film credits for Josephine Baker\n|-\n! scope=\"col\"| Year\n! scope=\"col\"| Title\n! scope=\"col\"| Role\n! scope=\"col\" class=\"unsortable\"| Notes\n! scope=\"col\" class=\"unsortable\"| {{abbr|Ref.|Reference}}\n|-\n!scope=row| 1927\n| {{lang|fr|La Sirène des Tropiques}} (''[[Siren of the Tropics]]'')\n| Papitou\n| [[silent film]]\n|align=\"center\" |{{sfnp|Bergfelder|Harris|Street|2007|p=193}}{{sfnp|Francis|2021|p=68}}\n|-\n!scope=row| 1927\n| {{lang|de|Die Frauen von Folies Bergères}} (''[[The Woman from the Folies Bergères]]'')\n|\n| [[silent film]]\n|align=\"center\" |[...]
"tables": [
  {
    "identifier": "film_credits_table1",
    "headers": [
      [
        { "value": "Year" },
        { "value": "Title" },
        { "value": "Role" },
        { "value": "Notes" },
        { "value": "Ref." }
      ]
    ],
    "rows": [
      [
        { "value": "1927" },
        { "value": "La Sirène des Tropiques (Siren of the Tropics)" },
        { "value": "Papitou" },
        { "value": "silent film" },
        { "value": "" }
      ],
      [
        { "value": "1927" },
        { "value": "Die Frauen von Folies Bergères (The Woman from the Folies Bergères)" },
        { "value": "" },
        { "value": "silent film" },
        { "value": "" }
      ],[...]
    ],
    "confidence_score": 0.8
  }
],
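Since each table arrives as header and row arrays of `{"value": ...}` cells, it maps directly onto records or CSV. Below is a minimal standard-library sketch. It assumes a single header row (the code takes the last header row as column names, which is only a guess for tables with multi-row headers) and ignores `confidence_score`; see the Data Dictionary for that field's meaning. `table_to_records` and `table_to_csv` are hypothetical names.

```python
import csv
import io
from typing import Any


def table_to_records(table: dict[str, Any]) -> list[dict[str, str]]:
    """Convert a Structured Contents table into one dict per row.

    Assumption: the last header row holds the column names. If a row has
    fewer cells than columns, zip() simply leaves the missing keys out.
    """
    columns = [cell.get("value", "") for cell in table["headers"][-1]]
    return [
        dict(zip(columns, (cell.get("value", "") for cell in row)))
        for row in table.get("rows", [])
    ]


def table_to_csv(table: dict[str, Any]) -> str:
    """Serialize the table as CSV text, header row first."""
    columns = [cell.get("value", "") for cell in table["headers"][-1]]
    buffer = io.StringIO()
    # restval="" fills any column a short row did not provide.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(table_to_records(table))
    return buffer.getvalue()
```

Applied to the `film_credits_table1` object above, `table_to_records` returns one dict per film, for example `{"Year": "1927", "Title": "La Sirène des Tropiques (Siren of the Tropics)", "Role": "Papitou", "Notes": "silent film", "Ref.": ""}`.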

Article Sections

id=\"mwIQ\"\u003eDuring her early career, Baker was among the most celebrated performers to headline the revues of the \u003cspan title=\"French-language text\" about=\"#mwt45\" typeof=\"mw:Transclusion\" data-mw='{\"parts\":[{\"template\":{\"target\":{\"wt\":\"lang\",\"href\":\"./Template:Lang\"},\"params\":{\"1\":{\"wt\":\"fr\"},\"2\":{\"wt\":\"[[Folies Bergère]]\"},\"italic\":{\"wt\":\"no\"}},\"i\":0}}]}' id=\"mwIg\"\u003e\u003cspan lang=\"fr\" style=\"font-style: normal;\"\u003e\u003ca rel=\"mw:WikiLink\" href=\"./Folies_Bergère\" title=\"Folies Bergère\"\u003eFolies Bergère\u003c/a\u003e\u003c/span\u003e\u003c/span\u003e\u003clink rel=\"mw:PageProp/Category\" href=\"./Category:Articles_containing_French-language_text\" about=\"#mwt45\" id=\"mwIw\"/\u003e in \u003ca rel=\"mw:WikiLink\" href=\"./Paris\" title=\"Paris\" id=\"mwJA\"\u003eParis\u003c/a\u003e.[...]
\n\nDuring her early career, Baker was among the most celebrated performers to headline the revues of the {{lang|fr|[[Folies Bergère]]|italic=no}} in [[Paris]].[...]
"sections": [{
  "type": "paragraph",
  "value": "During her early career, Baker was among the most celebrated performers to headline the revues of the Folies Bergère in Paris. [...]",
  "links": [
    {
      "url": "https://en.wikipedia.org/wiki/Folies_Bergère",
      "text": "Folies Bergère"
    },
    [...]
  ],
  "citations": [
    {
      "identifier": "cite_note-4",
      "text": "[4]"
    },
    [...]
  ]
}]
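Paragraphs carry their own `links` and `citations`, so a text pipeline can take the plain `value` and still keep link targets. The sketch below assumes the keys shown above. It finds paragraphs whether they sit directly in the `sections` array (as in this excerpt) or nested under a section's `has_parts` (as in the full payload example), and derives linked article titles from each link's URL. The helper names are hypothetical, and treating the path after `/wiki/` as the title is an assumption that holds for the English Wikipedia URLs shown here.

```python
from collections.abc import Iterable, Iterator
from typing import Any
from urllib.parse import unquote


def iter_paragraphs(nodes: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every part of type "paragraph", at any depth of has_parts nesting."""
    for node in nodes:
        if node.get("type") == "paragraph":
            yield node
        yield from iter_paragraphs(node.get("has_parts", []))


def linked_titles(paragraph: dict[str, Any]) -> list[str]:
    """Derive linked article titles from link URLs rather than display text.

    Display text can differ from the link target (e.g. piped wikilinks), so
    the title is taken from the path after "/wiki/"; other URLs are skipped.
    """
    titles = []
    for link in paragraph.get("links", []):
        _, found, path = link.get("url", "").partition("/wiki/")
        if found:
            titles.append(unquote(path).replace("_", " "))
    return titles
```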

Article Description

[...]\u003e\u003cdiv class=\"shortdescription nomobile noexcerpt noprint searchaux\" style=\"display:none\" about=\"#mwt1\" typeof=\"mw:Transclusion\" data-mw='{\"parts\":[{\"template\":{\"target\":{\"wt\":\"short description\",\"href\":\"./Template:Short_description\"},\"params\":{\"1\":{\"wt\":\"American-born French entertainer (1906–1975)\"}},\"i\":0}}]}' id=\"mwAg\"\u003eAmerican-born French entertainer (1906–1975)\u003c/div[...]
{{short description|American-born French entertainer (1906–1975)}}\n
"description": "American-born French entertainer (1906–1975)"

How to Access Structured Contents

Structured Contents is currently available in two of our APIs:

On-demand API: Request individual articles from any project as structured JSON. Best for testing, post-training, or lightweight use.

Snapshot API: Download a compressed file containing all articles in a project as structured JSON. Best for pre-training, indexing, and high-scale applications.
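For Snapshot API files, a streaming reader avoids unpacking a whole project at once. This sketch assumes, without confirmation from this page, that a snapshot is a gzip-compressed tar archive of newline-delimited JSON (NDJSON) files with one article object per line; check the Snapshot API documentation for the actual layout before relying on it. `iter_snapshot_articles` and the file name in the comment are hypothetical.

```python
import json
import tarfile
from collections.abc import Iterator
from typing import Any


def iter_snapshot_articles(path: str) -> Iterator[dict[str, Any]]:
    """Yield one article dict per NDJSON line in a .tar.gz snapshot.

    Assumes the archive contains newline-delimited JSON files. Mode "r|gz"
    reads the archive as a forward-only stream, so nothing is extracted to
    disk and only the current member is decompressed as it is read.
    """
    with tarfile.open(path, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            fileobj = archive.extractfile(member)
            if fileobj is None:
                continue
            for raw_line in fileobj:  # bytes; one JSON document per line
                if raw_line.strip():
                    yield json.loads(raw_line)


# Usage (hypothetical file name; use the path of the snapshot you downloaded):
# for article in iter_snapshot_articles("enwiki_snapshot.tar.gz"):
#     print(article["name"], len(article.get("tables", [])))
```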

Shaping Structured Contents Together

We welcome and encourage feedback on Structured Contents to help us strengthen current features and shape new ones. Signing up for an API account gives you access to the latest features, but to make experimentation easy we have also shared early versions of Structured Contents snapshots on the open dataset platforms Hugging Face and Kaggle.

Wikimedians can also access beta Structured Contents through their Wikimedia Cloud Services accounts.

Structured Contents Payload Example

Our Structured Contents endpoints return the same familiar structure as our production responses, plus beta fields and objects parsed from raw article data. The parsed objects unique to Structured Contents are infoboxes, sections, description, references, and tables.

The On-demand API Structured Contents endpoint is freely available. Snapshot API Structured Contents dumps are available upon request.

Example: Run this cURL command with your access token (see auth docs) to get the Structured Contents response for the live English Wikipedia article on Josephine Baker; an abridged response is shown below.

curl --location 'https://api.enterprise.wikimedia.com/v2/structured-contents/Josephine_Baker' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN' \
  --data '{"filters":[{"field":"is_part_of.identifier","value":"enwiki"}]}'
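The same request can be sent from Python with only the standard library. This is a minimal sketch that mirrors the cURL command (a POST with a JSON `filters` body and a bearer token). `build_request` and `fetch_structured_contents` are hypothetical names, `ACCESS_TOKEN` remains a placeholder, and the response is decoded as a JSON array because the example payload is one.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "https://api.enterprise.wikimedia.com/v2/structured-contents/"


def build_request(article: str, project: str, access_token: str) -> urllib.request.Request:
    """Build the same POST request as the cURL command.

    `article` uses underscores for spaces (e.g. "Josephine_Baker") and
    `project` is a project identifier such as "enwiki".
    """
    body = json.dumps(
        {"filters": [{"field": "is_part_of.identifier", "value": project}]}
    ).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + urllib.parse.quote(article),
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


def fetch_structured_contents(article: str, project: str, access_token: str) -> list[dict[str, Any]]:
    """Send the request and decode the JSON array in the response body."""
    request = build_request(article, project, access_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage (needs network access and a valid token):
# articles = fetch_structured_contents("Josephine_Baker", "enwiki", "ACCESS_TOKEN")
# print(articles[0]["description"])
```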

For a full breakdown and explanation of all Structured Contents response fields, consult our Data Dictionary.

[{
  "name": "Josephine Baker",
  "identifier": 255083,
  "abstract": "Freda Josephine Baker, naturalized as Joséphine Baker, was an...",
  "version": {...},
  "url": "https://en.wikipedia.org/wiki/Josephine_Baker",
  "date_created": "2003-06-29T19:16:19Z",
  "date_modified": "2025-09-08T23:58:22Z",
  "main_entity": {
    "identifier": "Q151972",
    "url": "https://www.wikidata.org/entity/Q151972"
  },
  "is_part_of": {...},
  "additional_entities": [...],
  "in_language": {...},
  "image": {...},
  "license": [...],
  "description": "American-born French entertainer (1906–1975)",
  "infoboxes": [
    {
      "name": "Infobox person",
      "type": "infobox",
      "has_parts": [
        {
          "name": "Josephine Baker",
          "type": "section",
          "has_parts": [
            {
              "type": "image",
              "value": "Baker in 1940",
              "images": [...]
            },
            {
              "name": "Born",
              "type": "field",
              "value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, U.S.",
              "links": [...]
            },
            {
              "name": "Died",
              "type": "field",
              "value": "April 12, 1975 (aged 68) Paris, France"
            },{...}
          ]
        },{...}
      ]
    },{...}
  ],
  "sections": [
    {
      "name": "abstract",
      "type": "section",
      "has_parts": [
        {
          "type": "paragraph",
          "value": "Freda Josephine Baker (née McDonald; June 3, 1906 – April 12, 1975), naturalized as...",
          "links": [...],
          "citations": [
            {
              "identifier": "cite_note-3",
              "text": "[3]"
            }
          ]
        },{...}
      ]
    },
    {
      "name": "film_credits",
      "type": "section",
      "has_parts": [
        {
          "type": "table",
          "table_references": [
            {
              "identifier": "film_credits_table1",
              "confidence_score": 0.8
            }
          ]
        }
      ]
    }
  ],
  "tables": [
    {
      "identifier": "film_credits_table1",
      "headers": [
        [
          { "value": "Year" },
          { "value": "Title" },
          { "value": "Role" },
          { "value": "Notes" },
          { "value": "Ref." }
        ]
      ],
      "rows": [
        [
          { "value": "1927" },
          { "value": "La Sirène des Tropiques (Siren of the Tropics)" },
          { "value": "Papitou" },
          { "value": "silent film" },
          { "value": "" }
        ],
        [
          { "value": "1927" },
          { "value": "Die Frauen von Folies Bergères (The Woman from the Folies Bergères)" },
          { "value": "" },
          { "value": "silent film" },
          { "value": "" }
        ],[...]
      ],
      "confidence_score": 0.8
    }
  ],
  "references": [
    {
      "identifier": "cite_note-3",
      "type": "book",
      "metadata": {
        "first": "Kathryn",
        "isbn": "978-1-55652-961-0",
        "last": "Atwood",
        "page": "77",
        "publisher": "Chicago Review Press",
        "title": "Women Heroes of World War II",
        "year": "2011"
      },
      "text": {
        "value": "Atwood, Kathryn (2011). Women Heroes of World War II. Chicago Review Press. p. 77. ISBN 978-1-55652-961-0.",
        "links": [
          {
            "url": "https://en.wikipedia.org/wiki/ISBN_(identifier)",
            "text": "ISBN"
          },
          {
            "url": "https://en.wikipedia.org/wiki/Special:BookSources/978-1-55652-961-0",
            "text": "978-1-55652-961-0"
          }
        ]
      }
    },{...}
  ]
}]
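The payload keeps sections, tables, and references in separate top-level arrays joined by identifiers: a section's `table_references` point into `tables` (`film_credits_table1`), and a paragraph's `citations` point into `references` (`cite_note-3`). Here is a minimal sketch of resolving those links, assuming identifiers are unique within one article; the function names are hypothetical.

```python
from collections.abc import Iterable, Iterator
from typing import Any


def walk(nodes: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every node in a has_parts tree, depth-first."""
    for node in nodes:
        yield node
        yield from walk(node.get("has_parts", []))


def tables_by_section(article: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Map each top-level section name to the table objects it references."""
    tables = {t["identifier"]: t for t in article.get("tables", [])}
    result: dict[str, list[dict[str, Any]]] = {}
    for section in article.get("sections", []):
        found = [
            tables[ref["identifier"]]
            for node in walk([section])
            for ref in node.get("table_references", [])
            if ref["identifier"] in tables
        ]
        if found:
            result[section.get("name", "")] = found
    return result


def cited_references(article: dict[str, Any]) -> list[dict[str, Any]]:
    """Return reference objects cited in the sections, in first-citation order."""
    references = {r["identifier"]: r for r in article.get("references", [])}
    seen: set[str] = set()
    ordered = []
    for node in walk(article.get("sections", [])):
        for citation in node.get("citations", []):
            identifier = citation["identifier"]
            if identifier in references and identifier not in seen:
                seen.add(identifier)
                ordered.append(references[identifier])
    return ordered
```

For the payload above, `tables_by_section` should map `film_credits` to the film credits table, and `cited_references` should start with the Atwood (2011) book reference cited in the abstract.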
