Structured Contents Initiative
More Structured Data, Fewer BLOBs
Wikipedia uses wikitext, a markup language designed for formatting page content. While it has proven useful for editors authoring wiki articles, it creates complexity for developers parsing articles at scale.
When Wikimedia Enterprise launched in 2021, we built it to serve high-volume, high-frequency users of Wikimedia data. As part of that effort, we also made parsing easier by providing HTML blobs, a format that developers are more familiar with and for which many parsing libraries already exist.
The Structured Contents Initiative is the next step in serving easy-to-parse Wikimedia data. Currently in beta, it extracts infoboxes, sections, tables, references, and more from raw wikitext and HTML and delivers them as structured, machine-readable JSON.
What’s Available Now
Structured Contents currently extracts the following article pieces into JSON:
abstract
description
infobox
sections
citations
references
tables
For a full breakdown and explanation of the Structured Contents data schema in responses, see our Data Dictionary: Beta section.

Showcase: BLOBs vs Structured Contents
Below are examples using Josephine Baker's English Wikipedia article. Each feature is shown side by side, comparing the raw HTML and wikitext BLOBs with the clean JSON output from Structured Contents. These examples show how the data is transformed and why the result is easier for developers to use. Some of the payload output in these examples has been truncated (using […]).
Article Description
[...]\u003e\u003cdiv class=\"shortdescription nomobile noexcerpt noprint searchaux\" style=\"display:none\" about=\"#mwt1\" typeof=\"mw:Transclusion\" data-mw='{\"parts\":[{\"template\":{\"target\":{\"wt\":\"short description\",\"href\":\"./Template:Short_description\"},\"params\":{\"1\":{\"wt\":\"American-born French entertainer (1906–1975)\"}},\"i\":0}}]}' id=\"mwAg\"\u003eAmerican-born French entertainer (1906–1975)\u003c/div[...]
{{short description|American-born French entertainer (1906–1975)}}\n
"description": "American-born French entertainer (1906–1975)"
Article Sections
id=\"mwIQ\"\u003eDuring her early career, Baker was among the most celebrated performers to headline the revues of the \u003cspan title=\"French-language text\" about=\"#mwt45\" typeof=\"mw:Transclusion\" data-mw='{\"parts\":[{\"template\":{\"target\":{\"wt\":\"lang\",\"href\":\"./Template:Lang\"},\"params\":{\"1\":{\"wt\":\"fr\"},\"2\":{\"wt\":\"[[Folies Bergère]]\"},\"italic\":{\"wt\":\"no\"}},\"i\":0}}]}' id=\"mwIg\"\u003e\u003cspan lang=\"fr\" style=\"font-style: normal;\"\u003e\u003ca rel=\"mw:WikiLink\" href=\"./Folies_Bergère\" title=\"Folies Bergère\"\u003eFolies Bergère\u003c/a\u003e\u003c/span\u003e\u003c/span\u003e\u003clink rel=\"mw:PageProp/Category\" href=\"./Category:Articles_containing_French-language_text\" about=\"#mwt45\" id=\"mwIw\"/\u003e in \u003ca rel=\"mw:WikiLink\" href=\"./Paris\" title=\"Paris\" id=\"mwJA\"\u003eParis\u003c/a\u003e.[...]
\n\nDuring her early career, Baker was among the most celebrated performers to headline the revues of the {{lang|fr|[[Folies Bergère]]|italic=no}} in [[Paris]]. [...]
"sections": [{
"type": "paragraph",
"value": "During her early career, Baker was among the most celebrated performers to headline the revues of the Folies Bergère in Paris. [...]",
"links": [
{
"url": "https://en.wikipedia.org/wiki/Folies_Bergère",
"text": "Folies Bergère"
},
[...]
],
"citations": [
{
"identifier": "cite_note-4",
"text": "[4]"
},
[...]
]
}]
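As a rough illustration of consuming this output, the Python sketch below collects paragraph text, link URLs, and citation identifiers from a trimmed copy of the fragment above. The elided entries are left out, and the assumption that sections can nest through a `has_parts` key (as the infobox output does) is ours; check the Data Dictionary for the actual schema.

```python
# A trimmed copy of the "sections" fragment above, without the elided entries.
sections = [
    {
        "type": "paragraph",
        "value": (
            "During her early career, Baker was among the most celebrated "
            "performers to headline the revues of the Folies Bergère in Paris."
        ),
        "links": [
            {
                "url": "https://en.wikipedia.org/wiki/Folies_Bergère",
                "text": "Folies Bergère",
            }
        ],
        "citations": [{"identifier": "cite_note-4", "text": "[4]"}],
    }
]


def iter_parts(parts):
    """Yield every part, descending into "has_parts" when present (assumed key)."""
    for part in parts:
        yield part
        yield from iter_parts(part.get("has_parts", []))


paragraphs = [
    p["value"] for p in iter_parts(sections) if p.get("type") == "paragraph"
]
link_urls = [
    link["url"] for p in iter_parts(sections) for link in p.get("links", [])
]
citation_ids = [
    c["identifier"] for p in iter_parts(sections) for c in p.get("citations", [])
]

print(len(paragraphs), link_urls, citation_ids)
```

No HTML attributes or template syntax have to be stripped: the text, links, and citation markers arrive already separated.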
Article Infoboxes
data-mw-deduplicate=\"TemplateStyles:r1295905060\" typeof=\"mw:Extension/templatestyles mw:Transclusion\" about=\"#mwt6\" data-mw='{\"name\":\"templatestyles\",\"attrs\":{\"src\":\"Module:Infobox/styles.css\"},\"body\":{\"extsrc\":\"\"},\"parts\":[{\"template\":{\"target\":{\"wt\":\"Infobox person\\n\",\"href\":\"./Template:Infobox_person\"},\"params\":{\"name\":{\"wt\":\"Josephine Baker\"},\"image\":{\"wt\":\"File:Baker Harcourt 1940 2.jpg\"},\"caption\":{\"wt\":\"Baker in 1940\"},\"birth_name\":{\"wt\":\"Freda Josephine McDonald\"},\"birth_date\":{\"wt\":\"{{birth date|mf=yes|1906|06|03}}\"},\"birth_place\":{\"wt\":\"[[St. Louis]], Missouri, U.S.\"}
[...]
{{Infobox person\n| name = Josephine Baker\n| image = File:Baker Harcourt 1940 2.jpg\n| caption = Baker in 1940\n| birth_name = Freda Josephine McDonald\n| birth_date = {{birth date|mf=yes|1906|06|03}}\n| birth_place = [[St. Louis]], Missouri, U.S.\n| [...]
"infoboxes": [{
"name": "Infobox person",
"type": "infobox",
"has_parts": [
{
"name": "Josephine Baker",
"type": "section",
"has_parts": [
{
"type": "image",
"value": "Baker in 1940",
"images": [
{
"content_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Baker_Harcourt_1940_2.jpg/250px-Baker_Harcourt_1940_2.jpg",
"caption": "Baker in 1940",
"height": 250,
"width": 250
}
]
},
{
"name": "Born",
"type": "field",
"value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, U.S.",
"links": [
{
"url": "https://en.wikipedia.org/wiki/St._Louis",
"text": "St. Louis"
}
]
},
[...]
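The nested `has_parts` layout lends itself to a short recursive walk. The sketch below, written in Python against a copy of the truncated fragment above, flattens every part of type `field` into a name-to-value dictionary. The function name `infobox_fields` is hypothetical, and the sample contains only the entries visible in the excerpt.

```python
# A copy of the truncated "infoboxes" fragment above (elided parts omitted).
infoboxes = [
    {
        "name": "Infobox person",
        "type": "infobox",
        "has_parts": [
            {
                "name": "Josephine Baker",
                "type": "section",
                "has_parts": [
                    {
                        "type": "image",
                        "value": "Baker in 1940",
                        "images": [
                            {
                                "content_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Baker_Harcourt_1940_2.jpg/250px-Baker_Harcourt_1940_2.jpg",
                                "caption": "Baker in 1940",
                                "height": 250,
                                "width": 250,
                            }
                        ],
                    },
                    {
                        "name": "Born",
                        "type": "field",
                        "value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, U.S.",
                        "links": [
                            {
                                "url": "https://en.wikipedia.org/wiki/St._Louis",
                                "text": "St. Louis",
                            }
                        ],
                    },
                ],
            }
        ],
    }
]


def infobox_fields(node):
    """Hypothetical helper: map each named "field" part to its value, at any depth."""
    fields = {}
    for part in node.get("has_parts", []):
        if part.get("type") == "field" and "name" in part:
            fields[part["name"]] = part.get("value", "")
        fields.update(infobox_fields(part))
    return fields


fields = infobox_fields(infoboxes[0])
print(fields["Born"])
```

A flat dictionary is a simplification: if an infobox repeats a field name, later values overwrite earlier ones, so an application that needs every occurrence should collect lists instead.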
How to Access Structured Contents
Structured Contents is currently available in two of our APIs:
On-demand API: Request individual articles from any project with structured JSON. Best for testing, post-training, or lightweight use.
Snapshot API: Get a compressed file of all articles in a project as structured JSON snapshots. Best for pre-training, indexing, and high-scale applications.
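For snapshot-scale data, streaming one article at a time keeps memory use flat. The Python sketch below assumes a snapshot has already been downloaded and unpacked into gzip-compressed, newline-delimited JSON with one article per line; that packaging is an assumption for illustration, so consult the API documentation for the real file layout. The in-memory sample stands in for a downloaded file, and its second record is made up.

```python
import gzip
import io
import json


def iter_articles(stream):
    """Yield one article dict per non-empty line of a gzip-compressed NDJSON stream."""
    with gzip.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


# Build a tiny in-memory stand-in for a snapshot file (assumed layout).
sample = io.BytesIO()
with gzip.open(sample, mode="wt", encoding="utf-8") as out:
    out.write(
        json.dumps({"description": "American-born French entertainer (1906–1975)"})
        + "\n"
    )
    out.write(json.dumps({"description": "Placeholder description"}) + "\n")
sample.seek(0)

# Each article is parsed and released before the next line is read.
descriptions = [article.get("description") for article in iter_articles(sample)]
print(descriptions)
```

With a real file, `iter_articles` can be given a path instead of a `BytesIO` object, since `gzip.open` accepts either.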
Get Started
Take the next step and begin working with Structured Contents today:
Additional Reading
Explore details of past releases and articles about Structured Contents.
Shaping Structured Contents Together
We welcome feedback on Structured Contents to help us strengthen current features and shape new ones. Signing up for an API account gives you access to the latest features, but to make experimentation easy we have also shared early versions of Structured Contents snapshots on the open dataset platforms Hugging Face and Kaggle.
Wikimedians can also access beta Structured Contents through their Wikimedia Cloud Services accounts.