The constant evolution of Wikipedia’s content has never been more evident than in 2024, with English Wikipedia alone seeing over 31 million edits. As we wrap up the year, let’s explore how our Enterprise APIs have evolved to help harness this wealth of information more effectively, as well as examine the patterns in Wikipedia’s usage data.
Living Knowledge: 2024 Wikipedia by the Numbers
Wikipedia’s vibrant community continued to expand and refine the world’s knowledge base throughout 2024. Highlights include:
- Over 3,417 MB of new text content added to English Wikipedia
- 31,251,861 edits to English Wikipedia articles
- An average of 342 edits per minute globally, totaling 81,987,181 changes
These numbers underscore how constantly Wikipedia’s content changes, which is precisely the kind of real-time information our APIs help you access and analyze.
Most-Accessed Content: A Window into Global Interests
The most-viewed articles of 2024¹ offer fascinating insights into what captured the world’s attention. The Deaths in 2024 article led with over 44 million pageviews, while the 2024 U.S. Presidential Election article drew nearly 28 million.
Entertainment and pop culture also drove significant traffic this year: the article on the highly anticipated Deadpool & Wolverine movie garnered over 22.3 million pageviews. All of this content is queryable through our Enterprise APIs, so these trends can feed directly into application development and research.
An API for Wikipedia Articles
Our APIs provide access to human-generated and curated content from many Wikimedia projects beyond Wikipedia, including Wiktionary, Wikiquote, Wikibooks, Wikisource, Wikinews, Wikiversity, Wikivoyage, and Wikidata. Each project offers unique content that is valuable on its own, as a way to enrich training data, or to enhance a knowledge graph on any given topic.
To start exploring this wealth of data, create a free Wikimedia Enterprise account. Once you have your authentication credentials, you can access any Wikipedia article through our On-demand API or download complete dataset dumps via our Snapshot API.
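The step that turns those credentials into an access token can be sketched in shell as follows. It assumes the login endpoint https://auth.enterprise.wikimedia.com/v1/login and a JSON body with username and password fields, as our Authentication docs describe at the time of writing; wme_login and login_body are hypothetical helper names, so check the docs before relying on the details.

```shell
#!/bin/sh
# Minimal sketch: exchange Wikimedia Enterprise credentials for an access token.
# AUTH_URL reflects our Authentication docs at the time of writing (verify it);
# login_body and wme_login are hypothetical helper names.
AUTH_URL='https://auth.enterprise.wikimedia.com/v1/login'

# Print the JSON login body. Nothing is escaped, so this assumes the username
# and password contain no double quotes or backslashes.
login_body() {
  printf '{"username":"%s","password":"%s"}\n' "$1" "$2"
}

# POST the credentials and print the raw JSON response, which should contain
# an access_token field.
wme_login() {
  curl --silent --show-error --fail "$AUTH_URL" \
    --header 'Content-Type: application/json' \
    --data "$(login_body "$1" "$2")"
}
```

With jq installed, wme_login 'your_username' 'your_password' | jq -r .access_token prints only the token to use in place of {YOUR_AUTH_TOKEN}. Passing a password as a command-line argument can expose it to other users of the same machine, so keep this to local experiments.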
Here are some practical cURL examples that query a few of this year’s most popular articles; replace {YOUR_AUTH_TOKEN} with your access token (see our Authentication docs):
The Deaths in 2024 page
curl --location 'https://api.enterprise.wikimedia.com/v2/articles/Deaths_in_2024' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
  --data '{
    "filters": [
      {"field": "is_part_of.identifier", "value": "enwiki"}
    ]
  }'
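If you plan to run several of these requests, a small wrapper saves pasting the token into every command. This is a sketch, not part of our SDKs: WME_TOKEN, filter_body and fetch_article are hypothetical names, and the title must already be written the way it appears in the URL.

```shell
#!/bin/sh
# Sketch of a reusable wrapper around the On-demand request above.
# WME_TOKEN is a hypothetical variable name: export it once with your token.
API_BASE='https://api.enterprise.wikimedia.com/v2'

# Print the request body that limits results to one project, e.g. enwiki.
filter_body() {
  printf '{"filters":[{"field":"is_part_of.identifier","value":"%s"}]}\n' "$1"
}

# fetch_article <endpoint> <title-as-in-the-URL> [project]
# endpoint is "articles" or "structured-contents"; project defaults to enwiki.
# Stops with an error message if WME_TOKEN is not set.
fetch_article() {
  curl --silent --show-error --fail --location "$API_BASE/$1/$2" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${WME_TOKEN:?export WME_TOKEN first}" \
    --data "$(filter_body "${3:-enwiki}")"
}
```

For example, fetch_article articles Deaths_in_2024 > deaths_in_2024.json should save the same data as the cURL command above, and fetch_article structured-contents 2024_United_States_presidential_election enwiki targets the Structured Contents endpoint instead.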
The 2024 U.S. Presidential Election page
curl --location 'https://api.enterprise.wikimedia.com/v2/articles/2024_United_States_presidential_election' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
  --data '{
    "filters": [
      {"field": "is_part_of.identifier", "value": "enwiki"}
    ]
  }'
Because this page has a rich infobox, it is a good one for trying our Structured Contents endpoint and seeing how its response differs from the HTML blob returned by the articles endpoint. The response includes fully parsed article sections, a short description, and the infobox itself parsed into structured data. Try it with the cURL command below.
curl --location 'https://api.enterprise.wikimedia.com/v2/structured-contents/2024_United_States_presidential_election' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
  --data '{
    "filters": [
      {"field": "is_part_of.identifier", "value": "enwiki"}
    ]
  }'
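To look at the structured parts without the rest of the payload, you could ask for only a few fields. The sketch below assumes the Structured Contents endpoint accepts a fields request parameter like other On-demand endpoints, and that the short description and infobox appear under the keys description and infoboxes; both are assumptions to check against the data dictionary, and structured_body is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: build a request body that selects specific fields and
# restricts results to one project.
# Usage: structured_body <project> <field>...
structured_body() {
  sb_project=$1
  shift
  sb_fields=''
  for sb_field in "$@"; do
    # Append each field name as a quoted JSON string, comma-separated.
    sb_fields="${sb_fields:+$sb_fields,}\"$sb_field\""
  done
  printf '{"fields":[%s],"filters":[{"field":"is_part_of.identifier","value":"%s"}]}\n' \
    "$sb_fields" "$sb_project"
}

# Example request (replace {YOUR_AUTH_TOKEN} with your access token):
# curl --location 'https://api.enterprise.wikimedia.com/v2/structured-contents/2024_United_States_presidential_election' \
#   --header 'Content-Type: application/json' \
#   --header 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
#   --data "$(structured_body enwiki name description infoboxes)"
```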
The Deadpool & Wolverine page
curl --location 'https://api.enterprise.wikimedia.com/v2/articles/Deadpool_%26_Wolverine' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
  --data '{
    "filters": [
      {"field": "is_part_of.identifier", "value": "enwiki"}
    ]
  }'
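The %26 in that URL is the percent-encoded ampersand from the title, and spaces in titles become underscores. That conversion can be sketched in POSIX sh as follows; title_to_path and is_unreserved are hypothetical helpers, and encoding every byte outside the RFC 3986 unreserved set (letters, digits, -, ., _ and ~) is our conservative choice, which may encode more characters than the API strictly needs.

```shell
#!/bin/sh
# Hypothetical helper: convert a human-readable title into a URL path segment.
# Spaces become underscores; every byte outside the RFC 3986 unreserved set
# is percent-encoded, so "&" becomes "%26" and multi-byte UTF-8 characters
# are encoded byte by byte.

# Succeed if the decimal byte value $1 is an unreserved character.
is_unreserved() {
  [ "$1" -ge 48 ] && [ "$1" -le 57 ] && return 0    # 0-9
  [ "$1" -ge 65 ] && [ "$1" -le 90 ] && return 0    # A-Z
  [ "$1" -ge 97 ] && [ "$1" -le 122 ] && return 0   # a-z
  case "$1" in 45 | 46 | 95 | 126) return 0 ;; esac  # - . _ ~
  return 1
}

title_to_path() {
  # od prints one unsigned decimal value per byte; tr puts each on its own line.
  printf '%s' "$1" | tr ' ' '_' | od -An -v -tu1 | tr -s ' \t' '\n\n' |
    while read -r byte; do
      [ -n "$byte" ] || continue             # skip blank lines from od padding
      if is_unreserved "$byte"; then
        printf "\\$(printf '%03o' "$byte")"  # emit the byte itself (octal escape)
      else
        printf '%%%02X' "$byte"              # emit %XX
      fi
    done
  printf '\n'
}
```

For example, title_to_path 'Deadpool & Wolverine' prints Deadpool_%26_Wolverine, matching the URL above, and a UTF-8 title such as 'Pokémon' becomes Pok%C3%A9mon.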
Major Enterprise API Enhancements in 2024
This year brought many improvements to our API ecosystem, making Wikimedia project data more accessible and structured than ever before.
Enhanced Data Structure & Access
We have transformed how you can interact with Wikipedia content by introducing parsed Article Body Sections and short Descriptions, which make complex Wikipedia articles more granular and machine-readable. We also expanded our metadata with additional credibility signals, including no-index fields, maintenance tags, and breaking news indicators. The new RevertRisk scoring offers another way to assess content quality and make informed decisions about dataset updates.
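As one illustration of using RevertRisk to decide on dataset updates, a pipeline might accept a new revision only when its revert-risk probability falls below a threshold. The sketch assumes you have already read that probability from the revision metadata (see the data dictionary for the exact field); accept_revision and its default threshold of 0.5 are hypothetical choices, not a recommendation.

```shell
#!/bin/sh
# Hypothetical gate for dataset updates based on a RevertRisk probability.
# accept_revision <probability> [threshold]
# Exits 0 (accept) when probability < threshold; the 0.5 default is an
# arbitrary illustration, so tune it for your own use case.
accept_revision() {
  # awk handles the floating-point comparison that POSIX sh arithmetic cannot.
  awk -v p="$1" -v t="${2:-0.5}" 'BEGIN { exit !(p + 0 < t + 0) }'
}

# Example (store_revision is a hypothetical function of your own):
# if accept_revision "$risk"; then store_revision; fi
```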
API Infrastructure Improvements
Realtime API gained Parallel Connections and Restart support: it now allows up to 10 simultaneous connections for higher data throughput, and it retains events for 48 hours so clients can reliably recover data after an interruption.
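The 48-hour retention window determines whether a reconnecting client can pick up where it left off. The sketch below checks that with epoch seconds; can_resume_from is a hypothetical helper, and treating a gap of exactly 48 hours as still resumable is our assumption about the boundary.

```shell
#!/bin/sh
# 48-hour Realtime API event retention, expressed in seconds.
RETENTION_SECONDS=$((48 * 60 * 60))

# can_resume_from <last_event_epoch> [now_epoch]
# Succeeds when the last event you processed is still inside the retention
# window. now_epoch defaults to the current time (date +%s is widely
# supported, though not required by POSIX).
can_resume_from() {
  crf_now=${2:-$(date +%s)}
  [ $((crf_now - $1)) -le "$RETENTION_SECONDS" ]
}
```

For example, can_resume_from 1735000000 1735100000 succeeds, because the 100,000-second gap is within the 172,800 seconds in 48 hours.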
Our Structured Contents beta endpoints gained expanded data features and better performance, with more to come in 2025.
AI & Machine Learning Integration
2024 marked our entry into the AI/ML space with two major initiatives: the release of our official dataset on Hugging Face 🤗 and the launch of beta Structured Contents in Snapshot API, supporting six Wikipedia languages. Our participation in the NeurIPS 2024 conference, alongside the Common Crawl Foundation, fostered important discussions about the role of nonprofit organizations in the AI/ML landscape. In 2025 we’ll continue our community outreach and attend more conferences and hackathons where we can.
Developer Experience Enhancements
We’ve improved our developer toolset with an upgraded Go SDK and a new Python SDK. Both come complete with OAuth authentication libraries and practical example directories, creating a more developer-friendly ecosystem for rapid integration and development. Our developer documentation has improved as well; while there’s more to do, we have already received positive feedback on the changes.
Democratizing Data Access
Perhaps most importantly, we have made significant strides in democratizing access to our APIs: free accounts now come with recurring monthly credits and more frequent data updates. Our On-demand API now offers 5,000 requests per month on a recurring basis, replacing the previous one-time limit, and Snapshot API releases now arrive twice a month. These improvements reflect our commitment to making enterprise-grade API performance and open data accessible to developers and researchers at all levels.
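To make the recurring allowance concrete: 5,000 requests spread over a 30-day month comes to 166 requests per day, rounding down. The sketch below does that division for whatever remains of the month; daily_budget is a hypothetical helper that only does arithmetic and does not query your actual usage.

```shell
#!/bin/sh
MONTHLY_REQUESTS=5000   # free On-demand allowance, recurring each month

# daily_budget <requests_used_so_far> <days_left_in_month>
# Prints how many requests per day you can make for the rest of the month
# without exceeding the allowance (integer division rounds down).
daily_budget() {
  db_left=$((MONTHLY_REQUESTS - $1))
  db_days=$2
  [ "$db_days" -ge 1 ] || db_days=1     # avoid dividing by zero
  if [ "$db_left" -le 0 ]; then
    echo 0                              # allowance already used up
  else
    echo $((db_left / db_days))
  fi
}
```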
Looking Ahead to 2025
As we approach 2025, we are excited to help developers and researchers make the most of the breadth of Wikimedia project data through our Enterprise APIs. The continuing evolution of that data, and of Wikipedia’s content in particular, from breaking news to cultural phenomena, creates endless opportunities for innovative applications and model-training insights.
Stay tuned for more updates and enhancements in 2025!
— Wikimedia Enterprise Team
Sources:
¹ Most popular articles statistics: https://wikimediafoundation.org/news/2024/12/03/announcing-english-wikipedias-most-popular-articles-of-2024/
Photo Credits
Wild golden eagle and Majinghorn (Pfyn-Finges, Switzerland), CC BY 4.0, via Wikimedia Commons