2025 was a landmark year for the open knowledge movement. When the Digital Public Goods Alliance (DPGA) officially recognized Wikipedia as a digital public good, it cemented the project’s status as critical global infrastructure. As generative AI and large language models became the default interface for millions of users this year, the demand for this human-curated, constantly updated data has never been higher.
This shift has fundamentally changed how the world interacts with Wikimedia content. For the developers and organizations building the next generation of knowledge tools, relying on traditional web scraping methods is no longer sufficient. Wikimedia Enterprise has evolved to meet this moment by providing a dedicated, commercial-grade pipeline. We are helping organizations integrate essential knowledge at scale, grounding their models in verifiable fact while ensuring the ecosystem remains sustainable for everyone.
New Features: Unlocking Granular Data
This year, we introduced substantial improvements to the Enterprise APIs designed to give developers deeper access to the knowledge within Wikimedia projects. By expanding our Structured Contents capabilities and releasing new Quality Scoring Models, we are making it easier to retrieve granular, context-rich data while assessing its reliability at scale.
The Evolution of Tables
Some of the most valuable data in Wikipedia exists within HTML tables, covering everything from election results to financial reports. While this information has always been accessible within the HTML article body, accurately parsing it alongside the surrounding content has historically been a significant challenge due to irregular formatting, merged cells, and nesting.
In 2025, we took a major step forward by introducing Parsed Wikipedia Tables to our Structured Contents endpoints. This feature transforms these complex HTML structures into clean, semantic JSON, allowing developers to ingest high-value lists and statistics directly without writing custom parsers for every edge case.
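To give a feel for what working with table data as JSON looks like, here is a minimal sketch of turning a parsed table into row records. The payload shape below is a simplified illustration, not the actual Structured Contents schema; all field names (`tables`, `caption`, `headers`, `rows`) are assumptions for the example.

```python
import json

# Hypothetical parsed-table payload; the real Structured Contents
# schema may differ. Field names here are illustrative assumptions.
payload = json.loads("""
{
  "tables": [
    {
      "caption": "Example election results",
      "headers": ["Party", "Votes", "Share"],
      "rows": [
        ["Party A", "1200345", "48.2%"],
        ["Party B", "1100210", "44.1%"]
      ]
    }
  ]
}
""")

def table_to_records(table):
    """Zip each row with the header names to get list-of-dict records."""
    return [dict(zip(table["headers"], row)) for row in table["rows"]]

records = table_to_records(payload["tables"][0])
```

Because the headers and cells arrive as clean, aligned arrays rather than raw HTML, this kind of transformation is a few lines of code instead of a custom parser per edge case.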
Contextual Citations
We also overhauled how sources are handled within the Structured Contents payload. In standard HTML payloads, associating a reference with the specific sentence it supports requires the user to build their own complex linking logic. Our updated API now handles this heavy lifting, delivering references and citations as parsed objects linked directly to the specific sections of text they support. This granular linkage allows AI models to verify facts at the sentence level, preserving the context that is vital for grounding generative AI in verifiable truth.
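As a sketch of what sentence-level grounding enables, the snippet below maps each reference in a section to the exact claim it supports. The structure shown is a simplified assumption for illustration; the fields `text`, `references`, `url`, and `supports` are hypothetical, not the actual API payload.

```python
# Hypothetical section payload with citations linked to the text they
# support; all field names are illustrative assumptions.
section = {
    "text": "The bridge opened in 1932. It was renovated in 2005.",
    "references": [
        {
            "url": "https://example.org/source",
            "supports": "The bridge opened in 1932.",
        },
    ],
}

def claims_with_sources(section):
    """Map each supported claim to the URL of the reference backing it."""
    return {ref["supports"]: ref["url"] for ref in section["references"]}

claims = claims_with_sources(section)
```

With this linkage delivered by the API, a retrieval or grounding pipeline can check individual sentences against their sources instead of treating a whole article as one undifferentiated blob.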
Quality Scoring Models
To complement this granular data, we rolled out two new Quality Scoring Models across all endpoints. These models provide immediate, programmatic insight into the reliability of the content you are ingesting:
- Reference Need: Scores a section of text based on whether it requires a citation but lacks one, aligning with Wikipedia’s core ‘verifiability’ policy.
- Reference Risk: Evaluates the existing references on a page to flag potential reliability issues or volatile sources.
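The two scores above can act as an ingestion gate. The sketch below assumes, purely for illustration, that each score arrives as a number between 0 and 1 under the field names `reference_need` and `reference_risk`; the actual field names and ranges may differ.

```python
# Hypothetical quality-score fields; names and 0-1 ranges are
# assumptions for illustration, not the actual Enterprise schema.
articles = [
    {"title": "Well-Sourced Article", "reference_need": 0.1, "reference_risk": 0.2},
    {"title": "Volatile Article", "reference_need": 0.7, "reference_risk": 0.9},
]

def passes_quality_gate(article, need_max=0.5, risk_max=0.5):
    """Keep content whose unreferenced-claim and source-risk scores are low."""
    return (article["reference_need"] <= need_max
            and article["reference_risk"] <= risk_max)

kept = [a["title"] for a in articles if passes_quality_gate(a)]
```

Thresholds like `need_max` and `risk_max` are pipeline-specific choices; a training-data pipeline might filter aggressively, while a search application might only demote low-scoring pages.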
By combining the precision of Parsed Tables and Contextual Citations with the high-level oversight of Quality Scores, we have created a data pipeline that prioritizes not just volume, but verifiability.
Partners: Building a Sustainable Ecosystem
In 2025, we welcomed a diverse range of new partners to the platform. These organizations are moving away from custom crawlers and choosing a direct relationship with the data source. By doing so, they are getting better technical performance while also supporting a healthier digital ecosystem. Our partners recognize that clear attribution and sustainable sourcing are what keep the volunteer communities thriving, ensuring the data pipeline remains fresh for years to come.
Here are a few of the partnerships and use cases we highlighted this year:
- Ecosia partnered with us to drive sustainable search innovation.
- Pleias entered a partnership to drive open, ethical AI training.
- Prorata.AI joined us to foster a healthier digital content economy.
- Nomic AI utilized Enterprise datasets to create rich visualizations of multilingual Wikipedia.
- Reef Media is developing new ways to fact-check and verify sources using our APIs.
New Horizons: 2026 and Beyond
As Wikipedia approaches its 25th birthday in 2026, we are reflecting on how to ensure these projects remain vibrant for the next quarter-century.
A major milestone in that journey is stability. We are proud to report that Wikimedia Enterprise has become a self-sustaining, profitable initiative this fiscal year. For our customers, this is crucial: it means we are here to stay. You can integrate our APIs into your production environments with the confidence that Wikimedia Enterprise is a stable, long-term solution for high-volume access to the world’s largest educational resource.
In 2026, we look forward to new data sources being integrated directly into our API endpoints, along with expanded payload data to identify knowledge patterns and trends. We are excited to onboard new customers, provide free access to more researchers, and partner with mission-aligned organizations building a better world with open data.
Build better models with verifiable facts!
See you in 2026!
— Wikimedia Enterprise Team
Photo Credits
Wild baby chamois in the Aletsch Forest Nature Reserve, CC BY 4.0, via Wikimedia Commons

