Wikimedia Data Primer
Explaining where the data from Enterprise’s APIs comes from
What are the differences between Wikimedia, Wikipedia, and MediaWiki? Who writes and edits all these articles? What API should I use to access article content?
The Wikimedia ecosystem is complex and diverse, and it is easy to get lost in. This is a crash course on what you need to know to start understanding this ecosystem and to make informed decisions on how to extract, analyse, and interpret data coming from wiki projects.
This primer describes the different wiki projects and organisations, explains languages, namespaces, and data domains, and shows how articles are created, edited, and maintained. Consult the links under each paragraph for more in-depth resources about the topics discussed here.
The Wikimedia Ecosystem
The Wikimedia movement is the global community of contributors to Wikimedia projects, including Wikipedia. Contributors to wiki projects come in many forms. Some are anonymous contributors, others registered editors or admins. Contributors in the Wikimedia movement are unpaid volunteers.
- Types of user accounts
- Wikipedians
- Watch: What is the Wikimedia movement? (1 min)
- Watch: How to contribute to the Wikimedia movement (1 min)
- Watch: Why does Wikipedia matter in the age of AI? (1 min)
The Wikimedia Foundation is the American 501(c)(3) nonprofit organization that provides the technical and organizational infrastructure to enable members of the public to develop wiki-based content in languages across the world. The foundation does not write or curate the content of its projects.
- Watch: What does the Wikimedia Foundation do? (1 min)
- Watch: What does it take to run Wikipedia? (1 min)
- About the Wikimedia Foundation
Wikimedia Enterprise is a team within the Wikimedia Foundation that develops and markets a set of commercial APIs. These APIs provide Wikimedia project data to customers who need endpoints with high availability, scalability, and data throughput.
For questions or support for the Wikimedia Enterprise APIs, visit the Wikimedia Enterprise Help Center or open a support ticket through your account dashboard.
- About Wikimedia Enterprise
- Tech updates from Wikimedia Enterprise
- Project data available through Wikimedia Enterprise
The Wikimedia Foundation develops other APIs, too. The main software underpinning most wiki projects is called MediaWiki, and the Foundation develops a set of APIs that interact with it, for example to retrieve articles, write new data to a MediaWiki instance, or analyse user data. The Wikimedia Enterprise APIs currently give access to only a subset of data from a subset of wiki projects. If you want to extract and analyse data from a wiki project that isn’t served through the Wikimedia Enterprise APIs, access that data through one of these foundational APIs or through the publicly available data dumps.
- About MediaWiki
- Find out how to retrieve data through other APIs or data dumps: Data Domains
- Visit the Wikimedia Developer Portal
For questions or support for the MediaWiki APIs or other foundational APIs, go to the ‘Get Help’ section of the Wikimedia Developer Portal.
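As a rough illustration of the foundational APIs mentioned above, the sketch below builds a request URL for the MediaWiki Action API that asks for the current wikitext of a single page. The function name `build_action_api_url` is hypothetical, and the sketch assumes a Wikipedia language edition reachable at `<language>.wikipedia.org`. It only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode


def build_action_api_url(language: str, title: str) -> str:
    """Build an Action API URL for the latest revision content of one page.

    Hypothetical helper for illustration; assumes a Wikipedia host of the
    form <language>.wikipedia.org.
    """
    params = {
        "action": "query",      # use the query module
        "prop": "revisions",    # ask for revision data
        "rvprop": "content",    # include the revision content
        "rvslots": "main",      # content lives in the "main" slot
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"


url = build_action_api_url("en", "NATO")
print(url)
```

The resulting URL can be passed to any HTTP client. Check the Wikimedia Developer Portal for current usage guidelines before sending requests at volume.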
Other Projects
Wikipedia is just one of many projects in the Wikimedia ecosystem. Wikimedia Enterprise APIs give access to Wikipedia, Wiktionary, Wikivoyage, Wikibooks, Wikiversity, Wikiquote, Wikisource, and Wikinews. Projects are often interlinked, referring to one another in some way.
- Project data available through Wikimedia Enterprise
- All projects in the Wikimedia movement
- Watch: What projects does the Wikimedia Foundation support? (1 min)
Wikidata
Wikidata acts as central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others. Wikidata is structured as a graph database, with nodes, edges, and properties. The Wikidata repository consists mainly of items, each uniquely identified by a Q-number, such as Earth (Q2). Statements describe detailed characteristics of an item and consist of a property and a value. Properties in Wikidata have a P-number, such as ‘part of’ (P361).
For every Wikipedia article, there is a Wikidata Q-number, which in turn links to all articles about that topic across Wikimedia projects. Some information on Wikipedia pages and on other project pages is sourced directly from Wikidata. Many templates on Wikipedia and other projects are automatically populated with information from Wikidata, e.g. infoboxes on Wikipedia pages about people.
Wikidata is developed and maintained by Wikimedia Deutschland, an independent Wikimedia chapter.
- Introduction to Wikidata
- Templates using data from Wikidata
- Wikidata access for developers
- Learn more about Templates and transclusion
- Get Wikidata technical support
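To make the Q-number and P-number convention described above concrete, here is a minimal sketch that tells item identifiers from property identifiers and builds the URL from which Wikidata serves an entity as JSON. The helper names are hypothetical, and the sketch deliberately ignores other Wikidata entity types.

```python
import re

# Q-numbers identify items, P-numbers identify properties.
# Simplification: other Wikidata entity types are not handled here.
ENTITY_ID = re.compile(r"^([QP])([1-9]\d*)$")


def classify_entity_id(entity_id: str) -> str:
    """Return "item" for a Q-number and "property" for a P-number."""
    match = ENTITY_ID.match(entity_id)
    if match is None:
        raise ValueError(f"not a Q-number or P-number: {entity_id!r}")
    return "item" if match.group(1) == "Q" else "property"


def entity_data_url(entity_id: str) -> str:
    """Build the Special:EntityData URL that returns an entity as JSON."""
    classify_entity_id(entity_id)  # raises ValueError on malformed ids
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json"


print(classify_entity_id("Q2"))    # Earth, from the example above
print(classify_entity_id("P361"))  # 'part of', from the example above
print(entity_data_url("Q2"))
```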
Wikimedia Commons
Wikimedia Commons is a media file repository that makes public domain and freely licensed educational media content (images, sound, and video clips) available to everyone, in their own language. It acts as a common repository for the various projects of the Wikimedia Foundation. Wikimedia Commons hosts the media used on wiki projects, such as the images on Wikipedia pages or videos on Wikivoyage. Media files in the Wikimedia Enterprise API output will have a Wikimedia Commons URL.
The next parts of this primer will mainly cover Wikipedia, but most of the concepts explained below are also applicable to other wiki projects.
Understanding Articles and Languages
Wikipedia exists in more than 355 languages, with each language version operating as its own distinct project. An article’s content and its editing rules are specific to its language project. This is true for almost all other wiki projects, with the notable exception of Wikidata.
Every language is its own project. An article on “Paris” in English Wikipedia is not automatically translated or synchronized with the article on “Paris” in French Wikipedia. Articles are created and maintained by volunteers in each language community. There is no automated translation process.
- About multilingual coordination
- Interlanguage links explained
- More information about translation of Wikipedia articles
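One way to picture "every language is its own project" is that each language edition lives on its own host. The sketch below builds page URLs for the same title on two editions. The helper `page_url` is hypothetical, and the fact that both editions have a page titled "Paris" follows the example in the text above rather than any lookup.

```python
from urllib.parse import quote


def page_url(language: str, title: str) -> str:
    """Build the URL of a page on one Wikipedia language edition."""
    # MediaWiki page URLs use underscores in place of spaces.
    path_title = quote(title.replace(" ", "_"))
    return f"https://{language}.wikipedia.org/wiki/{path_title}"


english = page_url("en", "Paris")
french = page_url("fr", "Paris")

# Two separate pages on two separate projects: editing one does not
# change the other.
print(english)
print(french)
```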
Wiki projects divide their content into different namespaces. A page’s namespace is usually indicated by the prefix in its name. A user page, e.g. User:Quiddity, is part of the User namespace. Article pages, e.g. NATO, are part of the Article namespace, which has no prefix. Each namespace has a number associated with it: the User namespace is namespace 2, and the Article namespace is namespace 0. Wikimedia Enterprise gives access to namespaces 0, 6, 10, and 14.
- Info about all Namespaces on Wikipedia
- Wikimedia Enterprise Namespace access
- Namespaces and their content may vary
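The prefix rule described above can be sketched as a small lookup. This is a simplified illustration, not how MediaWiki itself resolves titles: it covers only a handful of English prefixes, whereas real wikis define more namespaces and localised names. The names `NAMESPACE_IDS`, `ENTERPRISE_NAMESPACES`, and `namespace_of` are hypothetical.

```python
# A few standard namespace prefixes and their numbers (not exhaustive).
NAMESPACE_IDS = {
    "User": 2,
    "File": 6,
    "Template": 10,
    "Category": 14,
}

# Namespaces the text above lists as available through Wikimedia Enterprise.
ENTERPRISE_NAMESPACES = {0, 6, 10, 14}


def namespace_of(title: str) -> int:
    """Guess the namespace number of a page from its title prefix."""
    prefix, separator, _rest = title.partition(":")
    if separator and prefix in NAMESPACE_IDS:
        return NAMESPACE_IDS[prefix]
    # No known prefix: treat the page as an article (namespace 0).
    return 0


for title in ["NATO", "User:Quiddity", "Category:Physics"]:
    number = namespace_of(title)
    print(title, number, number in ENTERPRISE_NAMESPACES)
```

A title that contains a colon but no known prefix falls through to namespace 0, which is why the lookup checks the prefix against the table rather than just testing for a colon.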
Wiki project articles are crowdsourced: anyone can contribute to an article. You don’t need to sign up for an account, and you don’t need any training. Content on Wikipedia has to follow core principles (see ‘the five pillars’) and policies (see ‘core content policies’). Misinformation, disinformation, and vandalism are all handled through robust moderation tools and processes run by volunteers. All of the discussions and edit histories on wiki articles are publicly available to enable these moderation tools and ensure transparency.
- Watch: Who is in charge of content on Wikipedia? (1 min)
- Watch: If volunteers edit Wikipedia, how can you trust it? (1 min)
- Watch: How is misinformation addressed on Wikipedia? (1 min)
- Who writes Wikipedia
- The five pillars of Wikipedia
- Wikipedia Core Content policies
- User Groups and their permissions
Learn More
The Wikimedia ecosystem is an intricate and ever-changing organism. Coming across data that is hard to interpret or analyse is completely normal. If you have more questions about the output of Enterprise’s APIs, please contact our support team (e.g. by opening a ticket through your user dashboard).
- Wikimedia Enterprise API documentation
  - Our public documentation, including authentication, endpoints, data schema, SDKs, and everything you’d need to start making API requests.
- Wikimedia Enterprise’s Data Dictionary
  - Explains in detail every field returned in Wikimedia Enterprise API responses.
- Wikimedia Enterprise Blog
  - Public announcements of new features and functionality.
- Wikimedia Enterprise Help Center
  - Our public FAQs.
- Diff
  - A blog by and for the Wikimedia volunteer community to connect and share learnings, stories, and ideas from across the movement.
- MediaWiki news
  - Updates on the MediaWiki software: new releases, maintenance info, security updates, and more.
Contact Sales to get upgraded access to Wikimedia Enterprise APIs.