
Wikimedia Europe


Wikipedia Is Running On Its Own Metal: The Power and Limits of Self-Hosted Infrastructure

The recent AWS outage served as yet another reminder of how much of the modern internet depends on a handful of cloud providers. When the service experienced widespread disruptions in October 2025, countless websites and applications went dark. The cascading failures illustrated a key drawback of cloud-dependent infrastructure.

Luckily, Wikipedia and its sister projects hummed along without interruption. It is a little-known fact that the Wikimedia Foundation runs its own servers in several places around the world. This has some advantages, but also poses specific challenges. Let’s take a look!

Running Our Own Metal

While much of the world has embraced “the cloud”, and there are indeed many advantages to it, Wikipedia and its sister projects—including Wikimedia Commons and Wikidata—take a decidedly different approach. The Wikimedia Foundation operates its own data centres, running thousands of servers across multiple facilities around the globe. This isn’t a small operation: Wikipedia consistently ranks among the most-visited websites worldwide, serving billions of page views each month, all without relying on AWS, Google Cloud, or Azure.

The Wikimedia Foundation maintains server clusters in several locations, including facilities in the US, Amsterdam, Marseille, Singapore and, most recently, Brazil. These data centres house the physical hardware that stores Wikipedia’s vast repository of knowledge and handles the enormous traffic load. The infrastructure includes caching servers distributed globally to ensure fast page loads regardless of where users are located, database servers managing the constant stream of edits and updates, and application servers running the MediaWiki software that powers the wikis themselves.
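
To make that layered picture concrete, here is a deliberately minimal sketch in Python of how a request might flow through an edge cache, an application server and a database. The names and data structures are invented for illustration; this is not Wikimedia’s production code.

```python
# Highly simplified sketch of the layered request path described above.
# Names and structures are illustrative, not Wikimedia's actual code.

ARTICLE_DB = {"Wikipedia": "Wikipedia is a free online encyclopedia..."}  # stand-in for the database tier
EDGE_CACHE = {}  # stand-in for a regional caching server

def render_article(title: str) -> str:
    """Application tier: fetch the stored text and render it as HTML."""
    wikitext = ARTICLE_DB[title]
    return f"<html><body>{wikitext}</body></html>"

def handle_request(title: str) -> str:
    """Edge tier: serve from the regional cache whenever possible."""
    if title in EDGE_CACHE:           # cache hit: no application or database work needed
        return EDGE_CACHE[title]
    html = render_article(title)      # cache miss: fall through to the application servers
    EDGE_CACHE[title] = html
    return html

print(handle_request("Wikipedia"))    # miss: rendered, then cached at the edge
print(handle_request("Wikipedia"))    # hit: served straight from the edge cache
```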

This self-hosted approach gives Wikimedia control over its infrastructure and reduces dependencies. When problems arise, they’re handled internally rather than waiting for third-party providers. The Wikimedia Foundation’s site reliability engineers can optimise performance specifically for Wikipedia’s unique use case—a read-heavy workload with periodic bursts of write activity when news breaks or popular topics trend.

The AI Challenge

One thing the system isn’t optimised for is the massive increase in crawler bots brought on by the advent of AI large language models. Bots behave differently from humans. Humans tend to view the same articles in large numbers when a specific event occurs – a football game, an election, a new Pope being announced. That way, content can be cached locally in the region where it is most needed, which saves resources – bandwidth and energy.

Crawlers work differently. They access many unrelated articles in very quick succession, which makes regional caching impossible or inefficient. Because these requests are unrelated to one another, content needs to be delivered globally rather than from a nearby cache, which is more expensive. As of 2025, 65% of our most expensive traffic comes from crawler bots.
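
A toy simulation illustrates the difference. In the sketch below the article counts, cache size and traffic distributions are invented purely for illustration: human-like traffic concentrates on a few trending articles and therefore hits a small regional cache most of the time, while crawler-like traffic sweeps across the whole corpus and misses almost every time.

```python
import random
from collections import OrderedDict

# Toy model of why crawler traffic caches poorly: an LRU cache that can hold
# only a small fraction of all articles. All numbers are invented for illustration.
N_ARTICLES, CACHE_SIZE, N_REQUESTS = 100_000, 1_000, 50_000

def hit_rate(requests):
    cache, hits = OrderedDict(), 0
    for article in requests:
        if article in cache:
            hits += 1
            cache.move_to_end(article)        # mark as recently used
        else:
            cache[article] = True
            if len(cache) > CACHE_SIZE:
                cache.popitem(last=False)     # evict the least recently used article
    return hits / len(requests)

# Human readers: interest clusters on a small set of trending articles.
human = random.choices(range(N_ARTICLES),
                       weights=[1 / (r + 1) for r in range(N_ARTICLES)],
                       k=N_REQUESTS)
# Crawlers: many unrelated articles in quick succession, spread across the corpus.
crawler = [random.randrange(N_ARTICLES) for _ in range(N_REQUESTS)]

print(f"human-like traffic hit rate:   {hit_rate(human):.0%}")    # high: hot articles stay cached
print(f"crawler-like traffic hit rate: {hit_rate(crawler):.0%}")  # low: nearly every request is a miss
```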

To address this, Wikimedia has established Wikimedia Enterprise, a wholly owned entity that offers very large re-users a “direct pipe” to our servers through a dedicated API. This API is not free of charge for commercial entities. The aim is twofold: firstly, to ensure our infrastructure continues to serve humans as a priority; secondly, to ensure that small donors, who currently provide the vast majority of Wikimedia’s budget, don’t inadvertently subsidise infrastructure needed only by large AI developers.
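
Conceptually, such a “direct pipe” is simply an authenticated, metered API that bulk re-users call instead of scraping the live site. The sketch below is hypothetical: the base URL, paths and field names are placeholders, not the documented Wikimedia Enterprise API (see enterprise.wikimedia.com for the real product).

```python
import requests

# Hypothetical sketch of a "direct pipe" API client. The URL, path and token
# below are placeholders, NOT the documented Wikimedia Enterprise API.
API_BASE = "https://api.example.org/v2"   # placeholder base URL
ACCESS_TOKEN = "..."                      # credential issued to paying enterprise clients

def fetch_article(name: str) -> dict:
    """Fetch a structured article snapshot over the dedicated, authenticated API."""
    resp = requests.get(
        f"{API_BASE}/articles/{name}",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# article = fetch_article("Earth")  # bulk re-users pull from here instead of hammering the live site
```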

One discomfort this currently creates is that Wikimedia Enterprise does use AWS. This is mainly because Wikimedia didn’t want to invest donor money in infrastructure catering to very large for-profit entities. In practice, the project content is mirrored on AWS and enterprise clients are served from there. Now that Wikimedia Enterprise is off the ground, it is worth discussing whether Wikimedia should use part of the revenue to build its own AWS-level infrastructure: first for Wikimedia Enterprise, but then perhaps for other free and open projects?

The Open Source Imperative

Beyond the hardware, we also consider software as infrastructure. Wikimedia maintains a firm policy against relying on proprietary-only solutions for services and infrastructure when open-source alternatives exist. This isn’t merely a preference; it’s a core value that shapes infrastructure decisions at every level. The servers run Linux, the databases use open-source systems like MySQL and MariaDB, and the caching layer relies on open-source tools like Varnish. Even the MediaWiki platform, which runs Wikipedia and the other projects, is free software developed by the Wikimedia Foundation, Wikimedia Deutschland (for Wikibase) and thousands of volunteers.
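
Because MediaWiki is free software with a public Action API, anyone can query the same stack that serves our readers. Here is a minimal example using the standard action=query endpoint; the extracts module is provided by the TextExtracts extension, which is enabled on Wikipedia.

```python
import requests

# Minimal query against MediaWiki's public Action API, the same free software
# that serves Wikipedia's readers and editors.
API_URL = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "titles": "MediaWiki",
    "prop": "extracts",    # provided by the TextExtracts extension
    "exintro": True,       # only the lead section
    "explaintext": True,   # plain text instead of HTML
    "format": "json",
}
resp = requests.get(API_URL, params=params,
                    headers={"User-Agent": "example-client/0.1 (demo)"}, timeout=30)
resp.raise_for_status()

page = next(iter(resp.json()["query"]["pages"].values()))
print(page["extract"][:200])  # first 200 characters of the article's introduction
```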

This commitment ensures that Wikipedia’s infrastructure can be audited, understood, and replicated by anyone. There are no black boxes, no proprietary vendor lock-ins, and no dependence on closed-source systems that could compromise independence or user privacy. When you’re running one of humanity’s largest repositories of knowledge, being beholden to proprietary software vendors isn’t just philosophically inconsistent—it’s a strategic vulnerability.

The Other AI Challenge

Wikimedia projects aren’t just a data resource for AI training. Wikimedia also develops and runs some of its own ML/AI applications, such as tools to help volunteer editors recognise vandalism or categorise content. 

To do so, the Wikimedia Foundation needs GPU chips, just like everyone else. But while Nvidia GPUs are the dominant standard by now, Wikimedia has intentionally decided to choose another product: AMD GPUs.

The single most important reason for this choice is that AMD is currently the only vendor releasing its GPU software stack as open source. Had we opted for Nvidia chips, we would have needed to rely on proprietary drivers and tools.
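
The practical upshot, sketched below under the assumption of a ROCm build of PyTorch (ROCm being AMD’s open-source GPU stack), is that GPU code written against the familiar torch.cuda interface runs on AMD hardware through an open driver and runtime. This is an illustration only, not a description of the Foundation’s actual ML services.

```python
import torch

# Assumes a ROCm build of PyTorch: AMD GPUs are then exposed through the
# familiar torch.cuda API, mapped to HIP under the hood. Illustration only.
print("GPU available: ", torch.cuda.is_available())
print("ROCm/HIP build:", torch.version.hip is not None)  # None on CUDA-only builds

if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")  # "cuda" addresses the AMD GPU on ROCm
    y = x @ x.T                                 # runs on the open-source stack end to end
    print(y.shape, y.device)
```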

It is worth mentioning that this choice has come at a significant cost. Developers had to dedicate a lot of time to overcoming basic gaps: some tools and drivers had to be developed in-house because they didn’t yet exist, and in other instances shortcomings of the stack had to be worked around.

Perhaps an interesting anecdote on the side: because Wikimedia runs its own servers, the tech team had to measure the space inside the server chassis before they could be sure which cards available on the market would fit. The task is exactly as hands-on as it sounds.

Looking Ahead

It is important to emphasise that “the cloud” is not something inherently good or bad. It is a very useful set of technologies that benefits humans in many ways. The same goes for AI. 

However, we believe that having many different and competing hosting, service and revenue models makes both technology and society more resilient. We believe that investing in open-source software and independent infrastructure is necessary to that end.