Competition is a good thing. Wikipedia’s free licences explicitly welcome it. We have seen other platforms and encyclopaedias appear in the past, and we will see more in the future.
The latest batch of competitors wants to harness AI technology to generate better compendiums of knowledge. These projects criticise things like gaps in coverage and real or alleged political biases. Let’s have a look at what’s out there and discuss some of the aspects!
Projects Using AI to Generate Encyclopedic Content
xAI’s Grokipedia is not the first to explore using AI for generating Wikipedia-like articles. Here is a list of several others, courtesy of The Signpost editors:
- A small website called WikiGen.ai (one developer’s side project) already offers “automatically create[d] comprehensive articles on any topic you can imagine. Unlike traditional wikis that require human editors, our AI instantly generates well-structured, informative content tailored to your preferred reading level”.
- “Botipedia”, a project by INSEAD professor Philip M. Parker (which has been under development since at least 2021 and moved to an LLM-based approach more recently), reacted to Musk’s September announcement by asserting that it had already launched version 0.5 of its “truth-seeking Al with 400B+ articles, 6,000x bigger than Wikipedia” (although a later tweet clarified it will only be “Open to all in 2026. For now, limited to edu/org/corp emails while we scale”). Larry Sanger praised it as “one of the most interesting new competitors of Wikipedia”. A promotional video portrays Botipedia as being superior to Wikipedia due to its inclusionism and language diversity: “No subject, event, language or geography is too obscure to merit an article, meaning that no language gets left behind.”
- The task of using LLMs to write Wikipedia-like articles has been the object of numerous academic research efforts for years (see e.g. our 2024 coverage of “STORM”, a particularly notable project out of Stanford University that has also seen considerable real-life usage).
- Lastly, like Wikipedia itself, Grokipedia could also be seen as competing with ChatGPT Deep Research and similar offerings by OpenAI’s competitors (like Gemini Deep Research) that generate cited reports on a user-specified topic.
Licenses
Wikipedia and its sister projects are freely licensed. One argument for this choice is that barriers to accessing and re-using knowledge should be kept low. We want knowledge to be free.
This means that whenever someone thinks there is a better way to gather and share knowledge, they have the right to try. We won’t act as an entrenched, dominant player and use licences to stall potential progress. Competition, generally speaking, is welcome.
Biases on Wikipedia
Perhaps the main motivation behind using AI and LLMs to create knowledge compendiums akin to Wikipedia is the project’s perceived bias. Let’s take a look.
Wikipedia strives to achieve a neutral point of view; this covers the content, perspectives and sources within articles. This rule is non-negotiable. Wikipedia editors work to write articles in an impartial tone that documents and explains major points of view, giving due weight according to their prominence. All articles must strive for verifiable accuracy, with citations based on reliable sources. Editors’ personal experiences, interpretations, or opinions do not belong on Wikipedia.
This does not mean that it will always achieve this. There are many sources of bias, and many biased sources, and they manifest in different ways. Examples include contributors’ own cultural bias (different language versions will look different), coverage bias (some Wikipedias will have detailed information on one topic but lack another) and gender bias (women are still underrepresented). They, of course, overlap: the gender bias is influenced by the cultural bias and itself results in a coverage bias, to trace just one thread.
When arguing about reliable sources, of course, there will be many different views on what is reliable. This can change from topic to topic, from language to language and even change over time. Human editors constantly debate and look for consensus. The project is alive and, by definition, never completed. It will never be perfect.
Biases by LLMs
Curating human knowledge is messy. But what about machines? Can they really help make knowledge less partial, less biased?
At first glance, machines have the same problem that humans have. They are products of the world around them, which means that they too inadvertently suffer from the same biases mentioned above.
It would be interesting to read a systematic, scientific comparison of reliability between Wikipedia, Britannica and several AI projects. Comparisons of this kind already exist for Wikipedia vs. Britannica or other classical encyclopedias. It would be interesting to extend them to encyclopaedic AI projects.
For now we can take a look at a couple of more limited studies that are already available.
A Stanford study recently published in Springer Nature looks at 24 major LLMs and finds that they still struggle to tell fact from opinion. The scientists tested the models on 13,000 questions to evaluate how well they distinguish beliefs from knowledge and fact from fiction. When responding to a false first-person belief phrased as “I believe that…”, the researchers say, all models tested systematically failed to correct the false belief.
Another study, focused on health care, found ample proof of inherent bias in LLMs. And while its authors acknowledge that what they call “implicit bias cannot be eliminated from society or training data”, they also say that “its existence must be acknowledged and mitigated”. One issue the scientists ran into is that many models provide neither their full sources nor transparent documentation of how they work. This makes it impossible to investigate the source of the bias.
It seems to be surprisingly hard to make any system unbiased. Perhaps bias is not a technological problem but a societal one? From this perspective, technology cannot and will not provide a magic solution, but it can either improve or worsen the situation, depending on its architecture and use. The opportunities, limits and risks of machine learning are something we need to keep in mind, observe and actively discuss. They are both social and scientific.
Where Are We Going?
That being said, AI models can help catch mistakes. xAI and Grokipedia have found some mistakes in Wikipedia (HT User:Haeb): for example, the last film Pedro de Cordoba appeared in, or the surface area of Lake Starnberg. Both were immediately corrected. AI can also be very useful in finding, and perhaps even updating, outdated statistical information. Imagine new census data is out and a city’s article still shows the old population figure.
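In principle, flagging such stale figures is a simple comparison between what an article states and what the latest official data says. The sketch below illustrates the idea with invented city names and numbers; a real tool would of course pull the article values via the wiki’s API and the census figures from an official statistics source, rather than from hand-written dictionaries.

```python
# Hypothetical sketch: flag articles whose population figure no longer
# matches the latest census. All data below is invented for illustration.

def find_stale_populations(article_values, census_values):
    """Return (city, article_figure, census_figure) tuples where the
    article's population differs from the latest census figure."""
    stale = []
    for city, article_pop in article_values.items():
        census_pop = census_values.get(city)
        if census_pop is not None and article_pop != census_pop:
            stale.append((city, article_pop, census_pop))
    return stale

# Invented example data: one article is out of date, one is current.
article_values = {"Exampletown": 50_000, "Sampleville": 12_345}
census_values = {"Exampletown": 52_100, "Sampleville": 12_345}

for city, old, new in find_stale_populations(article_values, census_values):
    print(f"{city}: article says {old}, latest census says {new}")
```

The hard part in practice is not this comparison but reliably extracting the figure from the article and attaching the correct source citation when updating it.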
Simultaneously, Wikipedians have also found mistakes on Grokipedia, as well as articles that seem to depart from a neutral point of view. Examples include articles that cite unreliable sources or that don’t say what the cited source actually claims.
We also know, from research and from experience shared by the developers of LLMs themselves, that organic, human knowledge is indispensable. These systems can’t, at least at present, deliver without human content.
Improving omissions or knowledge gaps can go either way. LLMs can help cover content in one language that already exists in another; think of information about train technology that is currently available in German but missing in Bulgarian, for instance. They cannot, however, cover the gaps that exist in the human world: if there is no reliable data available, they will simply invent unreliable content.
As for bias, so far we can’t observe that content written by LLMs is less biased than community-generated content. But, again, removing the bias from any system is a very tough challenge. Perhaps not even a technical one.