There are many bots on Wikipedia: computer-controlled “user accounts” that perform simple, repetitive maintenance tasks. Most of them are simple, programmed to fix typos or to detect vandalism with a list of blacklisted words. ClueBot NG instead combines several detection methods that have machine learning at their core.
Bots on Wikipedia
A bot (a common nickname for a software robot) is an automated tool that carries out repetitive and mundane tasks. Bots are used to maintain Wikimedia projects across many language versions. They can make edits very rapidly, but they can also disrupt Wikipedia if they are incorrectly designed or operated. False positives, where a bot acts against a legitimate edit, are a concern as well. For these reasons, a bot policy has been developed. There are currently 2,534 bot tasks approved for use on the English Wikipedia; however, not all approved tasks involve actively making edits. Bots leave messages on user talk pages if an action they have carried out is of interest to that editor. Right now, 323 accounts on the English Wikipedia carry the “bot” flag, and there are over 400 former bots. On the Bulgarian Wikipedia, a much smaller language version, there are currently 106 bot accounts, but only some of them are active. Projects run by smaller communities sometimes need to rely more heavily on machines for page maintenance.
ClueBot NG is operated by the users Crispy1989, Cobi, Rich Smith and DamianZaremba. It uses a much more complex method than earlier bots to classify vandalism, understood as edits deliberately intended to obstruct the project’s encyclopaedic purpose. Previous anti-vandal bots used a list of simple heuristics and blacklisted words to decide whether an edit was vandalism: if a certain number of heuristics matched, the edit was classified as vandalism. This method produces quite a few false positives, because many of the heuristics have legitimate uses in some contexts, and it catches only about 5% to 10% of vandalism, because most vandalism cannot be detected by such simple heuristics.
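To make the contrast concrete, here is a minimal Python sketch of the older heuristic approach described above. The blacklist, the three heuristics and the `min_hits=2` cut-off are all hypothetical, chosen for illustration; they are not the rules any real anti-vandal bot used.

```python
import re

# Hypothetical blacklist, for illustration only.
BLACKLIST = {"stupid", "sucks", "lol"}


def heuristic_hits(added_text: str) -> int:
    """Count how many simple (hypothetical) heuristics match the added text."""
    words = re.findall(r"[a-z']+", added_text.lower())
    hits = 0
    if any(w in BLACKLIST for w in words):
        hits += 1  # contains a blacklisted word
    if re.search(r"(.)\1{4,}", added_text):
        hits += 1  # one character repeated five or more times, e.g. "!!!!!"
    letters = [c for c in added_text if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.7:
        hits += 1  # mostly upper-case letters
    return hits


def is_vandalism(added_text: str, min_hits: int = 2) -> bool:
    """Flag the edit if at least `min_hits` heuristics matched."""
    return heuristic_hits(added_text) >= min_hits
```

A legitimate sentence such as `The band released "LOL" in 2009.` already trips one heuristic, which is why counting rule matches tends to produce false positives, while vandalism that avoids every listed pattern slips through.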
ClueBot NG’s detection is built from several components, each covered below: the machine learning basics it rests on, Bayesian classifiers, an artificial neural network, threshold calculation and post-processing filters.
Machine Learning Basics
Instead of following a predefined list of rules written by a human, ClueBot NG learns what counts as vandalism automatically by examining a large list of edits that have been preclassified as either constructive or vandalism. This list of edits is called a dataset, and because fellow human vandal-fighters keep classifying edits for it, the bot’s concept of vandalism is continually learned from them. The accuracy of the bot depends largely on the size and quality of the dataset. If the dataset is small, contains inaccurately classified edits, or is not a random sample of edits, the bot’s performance is severely hampered. The best thing fellow Wikipedians can do to help the bot is to improve the dataset, for which there is a Dataset Review Interface.
Bayesian Classifiers
ClueBot NG uses a few different Bayesian classifiers. The most basic one works at the level of words: for each word, it counts the number of constructive edits that add the word and the number of vandalism edits that add it. These counts give a vandalism probability for each word added in an edit. The probabilities are combined so that words common in vandalism raise the score, while words that are uncommon in vandalism can lower it.
This differs from a simple list of blacklisted words in that the word weights are computed from the dataset to be optimal rather than chosen by hand, and there is also a large “whitelist” of words, likewise weighted, that contributes to the decision.
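The word-level idea can be sketched as a small naive Bayes scorer in Python. This is an illustration under our own assumptions (Laplace smoothing, per-word log-odds summed with a class prior), not ClueBot NG’s actual formula; the class and method names are hypothetical.

```python
import math
from collections import Counter


class WordBayes:
    """Minimal word-level Bayesian scorer (a sketch, not ClueBot NG's code).

    For every word it counts how many constructive and how many vandalism
    edits added that word, then combines the per-word evidence as log-odds.
    """

    def __init__(self):
        self.vandal_counts = Counter()
        self.good_counts = Counter()
        self.n_vandal = 0
        self.n_good = 0

    def train(self, added_words, is_vandalism):
        words = set(added_words)  # count each word at most once per edit
        if is_vandalism:
            self.vandal_counts.update(words)
            self.n_vandal += 1
        else:
            self.good_counts.update(words)
            self.n_good += 1

    def word_log_odds(self, word):
        # Laplace smoothing (+1 / +2) keeps unseen words from producing log(0).
        p_v = (self.vandal_counts[word] + 1) / (self.n_vandal + 2)
        p_g = (self.good_counts[word] + 1) / (self.n_good + 2)
        return math.log(p_v / p_g)  # > 0 leans vandalism, < 0 leans constructive

    def score(self, added_words):
        """Return a vandalism probability in (0, 1) for an edit's added words."""
        log_odds = math.log((self.n_vandal + 1) / (self.n_good + 1))  # class prior
        for word in set(added_words):
            log_odds += self.word_log_odds(word)
        return 1 / (1 + math.exp(-log_odds))
```

Because `word_log_odds` is negative for words seen mostly in constructive edits, those words pull the score down. That is the difference from a fixed blacklist: every weight comes from the counts, and “whitelist” words carry weight too.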
Scores from the Bayesian classifiers are not used on their own. Instead, they are fed into the neural network as inputs alongside other statistics. This lets the neural network reduce false positives caused by simple blacklisted words and catch vandalism that adds unknown words.
Artificial Neural Network
The main component of the ClueBot NG vandalism detection algorithm is the neural network. An artificial neural network (ANN) is a machine learning technique that can recognise patterns in input data that are more complex than a simple set of weights. The input to ClueBot NG’s ANN consists of a number of statistics calculated from the edit, including, among many other things, the results of the Bayesian classifiers. Each statistic is scaled to a number between zero and one before being fed to the network.
The output of the neural network is used as ClueBot NG’s main vandalism score. As with other machine learning techniques, the accuracy of this score depends on the size and accuracy of the training dataset.
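The input side of the network can be sketched as follows. The four statistics, their assumed ranges and the min-max scaling are hypothetical stand-ins for ClueBot NG’s real inputs, and `TinyNet` uses random, untrained weights only to show how scaled inputs flow through one hidden layer to a single score between 0 and 1; a real network would learn its weights from the training dataset.

```python
import math
import random

# Hypothetical edit statistics with assumed (min, max) ranges used for scaling.
FEATURE_RANGES = {
    "word_bayes_score": (0.0, 1.0),  # output of a Bayesian classifier
    "added_chars": (0.0, 5000.0),
    "upper_case_ratio": (0.0, 1.0),
    "user_edit_count": (0.0, 10000.0),
}


def scale(value, lo, hi):
    """Min-max scale into [0, 1], clipping values outside the assumed range."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def feature_vector(stats):
    """Turn a dict of raw statistics into the scaled input list for the network."""
    return [scale(stats[name], lo, hi) for name, (lo, hi) in FEATURE_RANGES.items()]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """One hidden layer of sigmoid units; weights are random here, not trained."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return a vandalism score strictly between 0 and 1."""
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
```

Scaling every statistic into the same range keeps a large raw value, such as a character count, from swamping inputs that already lie between 0 and 1, like the Bayesian score.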
Threshold Calculation
The ANN produces a vandalism score between 0 and 1, where 1 means the edit is certainly vandalism. To classify edits as either vandalism or constructive, a threshold is applied to the score: scores above the threshold are classified as vandalism, and scores below it as constructive.
The threshold is not chosen arbitrarily by a human; instead, it is calculated to match a given false positive rate. In real vandalism detection it is important to minimise false positives, so a human selects the acceptable false positive rate, that is, the percentage of constructive edits incorrectly classified as vandalism. To keep the threshold and the resulting statistics from misstating that rate, the portion of the dataset used for threshold calculation is kept separate from the training set and is never used for training. In addition, only the most accurate parts of the dataset (currently, the edits human-reviewed through the review interface) are used for this calculation. This is meant to ensure that the reported statistics are accurate and that false positives do not exceed the chosen rate.
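A minimal sketch of the threshold step, assuming the calibration set holds the network’s scores for held-out constructive edits only. The function names are hypothetical; the rule that scores above the threshold count as vandalism follows the text.

```python
import math


def threshold_for_fp_rate(calibration_scores, max_fp_rate):
    """Pick a threshold from the scores of held-out *constructive* edits.

    Returns the smallest observed score t such that at most
    max_fp_rate * n of the n constructive edits score strictly above t.
    """
    if not 0.0 <= max_fp_rate < 1.0:
        raise ValueError("max_fp_rate must be in [0, 1)")
    scores = sorted(calibration_scores, reverse=True)
    allowed = math.floor(max_fp_rate * len(scores))  # false positives we may accept
    # Only scores strictly greater than scores[allowed] get flagged, and at
    # most `allowed` entries (indices 0..allowed-1) can be greater.
    return scores[allowed]


def false_positive_rate(constructive_scores, threshold):
    """Share of constructive edits that would be wrongly flagged."""
    flagged = sum(1 for s in constructive_scores if s > threshold)
    return flagged / len(constructive_scores)


def classify(score, threshold):
    return "vandalism" if score > threshold else "constructive"
```

Since the calibration scores never took part in training, the false positive rate measured on them is a fairer estimate of how the threshold will behave on new edits.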
Post-Processing Filters
Even after the bot has detected vandalism and is about to revert it, a few simple rules can override that decision. Although they limit the bot’s effectiveness against vandalism, they also greatly reduce the risk of false positives. These rules include:
- User Whitelist: If the bot deems an edit to be vandalism, but the edit was made by a user whom the community recognises for constructive contributions over time, the edit is not automatically reverted.
- Edit Count: If a user has a large number of edits and a relatively small percentage of warnings, the edit is not reverted.
- 1RR: The same user/page combination is not reverted more than once per day, unless the page is on the angry revert list.
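The three post-processing rules can be sketched as a filter that runs only after the classifier has flagged an edit. The `Edit` fields, the 50-edit and 10%-warning limits and the in-memory revert log are assumptions for illustration, not ClueBot NG’s actual configuration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Edit:
    user: str
    page: str
    time: datetime
    user_edit_count: int
    user_warning_count: int


@dataclass
class RevertFilter:
    """Hypothetical post-processing filters; the numeric limits are assumptions."""
    whitelist: set = field(default_factory=set)
    angry_pages: set = field(default_factory=set)
    min_edits: int = 50              # assumed "large number of edits"
    max_warning_ratio: float = 0.1   # assumed "small percentage of warnings"
    last_revert: dict = field(default_factory=dict)  # (user, page) -> time

    def should_revert(self, edit: Edit) -> bool:
        """Call only for edits the classifier has already flagged as vandalism."""
        # User whitelist: trusted users are never reverted automatically.
        if edit.user in self.whitelist:
            return False
        # Edit count: experienced users with few warnings are left alone.
        if (edit.user_edit_count >= self.min_edits
                and edit.user_warning_count / edit.user_edit_count <= self.max_warning_ratio):
            return False
        # 1RR: at most one revert per user/page per day, except on angry-list pages.
        key = (edit.user, edit.page)
        previous = self.last_revert.get(key)
        if (previous is not None
                and edit.time - previous < timedelta(days=1)
                and edit.page not in self.angry_pages):
            return False
        self.last_revert[key] = edit.time
        return True
```

Each rule only ever turns a revert into a non-revert, which is why the filters trade some missed vandalism for fewer false positives.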
Approval, Operating Team & Emergency
ClueBot NG was approved on the English Wikipedia in December 2010 and has been operational since then. The approval process consisted of a request, a pre-trial discussion, a first 14-day trial followed by a discussion period, and a second 14-day trial followed by another discussion period.
The members of the operating team are publicly known, and both they and the bot have public talk pages where anyone can leave a message. Anyone can report false positives in a simple way, and a group of users reviews these false positive reports for ClueBot NG.
Everything ClueBot NG does, like every action by any other user, is publicly logged and easily reversible.
Any Wikipedia administrator can use the emergency shutdown button, and any user can request a shutdown at any time on the administrators’ incidents noticeboard.
Large parts of this post are taken from the Wikipedia pages describing ClueBot NG.
A miniseries on machine learning tools
Machine learning and AI technologies have the potential to benefit free knowledge and improve access to trustworthy information, but they also come with significant risks. Wikimedia is building tools and services around these technologies with the main goal of helping volunteer editors in their work on free knowledge projects, and we strive to be as human-centred and open as possible in this process. This miniseries of blog posts presents tools that Wikimedia conceives, develops and uses, their unexpected and sometimes undesired results, and how we try to mitigate them. Today, we present ORES, a service meant to recognise vandalism.