Dimi Dimitrov

Editorial: The DSA debate after Haugen and before the trilogues

If the EU really wants to revamp the online world, it should start shaping legislation with the platform models it wants to support in mind, instead of just going after the ones it dislikes.

Whistleblowers are important. They often provide evidence and usually carry conversations forward. They can open the debate to new audiences. I am grateful to Frances Haugen for having the courage to speak out and the energy to do it over and over again across countries, as the discussion is indeed global.

On the other hand, the hearings didn’t reveal anything completely new; we didn’t learn much we didn’t already know. We live in a time when the peer-to-peer internet has essentially been replaced by a network of platforms which, in their overwhelming majority, are for-profit, data-collecting and indispensable in everyday life.


Meet “ClueBot NG”, an AI Tool to tackle Wikipedia vandalism

There are many bots on Wikipedia: computer-controlled “user accounts” that perform simple, repetitive, maintenance-related tasks. Most are straightforward, fixing typos or using a list of blacklisted words to flag vandalism. ClueBot NG instead uses a combination of detection methods with machine learning at their core.
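To illustrate the general idea of machine-learning-based vandalism detection (this is not ClueBot NG’s actual code; its real system is reportedly built around a neural network with many hand-crafted features), here is a minimal Python sketch. All training examples are invented for illustration:

```python
# Minimal sketch of an ML-based vandalism classifier, in the spirit of
# ClueBot NG (NOT its actual code): train a text classifier on edit
# diffs labelled as vandalism or constructive, then score new edits.
# The training examples below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled diffs: 1 = vandalism, 0 = good-faith edit.
diffs = [
    "JOHN IS THE BEST LOL LOL LOL",
    "fixed a typo in the second paragraph",
    "asdfgh asdfgh asdfgh",
    "added citation to 2019 census report",
]
labels = [1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
model.fit(diffs, labels)

# Score an incoming edit; revert only above a high threshold, because
# keeping the false-positive rate low matters more than catching
# every bad edit.
new_edit = "u r all dumb hahaha"
prob = model.predict_proba([new_edit])[0][1]
print(f"vandalism probability: {prob:.2f}")
if prob > 0.9:
    print("candidate for automatic revert")
```

The key design choice, which ClueBot NG’s operators also make, is tuning the revert threshold to a very low false-positive rate rather than to maximum accuracy.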

Bots on Wikipedia

A bot (a common nickname for a software robot) is an automated tool that carries out repetitive and mundane tasks. Bots are used to maintain different Wikimedia projects across language versions. They can make edits very rapidly, but can disrupt Wikipedia if they are incorrectly designed or operated. False positives are an issue as well. For these reasons, a bot policy has been developed (a sketch of what a simple bot looks like follows below).

There are currently 2,534 bot tasks approved for use on the English Wikipedia; however, not all approved tasks involve actively carrying out edits. Bots will leave messages on user talk pages if an action the bot has carried out is of interest to that editor. There are 323 accounts flagged with the “bot” flag right now (and over 400 former bots) on English Wikipedia. On Bulgarian Wikipedia, a much smaller language version, there are currently 106 bot accounts, but only some of them are active. Projects by smaller communities sometimes need to rely more on machines for page maintenance.
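For a flavour of what such a bot looks like in practice, here is a minimal, hypothetical sketch using Pywikibot, the Python framework many Wikimedia bots are built on. The page title and the fix are invented, and a real bot would first need approval under the relevant project’s bot policy:

```python
# Minimal sketch of a repetitive maintenance task with Pywikibot.
# The page title and the typo being fixed are invented; a real bot
# run would first need approval under the project's bot policy.
import pywikibot

site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, "Example article")  # hypothetical title

text = page.text
fixed = text.replace("recieve", "receive")  # one simple, mechanical fix

if fixed != text:
    page.text = fixed
    # An informative edit summary lets human editors review the change.
    page.save(summary="Bot: fixing common misspelling 'recieve'")
```

A real maintenance bot would loop over a worklist of pages and log every change, so that human editors can audit and, if necessary, revert it.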


Wikimedia Projects & AI: Designing a “Section Recommendation” tool without reinforcing biases

There is an idea to use a “section recommendation” feature to help editors write articles by suggesting possible sections to be added. But it is possible that its recommendations inadvertently increase gender bias. Here’s how we could deal with it.
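To make the concern concrete, consider the simplest possible recommender: suggest the sections that appear most often in existing articles of the same kind. The following hypothetical Python sketch (all data invented) shows how such a recommender mechanically reproduces whatever skew the existing corpus has:

```python
# Hypothetical sketch of a naive, frequency-based section recommender,
# to show how it can reinforce existing bias. All data is invented.
from collections import Counter

# Sections observed in existing articles, grouped by article category.
corpus = {
    "biography_women": [
        ["Early life", "Personal life", "Career"],
        ["Personal life", "Family"],
    ],
    "biography_men": [
        ["Early life", "Career", "Awards"],
        ["Career", "Awards", "Works"],
    ],
}

def recommend(category: str, top_n: int = 2) -> list[str]:
    """Recommend the most frequent sections for a category."""
    counts = Counter(s for article in corpus[category] for s in article)
    return [section for section, _ in counts.most_common(top_n)]

# If "Personal life" is over-represented in women's biographies, the
# recommender keeps suggesting it, and every accepted suggestion
# strengthens that skew in future counts.
print(recommend("biography_women"))  # ['Personal life', 'Early life']
print(recommend("biography_men"))    # ['Career', 'Awards']
```

Every accepted suggestion feeds back into the corpus and strengthens the skew, which is why de-biasing has to be designed in from the start rather than patched on later.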


Wikimedia Projects & AI Tools: Vandalism Detection

There is a machine learning service called ORES, available to interested Wikimedia projects and communities. It aims to recognise whether an edit, for instance on Wikipedia, is damaging or done in good faith. Of course, false predictions cannot be avoided and thus remain a major risk. Here’s how we try to handle it.
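For the curious: ORES exposes a public scoring REST API (at the time of writing, at ores.wikimedia.org). Here is a minimal Python sketch querying the “damaging” and “goodfaith” models for a single revision; the revision ID is an arbitrary placeholder:

```python
# Minimal sketch of querying the ORES scoring API for one revision.
# The revision ID below is an arbitrary placeholder; substitute a real
# one. ORES returns, per model, a prediction and class probabilities.
import requests

rev_id = 123456789  # arbitrary example revision ID
url = (
    "https://ores.wikimedia.org/v3/scores/enwiki/"
    f"?models=damaging|goodfaith&revids={rev_id}"
)

response = requests.get(url, timeout=30)
response.raise_for_status()
scores = response.json()["enwiki"]["scores"][str(rev_id)]

damaging = scores["damaging"]["score"]
goodfaith = scores["goodfaith"]["score"]

print("damaging prediction:", damaging["prediction"])
print("P(damaging):", damaging["probability"]["true"])
print("goodfaith prediction:", goodfaith["prediction"])
```

Tools built on ORES typically do not act on the prediction alone; they surface the probability to human patrollers, who make the final call, which keeps false positives from turning into wrongful reverts.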


DSA in IMCO: Three amendments we like and one that surprised us

Just before the summer recess, the European Parliament’s Internal Market and Consumer Protection (IMCO) committee released over 1,300 pages of amendments to the EU’s foremost content moderation law. We took the summer to delve into the suggestions and are ready to kick off the new parliamentary season by sharing some thoughts on them. Our main focus remains on how responsible communities can continue to be in control of online projects like Wikipedia, Wikimedia Commons and Wikidata.

1. The Greens/EFA on “manifestly illegal content”

AM 691 by Alexandra Geese on behalf of the Greens/EFA Group

Article 2 – paragraph 1 – point g a (new)

‘manifestly illegal content’ means any information which has been subject of a specific ruling by a court or administrative authority of a Member State or where it is evident to a layperson, without any substantive analysis, that the content is not in compliance with Union law or the law of a Member State;

Almost any content moderation system will require editors or service providers to assess content and make ad-hoc decisions on whether something is illegal and therefore needs to be removed. Of course, things aren’t always black-and-white and sometimes it takes a while to make the right decision, as with leaked images of Putin’s Palace. Other times it is immediately clear that something is an infringement, like a verbatim copy of a hit song. In order to recognise these differences the DSA rightfully uses the term “manifestly illegal”, but it fails to actually give a definition thereof. We agree with Alexandra Geese and the Greens/EFA Group that the wording of Recital 47 should make it into the definitions.


Data Governance Act: Good Intentions, Bad Definitions

The European Commission wants more European data (public, private and personal) to be shared for the purposes of innovation, research and business. It also wants to avoid a system where only a few large platforms control all the data. It thus wants to create mechanisms and tools to get there. That’s commendable! What the Commission proposes in the Data Governance Act (DGA), though, is at times very unclear.

Here is a breakdown of the European Commission proposals by sector, peppered with our take on some relevant aspects and support for some European Parliament and Council amendments. 

Public Sector Data

The DGA creates a mechanism for re-using protected public sector data (protected, for instance, because of privacy rules, statistical confidentiality or intellectual property). Public sector bodies are to establish secure environments where data can be mined within the institution. Anonymised data could be provided outside of the institution if the re-use can’t happen within its infrastructure.


Takedown Notices and Community Content Moderation: Wikimedia’s Latest Transparency Report

In the second half of 2020 the Wikimedia Foundation received 380 requests for content alteration and takedown. Two were granted. This is because our communities do an outstanding job of moderating the sites, something the Digital Services Act negotiators should probably keep in mind.


Wikipedia is a global top-10 website that anyone can edit and upload content to. Its sister projects host millions of files uploaded by users. Yet all these projects together triggered only 380 notices. How in the world is this possible?


E-Evidence: trilogues kick off on safeguards vs. efficiency

The Regulation on European production and preservation orders for electronic evidence in criminal matters (E-Evidence) aims to create clear rules on how a judicial authority in one Member State can request electronic evidence from a service provider in another Member State. One such use case would be requesting user data from a platform in another EU country during an investigation. We wrote about our main issues in the past.

What Wikimedia worries about

At Wikimedia, we were originally worried mainly about a new data category: access data. This would mean that prosecutors could demand information such as IP addresses, date and time of use, and the “interface” accessed, without judicial oversight. In the Wikipedia context, this information would also reveal which articles a user has read and which images she has looked at.

The second aspect we care about is whether the authority of the country hosting the service provider will have the right to intervene in cases where the fundamental rights of its citizens are concerned. We know that unfortunately not all EU Member States have good rule-of-law records, which calls for safeguards, at least against potential systemic abuse. Again, knowing which Wikipedia articles or which Wikimedia Commons images someone opened is information that should be hard to get, and handed over only in rare and well-justified cases.


E-Evidence: Let’s Keep Reader Data Well Protected!

A new EU regulation aims to streamline the process by which a prosecutor from one EU Member State can request electronic evidence from a server in another Member State. As current procedures are messy, this is necessary. But the current proposal would also mean that prosecutors could request data about who has read which Wikipedia article without judicial oversight and without any possibility for the authorities of the country hosting the platform to intervene in case of fundamental rights breaches. That is worrisome!

The Wikimedia Foundation gathers very little data about the users and editors on its projects, including Wikipedia. This is how the Wikimedia movement ensures that everyone is really free to speak their mind and, for instance, share information that may be critical of the government of the country they live in. However, the Foundation’s servers do record the IP addresses of users who have accessed Wikipedia and the individual articles they have viewed. In line with the Wikimedia community’s support for strong privacy protections, the Foundation keeps this information for only a few months, as part of the way its servers function, before it is deleted. Allowing access to these IP addresses and the articles that the users behind them have read, without judicial oversight, is the issue with the European Commission and Council proposals for an E-Evidence Regulation.
