Debate: AI and the Commons – solutions to be found only beyond licences

While we surely must not shy away from looking at what develops with and around Open Content, we must seek solutions to harmful effects beyond the licensing level. We shouldn’t try to leverage copyright as a prohibitive means unless we are willing to sacrifice the idea of Open Content altogether.

New technologies mean new dark sides

The breathtaking potential of automated systems includes a breathtaking danger of abuse. One might argue, however, that facial recognition is not actually an application of artificial intelligence, but rather a sophisticated method of pattern recognition combined with deep learning mechanisms. We should therefore widen the scope to the digital content used for enhancing autonomous systems or automation in general; the term Automated Decision-Making (ADM) comes to mind.

Nobody interested in digital technology, the internet, and fundamental rights should disengage from the debate around such systems and how to regulate them. At the same time we have to be quite precise about the types of content we are talking about. That is not only because the property of being open (in the sense of the Open Definition and the Definition of Free Cultural Works) is key here, but also because the possible means of regulation differ according to the content in question.

Openness isn’t about more leverage, it’s about less control

An openness that means “here’s content that you can use, unless your use is something we don’t approve of” is a very peculiar kind of openness. The friction produced by the infamous non-commercial clause (NC) in some CC licences is a very telling example of that peculiarity. Even though NC can make sense as a means to restrict certain uses (and users), it mostly restricts the wrong ones.

If you focus your view on a tiny group of potential users you want to control or exclude, restrictions can make a lot of sense. That, by the way, is the idea behind classic IP licensing, where exclusive rights by default exclude everybody except the tiny group that bought a licence. But if one takes into account all potential users, which is the paradigm behind Open Content, such restrictions limit far more users than intended. Depending on the context, the ratio is probably in the region of 2% reasonably restricted uses vs. 98% collateral effect, although no conclusive research seems to exist on this point.

What free / libre / open licences usually require as a condition is attribution, as laid down for example in the BY clause of the Creative Commons licences. A second condition is possible in licences meant to qualify as free / libre / open: copyleft (known in Creative Commons licences as Share Alike). Just like the attribution condition, copyleft only requires a positive action, namely preserving the rights of the original in derivative works created under the licence grant. It is still far from prohibiting anything or limiting the licence grant in any relevant respect. So yes, there is some leveraging in copyleft, but it only serves to sustain the openness.

How big of a chunk is it anyway?

Being in the middle of an argument about the use of Open Content for training automated systems through deep learning, one very basic question cannot be avoided: how relevant is the chunk of training data coming from the world of Open Content really?

We can’t directly know the answer, but we can make an estimate. There’s little doubt that the people responsible for supplying trainable systems with training content will try to obtain as much content as they can get. Thus, the ratio between Open Content on the one hand and all content that is practically, if not legally, available on the internet on the other would indicate how much of the training data is actually Open Content.

Creative Commons’ estimates of the overall size of the commons are surely impressive. The numbers from 2017 are in the low single-digit billions, with a strong growth rate. Yet if we look, for example, at image content, which is so important in general and for the topic at hand, the amount of CC-licensed works is tiny compared to the overall number of pictures on the internet, including those on social media platforms.

A Commons is not about ownership

One might say: “But still, we must at least take care of what is done with our content, even if that’s only a small chunk!” Well, here’s the hard truth that is both wonderful and annoying: Open Content belongs to all and thus to nobody in particular. It was given not to “us”, but to the commons, piece by piece, by initial individual rightsholders. The commons is a massive amorphous cloud-like sphere without a single licensing steward. No group of enthusiasts or activists owns the commons, nor can any such group claim to be tasked with its custodianship.

A retroactive change to the notion of Open would amount to a departure from the existing model into a whole new digital commons through a gigantic fork of code, a schism of Openness. Given a sufficient reason, such a schism shouldn’t be unthinkable. The question is whether countering undesirable effects of facial recognition technology is a sufficient reason. The author of these lines thinks it is not, in particular because there are in fact other means beyond the licences that are better adapted to the purpose.

A mandate on the input side, sure, but hardly any on the output side

As Open Content does not belong to anybody in particular, the question of our political mandate in advocacy arises. Nobody has a mandate to limit or widen the scope of permission that was chosen by the contributing person at the point when a piece of content joined the commons. And, as stated above, the scope of permission in Open Content as we know it does not limit or even mention the purposes for which that content may be used.

It’s thus highly questionable whether it is the job of Wikimedians to purposefully engage in political framing outside of the core purpose of the Wikimedia projects. Working towards “… a world in which every single human being can freely share in the sum of all knowledge” definitely doesn’t extend to influencing what all those humans do with that resource. Having the European Union or any other body regulate what others can or cannot do with knowledge means going beyond the objective of free sharing and beyond our mandate.

Licence proliferation

Finally, there’s one rather legalistic topic that is often pushed aside: licence proliferation. In this context it means that the overall pool of content licensed freely for the purposes of remixing or derivation is made available under various licences. This hinders the objective behind the Open Content due to increased complexity and licence incompatibility. Incompatibility in this sense means that at least one of the licences cannot be adhered to without breaching the terms of the other.
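To make the notion of incompatibility concrete, here is a minimal sketch in Python. It is a deliberately simplified model and not a legal tool: the `Licence` class, its two flags and the `can_remix` function are hypothetical names introduced for illustration, and the rules only approximate how the Share Alike and non-commercial conditions interact when two works are remixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Hypothetical, simplified model of a licence as a set of conditions."""
    name: str
    share_alike: bool = False      # derivatives must carry the same licence
    non_commercial: bool = False   # commercial use is not permitted


def can_remix(a: Licence, b: Licence) -> bool:
    """Return True if one licence for the remix can satisfy both inputs."""
    if a.share_alike and b.share_alike:
        # Each licence demands that the remix be released under itself.
        return a == b
    for sa, other in ((a, b), (b, a)):
        if sa.share_alike:
            # The remix must carry the Share Alike licence, so that licence
            # has to keep every restriction the other work imposes.
            return sa.non_commercial or not other.non_commercial
    # Without Share Alike, the remix can simply carry the stricter terms.
    return True


BY = Licence("CC BY")
BY_SA = Licence("CC BY-SA", share_alike=True)
BY_NC = Licence("CC BY-NC", non_commercial=True)
BY_NC_SA = Licence("CC BY-NC-SA", share_alike=True, non_commercial=True)

for pair in [(BY, BY_SA), (BY_SA, BY_NC), (BY_SA, BY_NC_SA), (BY_NC, BY_NC_SA)]:
    print(pair[0].name, "+", pair[1].name, "->", can_remix(*pair))
```

In this model, CC BY-SA material cannot be combined with CC BY-NC or CC BY-NC-SA material, because no single licence for the remix would honour both sets of terms. Every additional licence or purpose restriction added to the pool multiplies the number of such pairs that a non-lawyer would have to check.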

Open Content is all about enabling non-lawyers to freely engage in creative expression, without the barbed-wire fences that prohibitive copyright laws erect everywhere. It is about lowering the so-called transaction costs to a level where even non-profit projects and individuals can dare to handle copyrighted material in a reasonably safe way. That requires simplicity.

Solutions beyond licences

Restricting certain purposes of use is alien to the Open Content definitions we have. Unless we want to risk dividing the open world into separate parts and boosting licence proliferation, we are stuck with the liberal definition of Open.

Instead, we should engage in pushing back against problematic developments around digital content – if we think our mission covers that – in ways that do not touch the core rights granted in open licensing. We should focus on personality rights and privacy law. These fields are better suited to meaningfully addressing facial recognition and might prove much more effective than tinkering with a myriad of individual rights grants.