Debate: Should we care that AI facial recognition is trained on openly licensed photos?

Wikimedia.brussels introduces a new format: debate. Our regular contributors as well as guest authors look at one topic from various sides. The arguments may be contrary, or they may point to different priorities. Contributors cast light on the complexity of an issue that doesn’t lend itself to an easy one-way solution. It is up to our Readers to choose the most appealing point of view or appreciate the diversity of perspectives.

Read the contribution by Anna Mazgal

Read the contribution by John Weitzmann

These days, searchable Creative Commons-licensed resources include over 600 million items. Many of these are photos, and a large number of them depict humans – and their faces. While CC licensing does not touch upon the rights of the subjects of photographs, the licences enable authors to waive many of their own rights, making it possible, for example, to reuse images portraying people.

At Wikimedia, we are of course fans of open and free licensing – all content in projects such as Wikipedia or Wikimedia Commons is available for further reuse. We love it when people reuse that content, because practising Free Knowledge is only possible with frictionless sharing, adapting, remixing and building upon what already exists. But if we see the availability of these resources as a force for good, should we care when they are used in a way that harms people?

Openly licensed photos depicting humans are fed en masse into large databases used for training AI in facial recognition. Staggering numbers of these images come from services such as Flickr and carry every kind of open licence, including licences that restrict commercial use. According to researcher and artist Adam Harvey, author of the Exposing.ai project, we are talking about millions of images across multiple databases. Is that in any way harmful?

Literally everyone who develops artificial intelligence meant to deal with humans by recognising, categorising, tracking and profiling us and our behaviours needs training data. AI can be used for perfectly legitimate and hugely beneficial purposes, for example in assistants for people with disabilities. It can also be used by a bank that, wanting to understand our lifestyle to establish our credit score, combs through the internet and social media looking for our vacation photos and random snapshots. MegaFace, one of the largest datasets identified by Harvey, contains close to 5 million images of almost 700 thousand identities and has been used by TikTok, Facebook, Google and Clearview.ai. Downloaded thousands of times, according to the New York Times, these images were also requested by Europol, Danish and Turkish police, a Russian security agency, and the US Intelligence Advanced Research Projects Activity (IARPA), to name a few.

Photos made available seemingly for the benefit of all humanity are then used to train obscure social media algorithms that profile us for ads, modify our behaviour and keep us angry at each other. They are also instrumental in developing surveillance by a number of states and agencies, and even those that seem benevolent do it with little public scrutiny, as AI uses are new and as yet unregulated. Finally, the subjects of beautiful wedding photos, gatherings of friends and artistic portraits are largely unaware that their memories and pastimes become “cannon fodder” in the fight for economic and political dominance. Military and security-related uses of AI, from training drone face recognition software to emotion recognition at border control, remain largely unregulated and unsupervised.

None of this is the fault of the photographers, their subjects or the projects that disseminate open content and free knowledge. It is rather a side effect of a well-functioning system, and neither that system nor its participants can be held liable for all the evil in the world, including abuses of that content or of personality rights.

But is that enough to write off concerns about obscure and harmful uses of AI? Maybe we do have a more elusive ethical responsibility? And if we do, how can we approach the unintended consequences of our good work in a way that doesn’t stifle proliferation of free and open resources that, for the most part, benefit humanity? Or perhaps the “for the most part” argument is enough and we simply need to do our work and leave worries about AI dystopia to others?

Anna Mazgal, Senior EU Policy Advisor at FKAGEU, points out that we cannot, on the one hand, claim that sharing in the sum of human knowledge is our end game and, on the other, pretend that uses of that knowledge that lead to the disempowerment, disenfranchisement and dehumanisation of us humans are completely beyond our area of reflection.

John Weitzmann, Legal Counsel at Wikimedia Deutschland, argues that we must not leverage copyright and licence grants to counter problematic developments in AI – unless we’re ready to jeopardise Open Content altogether.

Let us know what you think!

References:

Harvey, Adam and LaPlace, Jules, Exposing.ai, 2021, https://exposing.ai, retrieved on 2022-02-14;
Nech, Aaron and Kemelmacher-Shlizerman, Ira, “Level Playing Field for Million Scale Face Recognition”, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 3406-3415.