In this installment of our series of longer features, we analyse the scope of the AI Act as proposed by the European Commission and assess its adequacy in light of the impact of AI in practice.
AI is going to shape the Internet more and more and, through it, access to information and the production of knowledge. Wikipedia, Wikimedia Commons and Wikidata are supported by machine learning tools and their role will grow in the coming years. We are following the proposal for the Artificial Intelligence Act that, as the first attempt anywhere in the world to legally regulate AI, will have consequences for our projects, our communities and users around the world. What are we really talking about when we speak of AI? And how much of it do we need to regulate?
The devil is in the definition
It is indispensable to define the scope of any matter to be regulated, and in the case of AI that task is no less difficult than for “terrorist content”, for example. Various debates, from scientific ones to popular public perceptions, take different approaches to what AI is. When hearing “AI”, some people think of sophisticated algorithms – sometimes inside an android – undertaking complex, conceptual and abstract tasks or even featuring a form of self-consciousness. Others include in the definition algorithms that modify their operations based on comparisons between and against large amounts of data, for example, without any abstract extrapolation.
The definition proposed by the European Commission in the AI Act lists software developed with specifically named techniques; among them machine learning approaches including deep learning, logic- and knowledge-based approaches, as well as statistical approaches including Bayesian estimation, search and optimization methods. The list is quite broad and it clearly encompasses a range of technologies used today by companies, internet platforms and public institutions alike.
That is a sensible approach: with the emergence of all sorts of algorithm-assisted decision-making processes that EU citizens come into contact with every day, there is a need to better understand how well these algorithms perform and to regulate some of their uses.
AI and the production of free knowledge
Wikipedia and Wikimedia Commons rely on several algorithms, or bots, both to detect and prevent vandalism and to help with editing. These tools are developed by the community and they undergo joint scrutiny before being widely deployed. The community of a given language version of Wikipedia votes on whether the tool is needed and helpful, and the tool is checked for bugs and problems. Sometimes a bot is deployed by the Wikimedia Foundation, which then runs another check before taking it on. There is an unconditional requirement for all the bots and the training data to be open source. Any person with enough skills to evaluate or improve the code can do so.
What AI are we regulating?
Various Wikimedia projects employ techniques that would classify them as AI under the definition described above: ClueBot NG uses Machine Learning Basics, Bayesian Classifiers, Artificial Neural Network, Threshold Calculation and Post-Processing Filters; ORES, the vandalism detection and removal tool, is based on machine learning. Wikidata, created and developed in Europe, is a knowledge base and Wikibase is software for running knowledge bases, both falling under logic- and knowledge-based AI. Does that mean that the new law would regulate how these projects function?
The main idea behind the AI Act is not to go after all the technologies capable of generating content, predictions, recommendations, or decisions influencing the environments they interact with, but to regulate their use. The performance of a given algorithm depends on humans: more specifically, on the end to which they want to use it and on the data they provide so that it produces the desired output. Focusing on the use makes a lot of sense as well.
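To make two of the techniques named above more concrete, here is a minimal sketch of a Bayesian classifier combined with a threshold, written in Python. It is not the code of ClueBot NG or ORES: the training examples, the function names (`train`, `vandalism_probability`, `should_revert`) and the threshold values are all assumptions made up for illustration. What it does show is the point about human choices: the score depends entirely on the data people supply, and whether an edit gets reverted depends on the threshold they pick.

```python
import math
from collections import Counter

# Hypothetical, hand-made training data for illustration only.
# Real tools are trained on large, openly published datasets.
TRAINING = [
    ("added citation to journal article", "good"),
    ("fixed typo in infobox", "good"),
    ("updated population figure with source", "good"),
    ("lol this page is stupid", "vandalism"),
    ("stupid stupid lol", "vandalism"),
    ("deleted everything lol", "vandalism"),
]

LABELS = ("good", "vandalism")


def train(examples):
    """Count words per label and how many examples each label has."""
    word_counts = {label: Counter() for label in LABELS}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts


def vandalism_probability(text, model):
    """Naive Bayes estimate of P(vandalism | words in the edit)."""
    word_counts, label_counts = model
    vocabulary = set().union(*(word_counts[label] for label in LABELS))
    log_scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        # Start from the prior: the share of training examples with this label.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.split():
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        log_scores[label] = score
    # Turn the two log scores into a probability for the "vandalism" label.
    highest = max(log_scores.values())
    weights = {label: math.exp(s - highest) for label, s in log_scores.items()}
    return weights["vandalism"] / sum(weights.values())


def should_revert(text, model, threshold=0.9):
    """The threshold is a human policy choice, not something the model decides."""
    return vandalism_probability(text, model) >= threshold


model = train(TRAINING)
print(should_revert("stupid lol", model))                  # default threshold
print(should_revert("stupid lol", model, threshold=0.99))  # stricter threshold
print(should_revert("added citation", model))
```

With this toy data, the same edit is reverted under one threshold and left alone under a stricter one, even though the algorithm itself has not changed. That is the sense in which regulating the use, rather than the technique, targets the decisions that matter.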
Prohibited uses of AI – half a step forward
The European Commission outlined three basic approaches to regulating AI. The first one relates to the prohibited applications of AI. Globally we have enough evidence that the algorithmic black box can be employed to evaluate people’s behaviour, modify it, or refuse certain services or privileges.
The EC wants to prohibit social scoring by public authorities that use algorithmic determination of people’s trustworthiness based on behaviour or personal characteristics, whether known or predicted. But social scoring would not be forbidden in all cases – only if the result leads to bad treatment in social contexts that are unrelated to those where the data has been gathered. So, for example, algorithmic evaluation of somebody’s financial accounts to determine eligibility for a public welfare programme could be allowed; applying AI to social media posts to determine if that person’s way of life suggests that they earn more than they actually declare, perhaps not.
Another prohibition concerns introducing to the market and using AI that subliminally or unconsciously manipulates a person and leads to distorting their behaviour in a way that brings harm to themselves or to others. This could be read as a prohibition of peddling highly divisive content on social media that incites hatred or violent behaviour. It is necessary to curb the algorithms deployed for manipulation to bring in profits to the platforms and their clients. But such harm needs to be proven in a specific case of a social network’s AI modifying the behaviour of a person (or a group of people). Considering that these technologies are protected by trade secrets, it may be a long and uphill court battle. In other words, forbidding a use of technology may not be sufficient to meet the objective.
AI knows your face
Finally, the EC wants to prohibit the use of real-time remote biometric identification to enforce the law in publicly accessible spaces. There are, however, exceptions. It would be allowed when strictly necessary to look for a victim of a crime or a perpetrator of a serious offence, or to prevent a specific, substantial and imminent threat. This means that, firstly, a slight delay in live footage, even of only a few seconds, makes it not real-time, and its use will be allowed. Secondly, systems capable of performing live checks in those exceptional situations will be installed on a massive scale, because one can never determine when and where such a need will arise. Biometric identification in public spaces should be forbidden as it is not possible to know who has access to troves of data on innocent passers-by, in real time or not, and how that information is used under the overarching pretext of national security.
Since the AI that we develop at Wikimedia is not used for any of these purposes, we are not directly affected by the prohibitions. We support the idea of prohibiting certain uses, however, especially where the activity of our editors and contributors could be mined to infer information about groups or individuals. We do not collect personal data and protect anonymity by default, but cross-referencing large datasets may reveal identities of people even if the information accessible in one source is scarce. Since Wikipedians produce information about politically and culturally sensitive topics, we wouldn’t want such activity to be weaponised against them in any circumstances, with the use of AI or otherwise.
Curbing high risk of AI
The EC lays out in its draft what we know already – that some uses of AI, however helpful, need to be especially scrutinised. The EC wants high-risk uses to be accompanied by a risk management system and data governance practices. Both are important, as the performance of AI depends not only on what it is used for (and whether that matches the purpose it was made for) but also on what data is used to train it and, further, on what data is processed and how it contributes to algorithmic bias.
Technical documentation and record-keeping would have to be developed. Users of the AI – for example public authorities buying these technologies as well as companies – will need to be able to understand how to use the chosen AI to get expected results, and what its limitations and potential for automation bias are (the tendency of people to over-rely on an algorithm’s output; and when an algorithm relies on its own output to produce further input, the built-in bias grows with each iteration). Importantly, the European Commission wants high-risk AI to be effectively overseen by humans throughout its life cycle.
AI and the bureaucratic machine
These are some of the requirements, but their effectiveness depends largely on which uses of AI they will concern specifically. The EC lists high-risk AI uses in an annex to the Regulation, which would make the list easier to revise and amend than if the uses were named in the law itself – for better, and for worse.
The high-risk uses are divided into a few categories, including biometric identification and categorisation, that is, the instances thereof that are not prohibited. These would be either not real-time or carried out by private entities (think a smart doorbell with a camera). The latter would not be prohibited under the AI Act as proposed by the EC, but could be challenged under the General Data Protection Regulation, for example.
Somewhat logically, AI used in the management and operation of critical infrastructure would be high-risk. Another example is education and vocational training. This covers determining whether a person is eligible and knowledgeable enough to access a given level of education, including through exams and other types of assessments. The high-risk category also extends to AI in employment.
Learning from past mistakes (not)
It makes sense that uses of AI in access to services such as welfare benefits or public assistance, as well as essential private services such as access to loans through establishing a credit score, are deemed high risk. However, let’s consider the huge scandal in the Netherlands related to the AI-driven public childcare benefits system that, not being properly designed and controlled for years, pushed tens of thousands of families into tax debts. It would seem appropriate to at least impose a moratorium on uses of such AI tools in public welfare until it is clear that they can be designed to adequately serve such a purpose, considering the dangers their malfunctioning proved to pose.
There are also instances of AI applications that are considered high-risk by the EC but that should be banned. These include use of AI in law enforcement and criminal justice systems. There have been many instances of misuse of AI by police and there are records of disproportionate targeting of minorities and marginalised groups. Law enforcement tools tend to be shrouded in secrecy, so the scrutiny, even if mandated by the AI Act, can be hindered by the use of public security arguments that override many laws. There is also no clear evidence that a fair trial and access to effective remedy can be ensured if AI is used in researching and interpreting facts in judicial proceedings.
These arguments can be extended to applications of AI in migration, asylum and border control. Seeking asylum is regulated by international law, and violations of border crossing rules can result in imprisonment. The right to liberty and to seek protection from life-threatening circumstances should be reliably safeguarded, and that includes ensuring that they are not dependent on technology that can easily be shielded from public or expert scrutiny. This holds even as, or especially when, migration to the EU may keep surging.
It is evident that the technologies developed in the Wikimedia community have no application in high-risk areas. Our interest in that category stems from the fact that with further changes of the annex, the approach to what is and isn’t high-risk AI could change, especially as the growing need to enable access to verifiable information may bring more innovation in AI-assisted fact checking. Also, with the commitment to human rights in the Wikimedia movement, we think it is important to strike the right balance between what is high risk, and therefore still allowed under special conditions, and what shouldn’t be allowed in order to protect fundamental rights.
General purpose AI
So what about other situations in which AI is applied? The EC envisions that some practices should be more transparent when AI is used for any general purpose. Importantly, that would include the need to inform a person that they interact with a form of AI whenever it is not obvious, be it a bot answering questions in an online shop or a job application process. The same concerns emotion recognition and biometric categorisation, except when it is used by law enforcement in detection, prevention and investigation of crimes.
Interestingly, the obligation to reveal will also include deep fakes – manipulations of images, video or audio that can be mistaken for a true activity of a real person or a true event. It is important for our projects, as deep fakes pose a threat to appropriate referencing of information on Wikipedia or as content making its way to Wikimedia Commons, for example.
Of course, malicious actors will not care whether that requirement is followed, so we will see if that obligation is enforceable at all. Also, if a piece of content is considered an exercise of freedom of expression or of the arts, it won’t have to be revealed as a deep fake, which builds on the necessity of protecting the two. The same would apply to detection, prevention and investigation of crimes, although here it is difficult to discern how deep fakes can assist prevention and why the manipulation cannot be revealed in that case.
Legalising the regulated
With rapid developments in technology coupled with its fast absorption, we stand at many crossroads at once. On one hand, there is a need to make sense of the avalanche of information that falls on us in every activity, in every sector, when debating any systemic issue. On the other hand, there is little conversation about whether mass collection and processing of information isn’t hugely contributing to the problems that we are trying to solve by sorting it out. On one hand, we have a promise of state-of-the-art technology that will help judges be more just, police more swift, and beneficiaries of public services better served. On the other hand, we still need policies and political processes that will set publicly accepted objectives and not hide behind the obscurity of algorithms that no one feels accountable for. On one hand, we see the need to channel vast sums of money to private companies making the big promise of AI-assisted wellbeing. On the other hand, we still struggle to ensure that our institutions have the capacity and support necessary to navigate through this new data-powered reality.
The AI Act is like a foot in the door in opening up these problems. It offers guidance as to what is possible, what should be deployed more carefully, and what shouldn’t be done at all. In doing so it does two things that are not future-proof. First, it ignores existing evidence that uses of AI such as predictive policing, assisting justice, or public aid exacerbate existing inequalities and strip people of the possibility to argue their own case. After all, it is hard to argue with a proprietary algorithm, which can only be understood as much as its maker is willing to reveal.
Second, with high-risk AI the policy makers want to take a bet that prescribing transparency of complex systems that can only be understood by specialists is enough to manage the risks these systems pose for human rights and human lives. In doing so, they effectively legalise these systems. We can only hope that the industry that will emerge around ensuring compliance with requirements for high-risk AI will do its job properly.
It may be that, standing at the crossroads, this is as much as can be done to ensure some scrutiny over AI. From that standpoint, regulating the use and not necessarily the technology itself, putting emphasis on transparency, and providing a list of prohibited practices are sensible choices. Here’s hoping that during the legislative process at least that last category can be improved.