Privacy By Design: GDPR in AI Technologies
What is AI and why do we talk about it?
When average people see the phrase “artificial intelligence”, they usually think of some autonomous, self-learning Jarvis that improves itself so rapidly that it eventually captures and controls humanity.
However, that vision of artificial intelligence is popularized by writers, scriptwriters and directors. For people from the IT field, AI is not one specific super-robot but simply a level of technology at which fewer people are involved in certain processes.
Curiously enough, the term “artificial intelligence” is not a product of the twenty-first or even the end of the twentieth century. It was coined in 1955 by the American computer scientist John McCarthy while he was preparing a proposal for a conference at Dartmouth. Why, then, is artificial intelligence discussed so extensively now, more than 60 years after the term appeared?
One of the main reasons is the ever-increasing amount of data available in the world. Sometime in the 2010s, society realized that the volume of data being produced was so large (and growing so rapidly) that human effort alone was no longer enough to effectively analyze even a significant part of it. Companies therefore began to use AI platforms for this. Artificial intelligence allowed leading companies to optimize many processes and made their analytics extremely effective. At this point, you probably remember how quickly ordinary online advertising was replaced by targeted advertising, and how music services that play exactly the songs you like became a part of your day (“how does it knoooow?”).
What do GDPR and Privacy by design have to do with it?
Of course, the active involvement of AI in different processes created new problems. The data actively transferred to AI systems was not only general or technical information. Often it was what we call personal data, that is, data relating to identifiable living individuals and often to their private lives. The security and “correct” processing of such data is very important, so both technical and legal measures are needed to guarantee these aspects.
The GDPR was adopted to fulfill these tasks. Article 25 of this act entrenches the requirement called “privacy by design”. Under this concept, privacy (and the other important principles of personal data processing) must be integrated into systems at all stages of their creation and existence. This rule applies to the development of AI as well, so developers of programs and algorithms are required to integrate technical measures in such a way that the important aspects of personal data processing are literally embedded in their systems.
It is important to note that the rules of the GDPR influence AI development far beyond Europe, because most of the leading companies developing artificial intelligence have made this document their step-by-step guide and use its principles as best practice, regardless of whether they are obliged to do so. That is why the GDPR plays such an important role in determining the direction of AI development around the world.
Let’s say right away: the concept of privacy by design is often criticized for being unspecific, so it is not easy to work out what it is about. However, we can understand its meaning by analyzing the practice of AI companies that comply with the requirements of the GDPR.
The 4 main requirements for AI developers
So, what has changed in the work of AI companies and developers to ensure the implementation of privacy by design?
Less automated processing
First of all, developers faced the task of complying with Article 22 GDPR, which gives the data subject the right not to be subject to a decision based solely on automated processing. To comply with the privacy by design concept, companies had to move away from understanding artificial intelligence as purely machine intelligence and start involving human intelligence in these processes (the human-in-the-loop system). Involving people to supervise and control data processing reduces the risks in the work of the AI. As a result, the personal data processed by the AI is better secured, and the conclusions of machine intelligence are less prone to bias and inaccuracy. For a better understanding of how human-in-the-loop works, let’s look at an example:
Imagine an AI platform created to analyze users’ payment data and block payments that look like fraud. How profitable it would be if machine intelligence could do everything by itself: analyze, find the signs of fraud, and block the payment. Quick and cheap! However, the requirements of Article 22 GDPR would not be met. The system can instead be organized differently: the machine intelligence analyzes the data, distinguishes the signs of fraud, and presents them to a person in a convenient way, something like “here is a transaction, and here are the suspicious things I have found: block it or not?” A person then looks at the picture as a whole and decides whether the machine’s conclusions make sense and whether the transaction really needs to be blocked.
Of course, this is less beneficial in terms of time and money, but it reduces the risk that someone’s transaction is blocked by a machine decision based on the “wrong” data (in legal language: it limits the discriminatory effect on individuals on the basis of racial or ethnic origin, political opinions, religion or philosophical beliefs, trade union membership, genetic or health status, or sexual orientation). In addition, the people involved in such a system can supervise that there are no other violations (such as a data leak that the machine cannot detect by itself).
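To make this more concrete, here is a minimal sketch of a human-in-the-loop flow in Python. Everything in it is hypothetical and invented for illustration: the Transaction fields, the toy rules in fraud_signals, and the console prompt standing in for a real review queue.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    tx_id: str
    amount: float
    country: str

def fraud_signals(tx: Transaction) -> list[str]:
    """Toy stand-in for the machine-intelligence part: collect suspicious signs."""
    signals = []
    if tx.amount > 10_000:
        signals.append("unusually large amount")
    if tx.country not in {"DE", "FR", "PL"}:
        signals.append(f"initiated from an unusual country: {tx.country}")
    return signals

def review(tx: Transaction) -> str:
    """The machine only recommends; a human makes the final call (Article 22)."""
    signals = fraud_signals(tx)
    if not signals:
        # Letting an unsuspicious payment through has no adverse legal
        # effect on the data subject, so no human decision is needed here.
        return "approved automatically"
    # Suspicious cases are never blocked by the machine alone: they are
    # escalated to a human reviewer together with the evidence.
    print(f"Transaction {tx.tx_id}: suspicious signs -> {signals}")
    answer = input("Block this transaction? [y/n] ")
    return "blocked by reviewer" if answer.lower() == "y" else "approved by reviewer"

if __name__ == "__main__":
    print(review(Transaction("tx-42", amount=25_000.0, country="KY")))
```

The key design choice is that the adverse decision (blocking) is only ever taken by the reviewer, while the machine does the cheap part: gathering the evidence and presenting it conveniently.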
Logic of AI? Explain, please
Secondly, to implement the concept of privacy by design, AI developers and companies should use artificial intelligence in such a way that the logic of its decisions can be explained. This aspect is closely related to the previous one, but it is important to emphasize that under the GDPR, when decisions are based on automated processing, the data subject must be provided with meaningful information about the logic of that processing.
AI developers (those who work on the technical side) often ask at this point: what exactly do I need to explain? How artificial intelligence selects and transmits data, how it analyzes it? And what would you understand of that? (Spoiler: almost nothing.) In fact, what needs to be explained is not the technical processes but the reasons why the machine decided one way and not another. Returning to the previous example, the AI should be designed so that, if necessary, it can be shown: this transaction was blocked because it was carried out in this way, and according to the logic of the machine that is a sign of fraud, because it has analyzed examples of fraudulent transactions and this percentage of them had such signs.
Yes, this also adds work for AI developers, because such things need to be embedded in the machines (time and money!), but the benefits of this approach need no explanation.
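As a sketch of what “explainable by design” can look like, the snippet below attaches a plain-language explanation to every flagged transaction at decision time, rather than reconstructing it later. The rule descriptions and percentages are made up for illustration.

```python
def explain_decision(tx_id: str, fired_rules: list[dict]) -> str:
    """Build a human-readable explanation for a flagged transaction."""
    lines = [f"Transaction {tx_id} was flagged because:"]
    for rule in fired_rules:
        lines.append(
            f"- {rule['description']} "
            f"(present in {rule['share_of_fraud_cases']:.0%} of known fraud cases)"
        )
    return "\n".join(lines)

# Hypothetical rules that fired for one transaction:
fired = [
    {"description": "amount far above this account's usual range",
     "share_of_fraud_cases": 0.61},
    {"description": "payment initiated from a device never seen before",
     "share_of_fraud_cases": 0.47},
]
print(explain_decision("tx-42", fired))
```

The point is that the explanation is a first-class output of the system, stored alongside the decision, so it can be handed to the data subject (or the human reviewer) on request.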
Differential privacy
The third requirement is reducing the ability to identify individuals from the information given out by the AI. What does this mean? The work of the AI should be organized so that personal, and especially sensitive, data is protected wherever possible. For example, when an AI platform analyzes the salaries or sexual orientation of a particular group of individuals for certain purposes, it is important that the person who receives the result of the analysis cannot follow the reverse path and link individuals to their salary or orientation. This approach is called differential privacy. And if you think it is easy to achieve, you are wrong. But many people are working on improving differential privacy, so this problem has a chance of being solved in the near future.
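To show the core idea rather than a production implementation, here is a minimal sketch of the Laplace mechanism, the textbook building block of differential privacy: the true answer to an aggregate query is perturbed with random noise calibrated to the maximum influence any single person can have on it. The salary figures, epsilon value, and bound are made up.

```python
import numpy as np

def private_mean(values: list[float], epsilon: float, upper_bound: float) -> float:
    """Mean with Laplace noise: no single person's record can be
    inferred from the released result (epsilon-differential privacy)."""
    clipped = np.clip(values, 0, upper_bound)   # bound each record's influence
    sensitivity = upper_bound / len(clipped)    # max change one record can cause
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)

salaries = [3200.0, 4100.0, 2900.0, 5200.0, 3800.0]  # hypothetical data
print(private_mean(salaries, epsilon=1.0, upper_bound=10_000))
```

The smaller the epsilon, the more noise is added and the harder the “reverse path” becomes; the price is a less accurate result. Balancing that trade-off is exactly why differential privacy is hard in practice.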
Encryption and separate storage also matter in the event of a data leak. The damage from a leak can be minimized if the information is stored in separate pieces, so that a particular person cannot easily be identified even from a significant part of it.
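A toy illustration of that separation, with all names hypothetical: the analytics store only ever holds a random pseudonym next to the sensitive attribute, while the table mapping pseudonyms back to people lives elsewhere under much stricter access control.

```python
import secrets

identity_vault = {}    # pseudonym -> real identity; stored and protected separately
analytics_store = {}   # pseudonym -> sensitive attributes only

def ingest(name: str, salary: int) -> str:
    pseudonym = secrets.token_hex(8)   # random identifier, meaningless on its own
    identity_vault[pseudonym] = name
    analytics_store[pseudonym] = {"salary": salary}
    return pseudonym

p = ingest("Alice", 4100)
# If analytics_store alone leaks, the attacker learns salaries,
# but not whose salaries they are.
print(p, analytics_store[p])
```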
Right to be forgotten?
Last but not least, when using AI for data analysis, there is the right to be forgotten, enshrined in Article 17 GDPR. The problem is: how do you ensure the erasure of personal information if the AI uses the information it was given for self-learning? Is it even possible to completely remove information from an AI’s memory? And how do you prove that the information was removed, and not just hidden deep in the AI’s memory? Answering these questions requires deep research, so within the scope of this article we will leave them open.
But the fact is that the GDPR is not just an act applied somewhere in the EU by some specific organization; it is something that changes the rules for developing and using artificial intelligence around the world, and therefore it changes our future.