Contact-tracing apps are being pushed to the forefront of efforts to contain the Coronavirus epidemic, and they are generating a vast amount of controversy, given the severe invasion of privacy they entail and their large potential for abuse. Contact tracing is a longstanding public health technique: it works by identifying everyone a sick person may have been in contact with and helping those people assess their risk and take appropriate action, such as getting tested. In 2020 this has taken the form of mass adoption of cell-phone apps, for tracing at scale.
Most contact-tracing apps work in the same way: a unique identifier linked to a person’s cell phone is exchanged via Bluetooth with every other phone that comes within a certain distance for a minimum amount of time. Each phone therefore holds a list of the devices it has “encountered”, which is regularly uploaded to a central server. When a person discovers that they are infected, they can flag their list on the server. The server then sends a notification to the people on the list, stating their risk and suggesting a course of action.
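The flow described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class names, the notification text and the identifiers such as `id-A` are hypothetical, and the Bluetooth exchange is reduced to a method call.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Phone:
    identifier: str                          # unique ID linked to this phone
    encountered: set = field(default_factory=set)

    def exchange(self, other: "Phone") -> None:
        """Stand-in for a Bluetooth exchange after two phones stayed close long enough."""
        self.encountered.add(other.identifier)
        other.encountered.add(self.identifier)


class CentralServer:
    def __init__(self) -> None:
        self.contact_lists = defaultdict(set)    # identifier -> identifiers it encountered
        self.notifications = defaultdict(list)   # identifier -> pending messages

    def upload(self, phone: Phone) -> None:
        """Regular upload of a phone's encounter list."""
        self.contact_lists[phone.identifier] |= phone.encountered

    def flag_infected(self, identifier: str) -> None:
        """Notify everyone on the flagged list, without naming the infected person."""
        for contact in sorted(self.contact_lists[identifier]):
            self.notifications[contact].append(
                "You may have been exposed. Consider getting tested."
            )


alice, bob, carol = Phone("id-A"), Phone("id-B"), Phone("id-C")
alice.exchange(bob)                  # Alice and Bob were in proximity; Carol met no one
server = CentralServer()
for phone in (alice, bob, carol):
    server.upload(phone)
server.flag_infected("id-A")         # Alice reports that she is infected
notified = sorted(server.notifications)
```

In this toy run only Bob's phone receives a notification. Note that the server ends up holding every phone's full encounter list, which is the point the rest of this article is concerned with.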
It is important to note that the name of the infected person is never disclosed. It is, however, not impossible for people to make deductions, especially while social distancing measures limit our contacts so severely: someone who only leaves home to go to work, for instance, can narrow the source down to a handful of colleagues. But although this is a serious concern, it is not the most dangerous aspect of the technology.
There is a central server that collects all the lists of unique phone identifiers. If the phone identifiers are generated by that central server, then a single place holds both the keys that identify the user of each phone AND the social contact data. This allows the owner of the server to build a social graph, a map of all physical human interactions. Who each person meets, where, when, and how often represents an enormous amount of predictive information, from which most of that person’s life can be inferred. We know perfectly well that once a technology exists, it is never forgotten. So after the Coronavirus crisis has passed, the social graph remains. And what if, one day, a government is interested in tracing the social contacts of journalists, union leaders, or religious minorities? The social graph becomes a hugely powerful instrument of surveillance and suppression. In some countries like China, social graphs are also used to assign a “social score”, identifying “desirable” behaviors in the population and punishing “undesirable” behavior with reduced access to social programs, jobs, mortgages and loans, and even specific geographical locations.
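To make the concern concrete, here is a minimal sketch of how little it takes to turn such data into a named social graph. It assumes a hypothetical server that stores both a registry mapping identifiers to people and the uploaded encounter records; the data layout and names are invented for illustration.

```python
from collections import Counter

# Hypothetical server-side data: the identifiers the server issued, and uploaded encounters.
registry = {"id-A": "Alice", "id-B": "Bob", "id-C": "Carol"}
encounters = [
    ("id-A", "id-B", "2020-04-20"),
    ("id-A", "id-B", "2020-04-21"),
    ("id-A", "id-C", "2020-04-21"),
]


def build_social_graph(registry, encounters):
    """Count how often each pair of named people met."""
    graph = Counter()
    for a, b, _day in encounters:
        pair = tuple(sorted((registry[a], registry[b])))  # undirected edge
        graph[pair] += 1
    return graph


graph = build_social_graph(registry, encounters)
# graph maps each pair of names to the number of recorded meetings
```

A single join between the registry and the encounter lists is enough: no sophisticated analysis is needed once both live on the same server.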
Mitigations to this problem are possible. The app Immuni (*), which is being rolled out by the Italian government, has been updated to use decentralized contact tracing: the phone’s unique identifier is generated by the phone itself, not by the server. The server therefore does not hold a key that traces the phone back to its user, resulting in what we would call an “anonymized social graph”: each node is not a person but a number, and since the anonymization is not performed by the central entity, that entity does not “know” how to identify a user from their number. This is certainly an improvement, but we must note two things:
1. the data marketplace: all metadata generated by devices connected to the internet, such as phones, is for sale. Every app installed on the phone is constantly sending data to its central server, and this data is routinely enriched with data from other sources and sold. The phone’s location data alone might be enough to uniquely identify its user. So, from a technological standpoint, nothing prevents a commercial or governmental entity from using the contact-tracing data, enriched with secondary data, to re-identify every single user and build the social graph. Only regulation can prevent this, and we know that regulation does not come close to mitigating this problem. For instance, the GDPR, while regulating data voluntarily provided by the user, has no reach over metadata produced by apps.
2. the power of modeling data: the predictive power that can be extracted from our data is hard to grasp, but it is immense. Data and its applications are not compartmentalized: an app designed to trace infected contacts during a pandemic generates data that can later be used to model voting intent, spending habits, or travel preferences. Again, from a technological standpoint, it is impossible to limit the use of this data to this particular emergency. Only regulation can achieve that.
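The first point can be illustrated with a toy linkage attack on an “anonymized” graph. The sketch below assumes, purely for illustration, that each pseudonymous record carries some coarse metadata (here a hypothetical `place` and `hour`) and that an auxiliary dataset with names and the same kind of metadata is available for purchase; none of these fields comes from a real app.

```python
import secrets
from collections import defaultdict


def new_phone_identifier() -> str:
    """Random identifier generated on the phone itself; the server never issued it."""
    return secrets.token_hex(16)


ids = {name: new_phone_identifier() for name in ("alice", "bob")}

# What the server sees: numbers, not people (hypothetical fields).
server_records = [
    {"id": ids["alice"], "place": "cell-17", "hour": 9},
    {"id": ids["alice"], "place": "cell-42", "hour": 22},
    {"id": ids["bob"], "place": "cell-17", "hour": 9},
    {"id": ids["bob"], "place": "cell-08", "hour": 22},
]

# Secondary data bought elsewhere, with names attached (hypothetical).
auxiliary = [
    {"name": "Alice", "place": "cell-17", "hour": 9},
    {"name": "Alice", "place": "cell-42", "hour": 22},
    {"name": "Bob", "place": "cell-17", "hour": 9},
    {"name": "Bob", "place": "cell-08", "hour": 22},
]


def traces(records, key):
    """Group (place, hour) observations by the given key."""
    out = defaultdict(set)
    for record in records:
        out[record[key]].add((record["place"], record["hour"]))
    return out


def reidentify(server_records, auxiliary):
    """Link a pseudonym to a name when exactly one named trace contains its trace."""
    by_name = traces(auxiliary, "name")
    matches = {}
    for pseudonym, trace in traces(server_records, "id").items():
        candidates = [name for name, seen in by_name.items() if trace <= seen]
        if len(candidates) == 1:
            matches[pseudonym] = candidates[0]
    return matches


matches = reidentify(server_records, auxiliary)
```

With only two observations per phone, both random identifiers are linked back to a name. Real data is noisier, but it is also far richer, which is why generating the identifier on the phone is an improvement rather than a guarantee.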
The decentralized model for contact-tracing apps is being pushed by commercial entities like Apple and Google, in the name of preserving privacy. We welcome these proposals, but we do so with our eyes open: massive data companies like Google have endless other ways to know exactly who we are and what we do every second of every day. There is still more work to be done to ensure that contact-tracing apps do not go beyond what they were designed to do: help.
(*) On April 17th the European Parliament gave its support to the decentralized approach, pointing out, by an overwhelming majority, “that […] the generated data are not to be stored in centralized databases, which are prone to the potential risk of abuse and loss of trust and may endanger uptake throughout the Union” and demanding “that all storage of data be decentralized”.
How do you trace Covid-19 while respecting privacy? https://e-estonia.com/trace-covid-19-while-respecting-privacy/
Decentralized Privacy-Preserving Proximity Tracing https://github.com/DP-3T/documents
Joint Statement on Contact Tracing: Date 19th April 2020 https://drive.google.com/file/d/1OQg2dxPu-x-RZzETlpV3lFa259Nrpk1J/view
Contact Tracing Cryptography Specification https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ContactTracing-CryptographySpecification.pdf