Not so anonymous

A recent article in The New York Times drew my attention to a study showing that "anonymized" information is, in fact, not so anonymous after all. In the paper "How To Break Anonymity of the Netflix Prize Dataset", two researchers show how to identify users of the Netflix movie rental service from an anonymized dataset that Netflix released to participants in a data mining competition it organized.

The interesting thing about this study is that, according to Netflix, "The anonymity of the information is comparable to the strictest federal standards for anonymizing personal health information." Whether or not that is true, there is an ongoing debate about whether current privacy regulations are sufficient, not only in the US but also in Europe.

The problem is the following: on the one hand, the data collected by the medical industry (as well as other industries) is very valuable for data analysis (data mining) and can be a powerful tool in medical research. On the other hand, the data is very sensitive and should be protected. Lawmakers face the problem of drafting data protection laws that allow researchers to use the data while still protecting the privacy of citizens.

Lawsuit

There appears to be a recent lawsuit over Netflix publishing its prize dataset. [Article]
It also seems there might be a second contest by Netflix. Again, data analysts will compete to improve the recommendation system, this time using a dataset that includes ZIP codes, ages and genders.
And of course, privacy researchers will once again compete to be the first to identify users from the dataset.

 

MODAP Consortium

The consortium consists of 11 partners from 7 countries in Europe:
Sabanci University (Coordinator)
Fraunhofer IAIS
Hasselt University
CNR - Area Della Ricerca di Pisa
Université de Lausanne
EPFL - Ecole Polytechnique Fédérale de Lausanne
University of Piraeus Research Centre
University of Milan
Wind Telecomunicazioni SpA
Alterra B.V.
National Kapodistrian University of Athens

Sponsors

The MODAP project is funded by the European Union under FET-OPEN (the Future and Emerging Technologies Open Scheme), 2009-2012.