“Facebook will open its data up to academics to see how it impacts elections”
The headline above, spotted in MIT Technology Review’s Twitter feed, definitely caught my attention, as it was timely and related to my post yesterday.
Last week Facebook announced the first researchers who will have access to its privacy-protected data as part of its initiative to promote independent research on social media’s role in elections. You can read the announcement here. Basically, Facebook wants to correct the world’s perception of it: that its existence makes the world a better place, and that it does not misuse, or unknowingly allow third parties to misuse, its biggest asset, which is user data.
I applaud this initiative, setting aside any political agenda behind it, if there is one. It could actually set the foundation and framework for data sharing, because Facebook aims to do it by “ensuring that privacy is preserved and information kept secure” and by acting “in accordance with its legal and ethical obligations to the people who use their service”. Whatever they intend to do, they would not compromise people’s privacy. According to the announcement, Facebook has “consulted some of the country’s leading external privacy advisors and the Social Science One privacy committee for recommendations on how best to ensure the privacy of the data sets shared and have rigorously tested their infrastructure to make sure it is secure“.
What’s interesting to me is that they are building a process to remove personally identifiable information (“PII”) from the data set and are specifically testing the application of differential privacy, an increasingly adopted method of anonymising data that gives mathematical guarantees about how much any individual’s record can influence published results. In ODI’s report on Anonymisation and Open Data, differential privacy is defined as follows:
Differential privacy is a property of data systems that allows collection of aggregated statistics about a dataset but obfuscates individual records. When queried, a small amount of noise is added to the data such that if any one record were removed, the query result would stay the same. This means those using the data can never be entirely certain about any single person’s data.
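To make the definition concrete, here is a minimal sketch of the Laplace mechanism, the classic way to achieve differential privacy for a count query. This is purely illustrative; the function names, the epsilon value, and the toy records are my own, and nothing here reflects Facebook’s actual implementation. The idea is that a count has sensitivity 1 (adding or removing one person changes it by at most 1), so noise drawn from a Laplace distribution with scale 1/epsilon masks any single individual’s contribution:

```python
import random

def laplace_noise(scale):
    # Laplace(0, scale) sample: an exponentially distributed magnitude
    # (mean = scale) with a random sign.
    magnitude = random.expovariate(1.0 / scale)
    return magnitude if random.random() < 0.5 else -magnitude

def private_count(records, predicate, epsilon=0.5):
    # Answer "how many records satisfy the predicate?" with noise
    # calibrated to the query's sensitivity of 1, so removing any one
    # record barely changes the distribution of possible answers.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Toy data set: 60 of 100 hypothetical users engaged with election content.
records = [{"engaged": True}] * 60 + [{"engaged": False}] * 40
noisy = private_count(records, lambda r: r["engaged"])
```

Each query returns a slightly different answer, and averaged over many queries the results hover around the true count of 60, which is exactly the trade-off the ODI definition describes: useful aggregate statistics, but no certainty about any single person.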
If this proves successful, it could pave the way for other corporations, particularly traditional ones sitting on customer data, to share privacy-protected data with external parties with confidence and harness the power of big data. The biggest challenge is getting traditional lawyers, CEOs and senior management to understand that properly anonymised data is NOT personal data.