Protecting Data Privacy

Fifth annual Data Science Day explores the tenuous balance of big data and human rights

Sep 21 2020 | By Jesse Adams

For all the blessings of big data, it brings bleaker possibilities, too. The same massive datasets that enable revolutions in commerce and medicine can be exploited to erode our privacy or even to empower surveillance states. How thought leaders across research and industry are working to protect human rights and dignity while unleashing data’s vast potential was the focus of the fifth annual Data Science Day, hosted by Columbia’s Data Science Institute (DSI). The virtual event, held September 14, convened more than 1,800 registrants from across the field and around the globe.

“Data science, as a new emerging technical field with profound societal consequences, requires us to face these issues head on from the very beginning,” said Jeannette M. Wing, professor of computer science and DSI’s Avanessians Director, introducing proceedings. “Much of the success of today’s methods, such as deep learning, relies on having lots and lots of data: the more data, the better the model, and that today is typically about people, about us. So how can we build systems that use these models while preserving our privacy?”

Among a series of talks on responsible data science from across the university, Columbia Engineering’s Roxana Geambasu, associate professor of computer science, explained her work in differential privacy, a mathematical framework for bounding how much any statistical release can reveal about an individual in a dataset.

“The question that differential privacy answers is how do you know what kind of statistical information is safe to share and how much of it is safe to share,” Geambasu said, explaining how easily many supposedly anonymized datasets can be substantially reconstructed even from redacted and incomplete information.
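In practice, differential privacy answers that question by adding calibrated random noise to statistical results, so that no single person's presence or absence in the dataset can be confidently inferred from what is released. The following is a minimal sketch, not drawn from Geambasu's talk, of the classic Laplace mechanism applied to a simple counting query; the dataset, predicate, and epsilon value are purely illustrative:

    import numpy as np

    def private_count(records, predicate, epsilon):
        """Release a count with epsilon-differential privacy (Laplace mechanism).

        A counting query has sensitivity 1: adding or removing one person changes
        the true count by at most 1, so Laplace noise with scale 1/epsilon is
        enough to mask any single individual's contribution.
        """
        true_count = sum(1 for r in records if predicate(r))
        noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
        return true_count + noise

    # Illustrative example: a noisy count of patients aged 65 or older.
    patients = [{"age": 70}, {"age": 54}, {"age": 81}, {"age": 66}]
    noisy = private_count(patients, lambda r: r["age"] >= 65, epsilon=0.5)
    print(f"Differentially private count: {noisy:.2f}")

Smaller values of epsilon add more noise and give stronger privacy guarantees at the cost of accuracy; choosing that trade-off is exactly the kind of question Geambasu described.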

Also addressing the assembly were Professors Yeon-Koo Che of the Department of Economics, Jeff Goldsmith of the Mailman School of Public Health’s Department of Biostatistics, and Rafael Yuste of the Department of Biological Sciences, who talked about their research and efforts to shore up robust ethical norms. After their talks, they took questions in an open-ended discussion moderated by Tamar Mitts of the School of International and Public Affairs.

Keynote speaker Eric Schmidt, former CEO and executive chairman of Google, detailed his experience navigating highly nuanced issues of data privacy, security, and ethics with very different governing bodies around the world.

“Technology optimism, embedded in the ethos of the tech industry and the tech founders that believe they’re making the world a better place, keeps running into the reality of how technologies are used,” he noted. “The control of personal data is a novel type of right… we want to decide where is the moral boundary and where is the pragmatic boundary, and getting government systems to make those trade-offs turns out to be very, very difficult.”

In a wide-ranging conversation with Wing, his colleague in graduate school and then at Bell Labs, Schmidt suggested that diverse societies will inevitably regulate data in different ways.

“The collection of data that’s going on now, if used properly, can advance healthcare, safety, knowledge and so forth globally,” Schmidt said. “The key question is, with what appropriate safeguards?”
