Columbia University Joint CS/EE Networking Seminar Series

Privacy Preserving Ridge Regression on Hundreds of Millions of Records

Nina Taft

Technicolor Research Palo Alto

Friday, Nov 15th, 11:30am, EE Conference Room

Abstract: Numerous data mining tasks used inside cloud services, such as recommendation systems, employ linear or ridge regression as an underlying computational element. In this talk we present a system in which ridge regression is carried out in a privacy-preserving way because the user data stays encrypted at all times. Ridge regression is an algorithm that takes as input a large number of data points and finds the best-fit linear curve through them. Our system outputs the best-fit curve in the clear, but exposes no other information. We propose a hybrid approach that combines homomorphic encryption with Yao garbled circuits. Our system scales well because we remove the dependency on the number of users from any computations involving non-linear operations. We implement the complete system, experiment with it on real datasets, and show that our hybrid approach performs significantly better than either method alone. We demonstrate that we can run regression on millions of users' data within a few minutes, outperforming the state of the art by two orders of magnitude. Showing that core data mining building blocks can indeed be executed quickly on encrypted data, even when there are millions of users in the system, is an important step in bringing privacy to data-mining-driven cloud services.
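
For readers unfamiliar with ridge regression, the following is a minimal sketch of the plain, unencrypted computation. It is not the speaker's protocol and involves no cryptography. It assumes the standard closed form, in which the coefficients solve (A + lam*I) beta = b with A the sum of x x^T and b the sum of y x over all users. The split into an additive per-user phase and a small final solve is one way to picture how the non-linear work can be made independent of the number of users; the abstract does not spell out the actual construction. All function names are hypothetical.

```python
# Illustrative sketch only: plain (unencrypted) ridge regression.
# Assumption: the standard closed form (A + lam*I) beta = b is used.

def user_contribution(x, y):
    """One user's share: the d x d matrix x x^T and the d-vector y * x."""
    d = len(x)
    A_i = [[x[r] * x[c] for c in range(d)] for r in range(d)]
    b_i = [y * x[r] for r in range(d)]
    return A_i, b_i

def aggregate(contributions, d):
    """Sum all shares. This phase touches every user but only adds."""
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for A_i, b_i in contributions:
        for r in range(d):
            b[r] += b_i[r]
            for c in range(d):
                A[r][c] += A_i[r][c]
    return A, b

def solve_ridge(A, b, lam):
    """Solve (A + lam*I) beta = b by Gaussian elimination.

    The system is d x d, so this non-linear step does not grow
    with the number of users.
    """
    d = len(b)
    # Augmented matrix [A + lam*I | b].
    M = [[A[r][c] + (lam if r == c else 0.0) for c in range(d)] + [b[r]]
         for r in range(d)]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    beta = [0.0] * d
    for r in range(d - 1, -1, -1):
        s = sum(M[r][c] * beta[c] for c in range(r + 1, d))
        beta[r] = (M[r][d] - s) / M[r][r]
    return beta

def ridge_fit(xs, ys, lam):
    """Fit ridge coefficients from feature vectors xs and targets ys."""
    d = len(xs[0])
    shares = (user_contribution(x, y) for x, y in zip(xs, ys))
    A, b = aggregate(shares, d)
    return solve_ridge(A, b, lam)
```

For example, ridge_fit([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 5, 7], 0.0) recovers the line y = 1 + 2t, where the first feature is a constant 1 acting as the intercept. A positive lam shrinks the coefficients toward zero.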

Bio: Nina Taft received her PhD from UC Berkeley and has spent her career working in industrial research labs in the San Francisco Bay Area. She spent five years at Sprint Labs in the IP research group that helped launch the field of Internet Measurement. Nina worked on ISP traffic engineering problems, but is primarily known for her body of work on traffic matrices. After Sprint, Nina worked for Intel Labs in Berkeley and conducted research on anomaly detection, energy management, and end-host tracing tools for automated performance diagnosis. Currently she is a distinguished scientist at Technicolor Research Palo Alto, where she focuses on privacy and recommendation systems. Nina is an active member of the networking community, where she has served on numerous program committees and steering committees and in various conference chair positions.