Deep learning based on artificial neural networks is a popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries, since the success of deep learning techniques scales with the amount of data available for training.
The massive data collection required for deep learning presents obvious privacy issues. Users' personal, highly sensitive data, such as photos and voice recordings, is kept indefinitely by the companies that collect it. Users can neither delete it nor restrict the purposes for which it is used. Furthermore, centrally stored data is subject to legal subpoenas and extra-judicial surveillance. In many situations, privacy and confidentiality concerns prevent data owners from sharing their data and thus from benefiting from large-scale deep learning.
In this talk, I will describe joint work with Prof. Vitaly Shmatikov on a practical system that enables multiple parties to collectively learn an accurate neural-network model for a given objective without sharing their input datasets. Our results indicate that this system offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective inputs while still benefiting from other participants' models, thus boosting their learning accuracy beyond what is achievable on their own inputs alone.
Reza Shokri is a postdoctoral researcher at the University of Texas at Austin and is currently visiting Cornell NYC Tech. His research focuses on computational privacy: using statistical and machine-learning tools to evaluate and protect privacy. More info: www.shokri.org