Diffprivlib: Privacy-preserving machine learning with Scikit-learn

Cite

Related Material

EuroPython

Holohan, Naoise

Formal Metadata

Title

Diffprivlib: Privacy-preserving machine learning with Scikit-learn

Subtitle

Train machine learning models with differential privacy guarantees

Title of Series

EuroPython 2020

Number of Parts

130

Author

Holohan, Naoise

License

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Identifiers

10.5446/50088 (DOI)

Publisher

EuroPython

Release Date

2020

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Data privacy is having an ever-increasing impact on the way data is stored, processed, accessed and utilised, as the legal and ethical effects of data protection regulations take effect around the globe. Differential privacy, considered by many to be the strongest privacy guarantee currently available, gives robust, provable guarantees on protecting privacy, and allows tasks to be completed on data with guarantees on the privacy of individuals in that data. This naturally extends to machine learning, where training datasets can contain sensitive personal information, that are vulnerable to privacy attacks on trained models. By using differential privacy in the training process, a machine learning model can be trained to accurately represent the dataset at large, but without inadvertently revealing sensitive information about an individual. Diffprivlib is the first library of its kind to leverage the power of differential privacy with scikit-learn and numpy to give data scientists and researchers access to the tools to train accurate, portable models with robust, provable privacy guarantees built-in. In this talk, we will introduce attendees to the idea of differential privacy, why it is necessary in today's world, and how diffprivlib can be seamlessly integrated within existing scripts to protect your trained models from privacy vulnerabilities. Attendees will be expected to have a basic understanding of sklearn (i.e., how to initialise, fit and predict a model). No knowledge of data privacy or differential privacy will be assumed or required.