knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
set.seed(3033362) # for reproducibility
#version <- as.vector(read.dcf('DESCRIPTION')[, 'Version'])
#version <- gsub('-', '.', version)
version <- "0.4.2.9000"
#dep <- as.vector(read.dcf('DESCRIPTION')[, 'Depends'])
#m <- regexpr('R *\\(>= \\d+.\\d+.\\d+\\)', dep)
#rm <- regmatches(dep, m)
#rvers <- gsub('.*(\\d+.\\d+.\\d+).*', '\\1', rm)
rvers <- "3.4.0"
The diffpriv package makes privacy-aware data science in R easy.
diffpriv implements the formal framework of differential privacy:
differentially private mechanisms can safely release, to untrusted third
parties, statistics computed, models fit, or arbitrary structures derived on
privacy-sensitive data. Due to the worst-case nature of the framework, mechanism
development typically requires involved theoretical analysis. diffpriv offers
a turn-key approach to differential privacy by automating this process with
sensitivity sampling in place of theoretical sensitivity analysis.
Obtaining diffpriv is easy. From within R:
## Install the release version of diffpriv from CRAN:
install.packages("diffpriv")
## Install the latest development version of diffpriv from GitHub:
install.packages("devtools")
devtools::install_github("brubinstein/diffpriv")
A typical example in differential privacy is privately releasing a simple
target function of privacy-sensitive input data X. Take, say, the mean of
numeric data:
## a target function we'd like to run on private data X, releasing the result
target <- function(X) mean(X)
First load the diffpriv package (installed as above) and construct a
chosen differentially-private mechanism for privatizing target.
## target seeks to release a numeric, so we'll use the Laplace mechanism---a
## standard generic mechanism for privatizing numeric responses
library(diffpriv)
mech <- DPMechLaplace(target = target)
To run mech on a dataset X we must first determine the sensitivity of
target to small changes in the input dataset. One avenue is to bound
sensitivity analytically (on paper; see the vignette) and supply it
via the sensitivity argument of mechanism construction. In this case that is
not hard if we assume bounded data, but in general sensitivity can be very
non-trivial to calculate manually. The other approach, which we follow in this
example, is sensitivity sampling: repeated probing of target to estimate
sensitivity automatically. We need only specify a distribution for generating
random probe datasets; sensitivitySampler() takes care of the rest. The price
we pay for this convenience is the weaker guarantee of random differential
privacy.
## set a dataset sampling distribution, then estimate target sensitivity with
## sufficient samples for subsequent mechanism responses to achieve random
## differential privacy with confidence 1-gamma
distr <- function(n) rnorm(n)
mech <- sensitivitySampler(mech, oracle = distr, n = 5, gamma = 0.1)
mech@sensitivity ## DPMech and subclasses are S4: slots accessed via @
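To build intuition for what sensitivity sampling does, the following base-R
sketch probes target on pairs of neighbouring datasets drawn from distr and
records how far the output moves. It is a simplified illustration only, not
the procedure implemented by sensitivitySampler(): the helper
probe_sensitivity, the number of probes m, and the use of an empirical
quantile as the summary are all assumptions made for this example.

```r
# Hypothetical helper (not part of diffpriv): estimate how far target() moves
# when a single record of a random dataset is replaced.
probe_sensitivity <- function(target, distr, n, m = 1000, level = 0.9) {
  diffs <- replicate(m, {
    X <- distr(n)            # random probe dataset of n records
    Xprime <- X
    Xprime[n] <- distr(1)    # neighbour: differs from X in one record
    abs(target(X) - target(Xprime))
  })
  # Assumption: summarise the probes by an empirical quantile.
  unname(quantile(diffs, probs = level))
}

set.seed(1)
target <- function(X) mean(X)
distr <- function(n) rnorm(n)
est <- probe_sensitivity(target, distr, n = 5)
est
```

Because the probes are random, an estimate of this kind can hold only with
high probability over the sampling distribution, which is why the resulting
guarantee is the weaker random differential privacy.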
With a sensitivity-calibrated mechanism in hand, we can release private
responses on a dataset X, displayed alongside the non-private response
for comparison:
X <- c(0.328,-1.444,-0.511,0.154,-2.062) # length is sensitivitySampler() n
r <- releaseResponse(mech, privacyParams = DPParamsEps(epsilon = 1), X = X)
cat("Private response r$response: ", r$response,
  "\nNon-private response target(X):", target(X))
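For comparison, here is a minimal base-R sketch of the textbook Laplace
mechanism with an analytically bounded sensitivity. It assumes, purely for
illustration, that every record lies in [0, 1], so that replacing one of n
records moves the mean by at most 1/n. The helpers rlaplace1 and
laplace_release are hypothetical names and are not part of diffpriv; this is
not a description of the package's internals.

```r
# Hypothetical helpers (not part of diffpriv) sketching the textbook
# Laplace mechanism for a scalar target.

# Draw k Laplace(0, scale) variates as the difference of two exponentials.
rlaplace1 <- function(k, scale) {
  scale * (rexp(k) - rexp(k))
}

# Release target(X) plus Laplace noise of scale sensitivity / epsilon.
laplace_release <- function(X, target, sensitivity, epsilon) {
  target(X) + rlaplace1(1, scale = sensitivity / epsilon)
}

set.seed(1)
target <- function(X) mean(X)
X01 <- c(0.12, 0.85, 0.40, 0.67, 0.31)  # made-up data, assumed in [0, 1]
n <- length(X01)

# Assumption: records lie in [0, 1], so replacing one record moves the
# mean by at most 1 / n.
sens <- 1 / n
private <- laplace_release(X01, target, sensitivity = sens, epsilon = 1)
cat("Private:", private, "\nNon-private:", target(X01), "\n")
```

A smaller epsilon or a larger sensitivity widens the noise, trading accuracy
for privacy.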
The above example demonstrates the main components of diffpriv:
- DPMech for generic mechanisms, which captures the non-private target and
  releases privatized responses from it. Current subclasses:
  - DPMechLaplace, DPMechGaussian: the Laplace and Gaussian mechanisms
    for releasing numeric responses with additive noise;
  - DPMechExponential: the exponential mechanism for privately
    optimizing over finite sets (which need not be numeric); and
  - DPMechBernstein: the Bernstein mechanism for privately releasing
    multivariate real-valued functions. See the bernstein vignette for more.
- DPParamsEps and subclasses for encapsulating privacy parameters.
- The sensitivitySampler() method of DPMech subclasses estimates the target
  sensitivity necessary to run releaseResponse() of DPMech generic
  mechanisms. This provides an easy alternative to exact sensitivity bounds
  requiring mathematical analysis. The sampler repeatedly probes
  DPMech@target to estimate sensitivity to data perturbation. Running
  mechanisms with the obtained sensitivities yields random differential
  privacy.

Read the package vignette for more, or news for the latest release notes.
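As a rough illustration of the exponential mechanism mentioned among the
components above, the following base-R sketch privately picks the most common
value from a finite candidate set. It follows the textbook formulation, in
which a candidate is sampled with probability proportional to
exp(epsilon * score / (2 * sensitivity)). The helper exp_mech, the votes data
and the count-based score are assumptions for this example and do not reflect
how DPMechExponential is implemented.

```r
# Hypothetical helper (not part of diffpriv): textbook exponential mechanism.
# Samples one candidate with probability proportional to
# exp(epsilon * score / (2 * sensitivity)).
exp_mech <- function(candidates, scores, sensitivity, epsilon) {
  logw <- epsilon * scores / (2 * sensitivity)
  w <- exp(logw - max(logw))   # subtract the max for numerical stability
  candidates[sample.int(length(candidates), size = 1, prob = w)]
}

set.seed(1)
votes <- c("a", "b", "b", "c", "b", "a")   # made-up private data
candidates <- c("a", "b", "c")

# Score each candidate by its count. Replacing one record changes each
# count by at most 1, so the score sensitivity is 1.
scores <- vapply(candidates, function(cand) sum(votes == cand), integer(1))
winner <- exp_mech(candidates, scores, sensitivity = 1, epsilon = 1)
winner
```

The candidates here are character labels, which shows why this mechanism suits
responses that need not be numeric.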
diffpriv is an open-source package offered with a permissive MIT License.
Please acknowledge use of diffpriv by citing the paper on the sensitivity
sampler:
Benjamin I. P. Rubinstein and Francesco Aldà. "Pain-Free Random Differential Privacy with Sensitivity Sampling", to appear in the 34th International Conference on Machine Learning (ICML'2017), 2017.
Other relevant references to cite depending on usage: