LinearRegressionDP | R Documentation |
This class implements differentially private linear regression using the objective perturbation technique (Kifer et al., 2012).
To use this class for linear regression, first use the new method to
construct an object of this class with the desired function values and
hyperparameters. After constructing the object, the fit method can be
applied with a provided dataset and data bounds to fit the model. In
fitting, the model stores a vector of coefficients, coeff, that satisfies
differential privacy. These coefficients can be released directly, or used
in conjunction with the predict method to privately predict the outcomes of
new datapoints.
Note that in order to guarantee differential privacy for linear regression,
certain constraints must be satisfied for the values used to construct the
object, as well as for the data used to fit. The regularizer must be
convex. Additionally, it is assumed that if x represents a single row of
the dataset X, then the l2-norm of x is at most p for all x, where p is the
number of predictors (including any possible intercept term). In order to
ensure this constraint is satisfied, the dataset is preprocessed and
scaled, and the resulting coefficients are postprocessed and un-scaled so
that the stored coefficients correspond to the original data. Due to this
constraint on x, it is best to avoid using an intercept term in the model
whenever possible. If an intercept term must be used, the issue can be
partially circumvented by adding a constant column to X before fitting the
model, which will be scaled along with the rest of X. The fit method
contains functionality to add a column of constant 1s to X before scaling,
if desired.
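The row-norm assumption above can be checked on a candidate dataset before fitting. The following is a minimal base R sketch, not DPpack's own preprocessing: check.row.norms is a hypothetical helper introduced here for illustration, and it only tests whether each row's l2-norm is at most p once an optional constant column of 1s has been added.

```r
# Hypothetical helper (not part of DPpack): test the row-norm assumption.
check.row.norms <- function(X, add.bias = FALSE) {
  X <- as.matrix(X)
  if (add.bias) X <- cbind(bias = 1, X)  # constant column of 1s
  p <- ncol(X)                           # predictors, including the intercept
  row.norms <- sqrt(rowSums(X^2))        # l2-norm of each row
  list(p = p, max.norm = max(row.norms), ok = all(row.norms <= p))
}

# Same single-predictor design as in the examples section
X <- data.frame(X = seq(-1, 1, length.out = 500))
res <- check.row.norms(X, add.bias = TRUE)
res$ok  # TRUE: the largest row norm is sqrt(2), below p = 2
```

Data that fail this check would rely on the scaling performed inside fit; the sketch makes no claim about how that scaling is implemented.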
DPpack::EmpiricalRiskMinimizationDP.KST -> LinearRegressionDP
new()
Create a new LinearRegressionDP object.
LinearRegressionDP$new(regularizer, eps, delta, gamma, regularizer.gr = NULL)
regularizer
String or regularization function. If a string, must be 'l2', indicating to
use l2 regularization. If a function, must have form regularizer(coeff),
where coeff is a vector or matrix, and return the value of the regularizer
at coeff. See regularizer.l2 for an example. Additionally, in order to
ensure differential privacy, the function must be convex.
eps
Positive real number defining the epsilon privacy budget. If set to Inf, the algorithm runs without differential privacy.
delta
Nonnegative real number defining the delta privacy parameter. If 0, reduces to pure eps-DP.
gamma
Nonnegative real number representing the regularization constant.
regularizer.gr
Optional function representing the gradient of the regularization function
with respect to coeff and of the form regularizer.gr(coeff). Should return
a vector. See regularizer.gr.l2 for an example. If regularizer is given as
a string, this value is ignored. If not given and regularizer is a
function, non-gradient based optimization methods are used to compute the
coefficient values in fitting the model.
A new LinearRegressionDP object.
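As an illustration of the regularizer and regularizer.gr forms described above, here is a minimal base R sketch of a user-defined l2 regularizer and its gradient. The names my.regularizer, my.regularizer.gr and numeric.gr are hypothetical and introduced only for this example; the finite-difference check is an optional sanity test, not something the class is documented to perform.

```r
# User-defined l2 regularizer of the form regularizer(coeff).
# sum(coeff^2) is used so that a vector or a one-column matrix both work.
my.regularizer <- function(coeff) sum(coeff^2) / 2

# Its gradient, of the form regularizer.gr(coeff); returns a vector.
my.regularizer.gr <- function(coeff) as.numeric(coeff)

# Hypothetical helper: central finite-difference gradient of f at coeff.
numeric.gr <- function(f, coeff, h = 1e-6) {
  sapply(seq_along(coeff), function(j) {
    step <- replace(numeric(length(coeff)), j, h)
    (f(coeff + step) - f(coeff - step)) / (2 * h)
  })
}

coeff <- c(-0.3, 0.5)
gr.error <- max(abs(numeric.gr(my.regularizer, coeff) - my.regularizer.gr(coeff)))
gr.error < 1e-6  # the analytic and numeric gradients agree
```

A pair like this could then be supplied as the regularizer and regularizer.gr arguments of LinearRegressionDP$new. The function is convex, as the privacy guarantee requires.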
fit()
Fit the differentially private linear regression model. The function runs
the objective perturbation algorithm (Kifer et al., 2012) to generate an
objective function. A numerical optimization method is then run to find
optimal coefficients for fitting the model given the training data and
hyperparameters. The nloptr function is used. If regularizer is given as
'l2' or if regularizer.gr is given in the construction of the object, the
gradient of the objective function and the Jacobian of the constraint
function are utilized for the algorithm, and the NLOPT_LD_MMA method is
used. If this is not the case, the NLOPT_LN_COBYLA method is used. The
resulting privacy-preserving coefficients are stored in coeff.
LinearRegressionDP$fit(X, y, upper.bounds, lower.bounds, add.bias = FALSE)
X
Dataframe of data to be fit.
y
Vector or matrix of true values for each row of X.
upper.bounds
Numeric vector of length ncol(X)+1 giving upper bounds on the values in
each column of X and the values of y. The last value in the vector is
assumed to be the upper bound on y, while the first ncol(X) values are
assumed to be in the same order as the corresponding columns of X. Any
value in the columns of X and in y larger than the corresponding upper
bound is clipped at the bound.
lower.bounds
Numeric vector of length ncol(X)+1 giving lower bounds on the values in
each column of X and the values of y. The last value in the vector is
assumed to be the lower bound on y, while the first ncol(X) values are
assumed to be in the same order as the corresponding columns of X. Any
value in the columns of X and in y smaller than the corresponding lower
bound is clipped at the bound.
add.bias
Boolean indicating whether to add a bias term to X. Defaults to FALSE.
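The clipping behavior described for upper.bounds and lower.bounds can be sketched in base R as follows. This is an illustrative approximation, not the code used inside fit: clip.to.bounds is a hypothetical helper that follows the documented layout, in which the first ncol(X) bounds apply to the columns of X in order and the last bound applies to y.

```r
# Hypothetical helper (not DPpack's code): clip X and y to the given bounds.
clip.to.bounds <- function(X, y, upper.bounds, lower.bounds) {
  X <- as.matrix(X)
  k <- ncol(X)
  stopifnot(length(upper.bounds) == k + 1, length(lower.bounds) == k + 1)
  for (j in seq_len(k)) {
    # Values below the lower bound or above the upper bound are clipped
    X[, j] <- pmin(pmax(X[, j], lower.bounds[j]), upper.bounds[j])
  }
  # The last entry of each bounds vector applies to y
  y <- pmin(pmax(as.numeric(y), lower.bounds[k + 1]), upper.bounds[k + 1])
  list(X = X, y = y)
}

X <- data.frame(X = c(-1.5, 0, 2))
y <- c(-3, 0.5, 2.5)
clipped <- clip.to.bounds(X, y, upper.bounds = c(1, 2), lower.bounds = c(-1, -2))
clipped$X[, 1]  # -1 0 1
clipped$y       # -2.0 0.5 2.0
```

Because out-of-range values are silently moved to the bound, bounds that are too tight distort the data, while overly loose bounds may lead to more noise than necessary.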
clone()
The objects of this class are cloneable with this method.
LinearRegressionDP$clone(deep = FALSE)
deep
Whether to make a deep clone.
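Kifer D, Smith A, Thakurta A (2012). "Private Convex Empirical Risk Minimization and High-dimensional Regression." In Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research.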
# Build example dataset
n <- 500
X <- data.frame(X=seq(-1,1,length.out = n))
true.theta <- c(-.3,.5) # First element is bias term
p <- length(true.theta)
y <- true.theta[1] + as.matrix(X)%*%true.theta[2:p] + stats::rnorm(n=n,sd=.1)
# Construct object for linear regression
regularizer <- 'l2' # Alternatively, function(coeff) coeff%*%coeff/2
eps <- 1
delta <- 0 # Indicates to use pure eps-DP
gamma <- 1
regularizer.gr <- function(coeff) coeff # Ignored here: regularizer is a string
lrdp <- LinearRegressionDP$new(regularizer, eps, delta, gamma, regularizer.gr)
# Fit with data
# We must assume y is a matrix with values between -p and p (-2 and 2
# for this example)
upper.bounds <- c(1, 2) # Bounds for X and y
lower.bounds <- c(-1,-2) # Bounds for X and y
lrdp$fit(X, y, upper.bounds, lower.bounds, add.bias=TRUE)
lrdp$coeff # Gets private coefficients
# Predict new data points
# Build a test dataset
Xtest <- data.frame(X=c(-.5, -.25, .1, .4))
predicted.y <- lrdp$predict(Xtest, add.bias=TRUE)