WeightedERMDP.CMS: Privacy-preserving Weighted Empirical Risk Minimization
In DPpack: Differentially Private Statistical Analysis and Machine Learning

Description

This class implements differentially private empirical risk minimization for the case where weighted observation-level losses are desired, such as the weighted SVM (Yang et al. 2005). Currently, only the output perturbation method is implemented.

Details

To use this class for weighted empirical risk minimization, first use the new method to construct an object of this class with the desired functions and hyperparameters. After constructing the object, apply the fit method with a dataset, data bounds, weights, and a weight upper bound to fit the model. In fitting, the model stores a vector of coefficients coeff which satisfies differential privacy. These coefficients can be released directly, or used with the predict method to privately predict the outcomes of new data points.

Note that in order to guarantee differential privacy for weighted empirical risk minimization, certain constraints must be satisfied by the values used to construct the object, as well as by the data used to fit. These conditions depend on the chosen perturbation method, though currently only output perturbation is implemented.

The provided loss function must be convex and differentiable with respect to y.hat, and the absolute value of its first derivative must be at most 1. If objective perturbation is chosen (not currently implemented), the loss function must also be twice differentiable, and the absolute value of its second derivative must be bounded above by a constant c for all possible values of y.hat and y, where y.hat is the predicted label and y is the true label.

The regularizer must be 1-strongly convex and differentiable. It must also be twice differentiable if objective perturbation is chosen.

For the data, it is assumed that every row x of the dataset X has l2-norm at most 1. Because of this, a bias term cannot be included without appropriate scaling/preprocessing of the dataset. To ensure privacy, the add.bias argument in the fit and predict methods should only be used in subclasses within this package where appropriate preprocessing is implemented, not in this class.

Finally, if weights are provided, they must be nonnegative, of the same length as y, and upper bounded by a global or public bound, which must also be provided.
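The data and weight conditions above can be checked before fitting. The following is a minimal base R sketch, assuming the column bounds are public and that X has already been clipped to them. The helper names scale.rows.to.unit.ball and check.weights are hypothetical and are not part of DPpack; whether and how the fit method performs similar preprocessing internally is not stated here.

```r
# Hypothetical helper (not part of DPpack): rescale X by a constant derived
# from public column bounds so that every row has l2-norm at most 1.
# Assumes the values of X already lie within the given bounds.
scale.rows.to.unit.ball <- function(X, upper.bounds, lower.bounds) {
  X <- as.matrix(X)
  # Largest absolute value each column can take under the public bounds
  max.abs <- pmax(abs(upper.bounds), abs(lower.bounds))
  # Worst-case row norm implied by the bounds
  max.norm <- sqrt(sum(max.abs^2))
  if (max.norm > 1) X <- X / max.norm
  X
}

# Hypothetical helper: verify the weight conditions (same length as y,
# nonnegative, and no larger than the public upper bound).
check.weights <- function(weights, y, weights.upper.bound) {
  stopifnot(length(weights) == NROW(y),
            all(weights >= 0),
            all(weights <= weights.upper.bound))
  invisible(TRUE)
}

X <- matrix(c(0.9, -0.8, 0.3, 0.7, -1.0, 0.2), ncol = 2)
X.scaled <- scale.rows.to.unit.ball(X, upper.bounds = c(1, 1),
                                    lower.bounds = c(-1, -1))
row.norms <- sqrt(rowSums(X.scaled^2))

weights <- c(1, 1, 0.5)
weights.ok <- check.weights(weights, y = c(0, 1, 1), weights.upper.bound = 1)
```

The scaling constant depends only on the public bounds, not on the data values themselves, so computing it does not consume any privacy budget.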

Super class

DPpack::EmpiricalRiskMinimizationDP.CMS -> WeightedERMDP.CMS

Methods

Method `new()`

Create a new WeightedERMDP.CMS object.

Usage

WeightedERMDP.CMS$new(
  mapXy,
  loss,
  regularizer,
  eps,
  gamma,
  perturbation.method = "objective",
  c = NULL,
  mapXy.gr = NULL,
  loss.gr = NULL,
  regularizer.gr = NULL
)

Arguments

mapXy: Map function of the form mapXy(X, coeff) mapping input data matrix X and coefficient vector or matrix coeff to output labels y. Should return a column matrix of predicted labels for each row of X. See mapXy.sigmoid for an example.
loss: Loss function of the form loss(y.hat, y, w), where y.hat and y are matrices and w is a matrix or vector of weights of the same length as y. Should be defined such that it returns a matrix of weighted loss values for each element of y.hat and y. If w is not given, the function should operate as if uniform weights were given. See generate.loss.huber for an example. It must be convex and differentiable, and the absolute value of its first derivative must be at most 1. Additionally, if the objective perturbation method is chosen, it must be twice differentiable and the absolute value of its second derivative must be bounded above by a constant c for all possible values of y.hat and y.
regularizer: String or regularization function. If a string, must be 'l2', indicating l2 regularization. If a function, must have the form regularizer(coeff), where coeff is a vector or matrix, and return the value of the regularizer at coeff. See regularizer.l2 for an example. Additionally, in order to ensure differential privacy, the function must be 1-strongly convex and differentiable. If the objective perturbation method is chosen, it must also be twice differentiable.
eps: Positive real number defining the epsilon privacy budget. If set to Inf, runs algorithm without differential privacy.
gamma: Nonnegative real number representing the regularization constant.
perturbation.method: String indicating whether to use the 'output' or the 'objective' perturbation method (Chaudhuri et al. 2011). Defaults to 'objective'. Currently, only the output perturbation method is supported, so 'output' should be passed explicitly.
c: Positive real number denoting the upper bound on the absolute value of the second derivative of the loss function, as required to ensure differential privacy for the objective perturbation method. This input is unnecessary if perturbation.method is 'output', but is required if perturbation.method is 'objective'. Defaults to NULL.
mapXy.gr: Optional function representing the gradient of the map function with respect to the values in coeff. If given, must be of the form mapXy.gr(X, coeff), where X is a matrix and coeff is a matrix or numeric vector. Should be defined such that the ith row of the output represents the gradient with respect to the ith coefficient. See mapXy.gr.sigmoid for an example. If not given, non-gradient based optimization methods are used to compute the coefficient values in fitting the model.
loss.gr: Optional function representing the gradient of the loss function with respect to y.hat and of the form loss.gr(y.hat, y, w), where y.hat and y are matrices and w is a matrix or vector of weights. Should be defined such that the ith row of the output represents the gradient of the (possibly weighted) loss function at the ith set of input values. See generate.loss.gr.huber for an example. If not given, non-gradient based optimization methods are used to compute the coefficient values in fitting the model.
regularizer.gr: Optional function representing the gradient of the regularization function with respect to coeff and of the form regularizer.gr(coeff). Should return a vector. See regularizer.gr.l2 for an example. If regularizer is given as a string, this value is ignored. If not given and regularizer is a function, non-gradient based optimization methods are used to compute the coefficient values in fitting the model.
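To illustrate the expected function forms, the following base R sketch defines a weighted Huber-smoothed hinge loss, its gradient with respect to y.hat, and an l2 regularizer with its gradient. All function names here are hypothetical, the labels are assumed to be coded as -1 and 1, and the code is not the implementation behind generate.loss.huber or regularizer.l2.

```r
# Hypothetical weighted Huber-smoothed hinge loss (an illustration only).
# Labels y are assumed to be in {-1, 1}; h is the smoothing parameter.
h <- 0.5
weighted.huber <- function(y.hat, y, w = NULL) {
  if (is.null(w)) w <- rep(1, length(y))   # uniform weights if w is missing
  z <- as.numeric(y.hat) * as.numeric(y)
  val <- ifelse(z > 1 + h, 0,
         ifelse(z < 1 - h, 1 - z, (1 + h - z)^2 / (4 * h)))
  matrix(as.numeric(w) * val, ncol = 1)
}

# Gradient with respect to y.hat; row i holds the gradient at the ith inputs.
# Its absolute value is at most w, so at most 1 when the weights are at most 1.
weighted.huber.gr <- function(y.hat, y, w = NULL) {
  if (is.null(w)) w <- rep(1, length(y))
  z <- as.numeric(y.hat) * as.numeric(y)
  dz <- ifelse(z > 1 + h, 0,
        ifelse(z < 1 - h, -1, -(1 + h - z) / (2 * h)))
  matrix(as.numeric(w) * as.numeric(y) * dz, ncol = 1)
}

# l2 regularizer (1-strongly convex) and its gradient
reg.l2 <- function(coeff) sum(coeff^2) / 2
reg.l2.gr <- function(coeff) coeff

y.hat <- matrix(c(2, 1, 0, -1), ncol = 1)
y <- matrix(c(1, 1, 1, 1), ncol = 1)
w <- c(1, 1, 0.5, 0.5)
loss.values <- weighted.huber(y.hat, y, w)
grad.values <- weighted.huber.gr(y.hat, y, w)
```

The second derivative of this loss with respect to y.hat is at most 1/(2*h), which is the kind of constant the c argument is meant to carry for objective perturbation.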

Returns

A new WeightedERMDP.CMS object.

Method `fit()`

Fit the differentially private weighted empirical risk minimization model. This method runs either the output perturbation or the objective perturbation algorithm (Chaudhuri et al. 2011), depending on the value of perturbation.method used to construct the object, to generate an objective function; only output perturbation is currently implemented. A numerical optimization method is then run to find optimal coefficients for fitting the model given the training data, weights, and hyperparameters. The built-in optim function with the "BFGS" optimization method is used. If mapXy.gr, loss.gr, and regularizer.gr are all given in the construction of the object, the gradient of the objective function is used by optim as well. Otherwise, non-gradient based optimization methods are used. The resulting privacy-preserving coefficients are stored in coeff.
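For intuition, the following base R sketch follows the unweighted output perturbation algorithm of Chaudhuri et al. (2011): minimize the regularized empirical risk with optim, then add noise whose density is proportional to exp(-beta * ||b||) with beta = n * gamma * eps / 2. It is an illustration under stated assumptions (logistic loss, labels in -1 and 1, rows of X inside the unit l2 ball), not the internals of this class; in particular, how the weights and their upper bound enter the noise scale is not shown.

```r
set.seed(1)

# Toy data: rows lie inside the unit l2 ball, labels are in {-1, 1}
n <- 200
d <- 2
X <- matrix(runif(n * d, -0.5, 0.5), ncol = d)
y <- ifelse(X[, 1] + X[, 2] > 0, 1, -1)
eps <- 1
gamma <- 1

# Regularized empirical risk with logistic loss (first derivative at most 1
# in absolute value) and l2 regularizer
objective <- function(coeff) {
  z <- y * as.numeric(X %*% coeff)
  mean(log1p(exp(-z))) + gamma * sum(coeff^2) / 2
}
objective.gr <- function(coeff) {
  z <- y * as.numeric(X %*% coeff)
  as.numeric(-t(X) %*% (y / (1 + exp(z))) / n) + gamma * coeff
}

# Step 1: non-private minimizer, using BFGS with the analytic gradient
fit <- optim(rep(0, d), objective, gr = objective.gr, method = "BFGS")

# Step 2: draw noise b with density proportional to exp(-beta * ||b||):
# the norm is Gamma(shape = d, rate = beta), the direction is uniform
beta <- n * gamma * eps / 2
noise.norm <- rgamma(1, shape = d, rate = beta)
direction <- rnorm(d)
direction <- direction / sqrt(sum(direction^2))

# Step 3: release the perturbed coefficients
private.coeff <- fit$par + noise.norm * direction
```

Because the noise scale shrinks as n, gamma, or eps grows, larger datasets and stronger regularization yield private coefficients closer to the non-private minimizer.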

Usage

WeightedERMDP.CMS$fit(
  X,
  y,
  upper.bounds,
  lower.bounds,
  add.bias = FALSE,
  weights = NULL,
  weights.upper.bound = NULL
)

Arguments

X: Dataframe of data to be fit.
y: Vector or matrix of true labels for each row of X.
upper.bounds: Numeric vector of length ncol(X) giving upper bounds on the values in each column of X. The ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X larger than the corresponding upper bound is clipped at the bound.
lower.bounds: Numeric vector of length ncol(X) giving lower bounds on the values in each column of X. The ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X smaller than the corresponding lower bound is clipped at the bound.
add.bias: Boolean indicating whether to add a bias term to X. Defaults to FALSE.
weights: Numeric vector of observation weights of the same length as y.
weights.upper.bound: Numeric value representing the global or public upper bound on the weights.
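Clipping to per-column bounds can be pictured with the following base R sketch. The helper clip.columns is hypothetical and only illustrates the behavior described for upper.bounds and lower.bounds; it is not a DPpack function.

```r
# Hypothetical helper (not a DPpack function): clip each column of X to
# its [lower bound, upper bound] interval.
clip.columns <- function(X, upper.bounds, lower.bounds) {
  X <- as.matrix(X)
  stopifnot(length(upper.bounds) == ncol(X),
            length(lower.bounds) == ncol(X))
  for (j in seq_len(ncol(X))) {
    X[, j] <- pmin(pmax(X[, j], lower.bounds[j]), upper.bounds[j])
  }
  X
}

X <- data.frame(x1 = c(-1.5, 0.2, 0.9), x2 = c(0.4, 2.0, -0.3))
X.clipped <- clip.columns(X, upper.bounds = c(1, 1), lower.bounds = c(-1, -1))
```

Here the value -1.5 in x1 is raised to the lower bound -1 and the value 2.0 in x2 is lowered to the upper bound 1; values already inside the bounds are unchanged.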

Method `predict()`

Predict label(s) for given X using the fitted coefficients.

Usage

WeightedERMDP.CMS$predict(X, add.bias = FALSE)

Arguments

X: Dataframe of data on which to make predictions. Must be of same form as X used to fit coefficients.
add.bias: Boolean indicating whether to add a bias term to X. Defaults to FALSE. If add.bias was set to TRUE when fitting the coefficients, add.bias should be set to TRUE for predictions.

Returns

Matrix of predicted labels corresponding to each row of X.
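As a rough picture of what prediction amounts to for a linear map function, the following base R sketch applies hypothetical coefficients to new rows and scores the result. The coefficient values, the test data, and the 0.5 threshold for 0/1 labels are all assumptions made for this illustration.

```r
# Hypothetical fitted coefficients, standing in for the coeff field
coeff <- c(0.8, -0.4)
Xtest <- matrix(c( 0.5,  0.1,
                  -0.3,  0.6,
                   0.2, -0.9), ncol = 2, byrow = TRUE)
ytest <- c(1, 0, 1)

# Linear map function of the form mapXy(X, coeff)
mapXy <- function(X, coeff) X %*% coeff
predicted.y <- mapXy(Xtest, coeff)      # column matrix of raw predictions

# Threshold the raw predictions at 0.5 to obtain 0/1 labels (assumed rule)
predicted.labels <- as.numeric(predicted.y > 0.5)
error.rate <- mean(predicted.labels != ytest)
```

Since the coefficients are already differentially private, post-processing them into predictions or an error rate on public test inputs incurs no additional privacy cost for the training data.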

Method `clone()`

The objects of this class are cloneable with this method.

Usage

WeightedERMDP.CMS$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

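Chaudhuri K, Monteleoni C, Sarwate AD (2011). "Differentially Private Empirical Risk Minimization." Journal of Machine Learning Research, 12, 1069-1109.

Yang X, Song Q, Wang Y (2005). "Weighted support vector machine for data classification." In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks.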

Examples

# Build train dataset X and y, and test dataset Xtest and ytest
N <- 200
K <- 2
X <- data.frame()
y <- data.frame()
for (j in (1:K)){
  t <- seq(-.25, .25, length.out = N)
  if (j==1) m <- stats::rnorm(N,-.2, .1)
  if (j==2) m <- stats::rnorm(N, .2, .1)
  Xtemp <- data.frame(x1 = 3*t , x2 = m - t)
  ytemp <- data.frame(matrix(j-1, N, 1))
  X <- rbind(X, Xtemp)
  y <- rbind(y, ytemp)
}
Xtest <- X[seq(1,(N*K),10),]
ytest <- y[seq(1,(N*K),10),,drop=FALSE]
X <- X[-seq(1,(N*K),10),]
y <- y[-seq(1,(N*K),10),,drop=FALSE]

# Construct object for weighted linear SVM
mapXy <- function(X, coeff) X%*%coeff
# Huber loss from DPpack
huber.h <- 0.5
loss <- generate.loss.huber(huber.h)
regularizer <- 'l2' # Alternatively, function(coeff) coeff%*%coeff/2
eps <- 1
gamma <- 1
perturbation.method <- 'output'
c <- 1/(2*huber.h) # Required value for SVM (unnecessary for 'output')
mapXy.gr <- function(X, coeff) t(X)
loss.gr <- generate.loss.gr.huber(huber.h)
regularizer.gr <- function(coeff) coeff
wermdp <- WeightedERMDP.CMS$new(mapXy, loss, regularizer, eps,
                                gamma, perturbation.method, c,
                                mapXy.gr, loss.gr,
                                regularizer.gr)

# Fit with data
# Bounds for X based on construction
upper.bounds <- c( 1, 1)
lower.bounds <- c(-1,-1)
weights <- rep(1, nrow(y)) # Uniform weighting
weights[nrow(y)] <- 0.5 # half weight for last observation
wub <- 1 # Public upper bound for weights
wermdp$fit(X, y, upper.bounds, lower.bounds, weights=weights,
           weights.upper.bound=wub)
wermdp$coeff # Gets private coefficients

# Predict new data points
predicted.y <- wermdp$predict(Xtest)
n.errors <- sum(round(predicted.y)!=ytest)

DPpack documentation built on Oct. 20, 2024, 9:07 a.m.
