tune_linear_regression_model | R Documentation |
This function implements the privacy-preserving hyperparameter tuning
function for linear regression (Kifer et al., 2012) using the
exponential mechanism. It accepts a list of DP models with various chosen
hyperparameters, a dataset X with corresponding values y, upper and lower
bounds on the columns of X and the values of y, and a boolean indicating
whether to add bias in the construction of each of the models. The data are
split into m+1 equal groups, where m is the number of models being compared.
One group is set aside as the validation group, and each of the other m
groups is used to train one of the given m models. The negative of the sum
of squared errors for each model on the validation set is used as its
utility value in the exponential mechanism (ExponentialMechanism) to select
a tuned model in a privacy-preserving way.
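The splitting and selection steps described above can be sketched in base R as follows. This is only an illustrative sketch, not the package's internal code: the helpers split_indices and exp_mech_probs are hypothetical names, the validation errors are made-up numbers, and the sensitivity value of 16 is an assumption chosen for the example (the sensitivity DPpack actually uses is not stated here).

```r
# Hypothetical helper: assign n row indices to m + 1 near-equal groups.
# Rows are assumed to be shuffled already, so round-robin assignment is fine.
split_indices <- function(n, m) {
  groups <- rep(seq_len(m + 1), length.out = n)
  split(seq_len(n), groups)
}

# Hypothetical helper: exponential-mechanism selection probabilities,
# proportional to exp(eps * utility / (2 * sensitivity)).
exp_mech_probs <- function(utility, eps, sensitivity) {
  scores <- eps * utility / (2 * sensitivity)
  scores <- scores - max(scores)  # subtract the max for numerical stability
  w <- exp(scores)
  w / sum(w)
}

# m = 3 candidate models -> 4 groups: 3 for training, 1 for validation
idx <- split_indices(n = 500, m = 3)

# Made-up validation sums of squared errors, one per candidate model
sse <- c(12.4, 3.1, 3.5)

# Utility is the negative sum of squared errors; sensitivity = 16 is assumed
probs <- exp_mech_probs(-sse, eps = 1, sensitivity = 16)

# Randomly select one model index according to these probabilities
selected <- sample(seq_along(sse), size = 1, prob = probs)
```

Models with a smaller validation error receive a higher selection probability, but every model keeps a nonzero chance of being chosen, which is what makes the selection differentially private.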
tune_linear_regression_model(
DPmodels,
X,
y,
upper.bounds,
lower.bounds,
add.bias = FALSE
)
DPmodels |
Vector of linear regression model objects, each initialized with a different combination of hyperparameter values from the search space for tuning. Each model should be initialized with the same epsilon privacy parameter value eps. The tuned model satisfies eps-level differential privacy. |
X |
Dataframe of data to be used in tuning the model. Note it is assumed the data rows and corresponding labels are randomly shuffled. |
y |
Vector or matrix of true values for each row of X. |
upper.bounds |
Numeric vector giving upper bounds on the values in each column of X and the values in y. Should be length ncol(X)+1. The first ncol(X) values are assumed to be in the same order as the corresponding columns of X, while the last value in the vector is assumed to be the upper bound on y. Any value in the columns of X and y larger than the corresponding upper bound is clipped at the bound. |
lower.bounds |
Numeric vector giving lower bounds on the values in each column of X and the values in y. Should be length ncol(X)+1. The first ncol(X) values are assumed to be in the same order as the corresponding columns of X, while the last value in the vector is assumed to be the lower bound on y. Any value in the columns of X and y smaller than the corresponding lower bound is clipped at the bound. |
add.bias |
Boolean indicating whether to add a bias term to X. Defaults to FALSE. |
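The clipping behavior described for upper.bounds and lower.bounds can be illustrated with a short base R sketch. The helper clip_columns is a hypothetical name used only for this illustration, and the data are made up; the package may perform the clipping differently internally.

```r
# Hypothetical helper: clip each column of a data frame to [lower[j], upper[j]].
clip_columns <- function(X, lower, upper) {
  for (j in seq_len(ncol(X))) {
    X[[j]] <- pmin(pmax(X[[j]], lower[j]), upper[j])
  }
  X
}

# Made-up data with some values outside the bounds
X <- data.frame(a = c(-3, 0, 5), b = c(0.5, 2, -1))
y <- c(-4, 1, 3)

# Bounds have length ncol(X) + 1: one per column of X, then one for y
upper.bounds <- c(1, 1, 2)
lower.bounds <- c(-1, -1, -2)

p <- ncol(X)
X.clipped <- clip_columns(X, lower.bounds[1:p], upper.bounds[1:p])
y.clipped <- pmin(pmax(y, lower.bounds[p + 1]), upper.bounds[p + 1])
```

The first ncol(X) entries of each bounds vector apply to the columns of X in order, and the last entry applies to y.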
Single model object selected from the input list DPmodels with tuned parameters.
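Kifer D, Smith A, Thakurta A (2012). "Private Convex Empirical Risk Minimization and High-dimensional Regression." In Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research.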
# Build example dataset
n <- 500
X <- data.frame(X=seq(-1,1,length.out = n))
true.theta <- c(-.3,.5) # First element is bias term
p <- length(true.theta)
y <- true.theta[1] + as.matrix(X)%*%true.theta[2:p] + stats::rnorm(n=n,sd=.1)
# Grid of possible gamma values for tuning linear regression model
grid.search <- c(100, 1, .0001)
# Construct objects for linear regression parameter tuning
# Privacy budget should be the same for all models
eps <- 1
delta <- 0.01
linrdp1 <- LinearRegressionDP$new("l2", eps, delta, grid.search[1])
linrdp2 <- LinearRegressionDP$new("l2", eps, delta, grid.search[2])
linrdp3 <- LinearRegressionDP$new("l2", eps, delta, grid.search[3])
DPmodels <- c(linrdp1, linrdp2, linrdp3)
# Tune using data and bounds for X and y based on their construction
upper.bounds <- c( 1, 2) # Bounds for X and y
lower.bounds <- c(-1,-2) # Bounds for X and y
tuned.model <- tune_linear_regression_model(DPmodels, X, y, upper.bounds,
lower.bounds, add.bias=TRUE)
tuned.model$gamma # Gives resulting selected hyperparameter
# tuned.model result can be used the same as a trained LinearRegressionDP model
tuned.model$coeff # Gives coefficients for tuned model
# Build a test dataset for prediction
Xtest <- data.frame(X=c(-.5, -.25, .1, .4))
predicted.y <- tuned.model$predict(Xtest, add.bias=TRUE)