Description
PrivateLR implements two randomized algorithms for estimating L2-regularized logistic regression coefficients that let you specify the maximal effect a single-point change in the training data is allowed to have. Specifically, the algorithms take as a parameter the maximum allowed change in the log-likelihood of producing particular coefficients that can result from substituting any single training data point.
Usage
dplr(object, ...)
## S3 method for class 'formula'
dplr(object, data, lambda=NA, eps=1, verbose=0,
rp.dim = 0, threshold='fixed', do.scale=FALSE, ...)
## S3 method for class 'numeric'
dplr(object, x, ...)
## S3 method for class 'logical'
dplr(object, x, ...)
## S3 method for class 'factor'
dplr(object, x, ...)
## S3 method for class 'data.frame'
dplr(object, target=ncol(object),...)
## S3 method for class 'matrix'
dplr(object, target=ncol(object),...)
## S3 method for class 'dplr'
predict(object, data, type = "probabilities", ...)
## S3 method for class 'dplr'
summary(object, ...)
## S3 method for class 'dplr'
print.summary(x, ...)
## S3 method for class 'dplr'
print(x, ...)
scaled(fml, data)

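Arguments
object
the model specification. Can be given as an object of class formula, data.frame, matrix, numeric, logical, or factor (see Usage). If a data.frame or matrix, the column with index target holds the response and the remaining columns hold the covariates. If given as a vector (numeric, logical, or factor), object holds the response and x holds the covariates.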
data
a data frame or matrix containing the variables in the model described by the formula object.

lambda 
the regularization parameter. If 
eps
the privacy level ε. The coefficients of the model are computed by a method that guarantees ε-differential privacy.
verbose 
regulates how much information is printed, 0 nothing, 1 a little, 2 more. 
rp.dim 
if 
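threshold
how the classification probability threshold p.tr is determined. The default is 'fixed'; the methods 'youden' and 'topleft' estimate the threshold from the data and are therefore not covered by the privacy guarantee.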
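do.scale
The privacy guarantees are for data where the covariate vectors lie within the unit ball. If TRUE, the data are scaled into the unit ball before fitting; this built-in scaling is not covered by the privacy guarantee.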
type
the type of prediction returned by predict; the default is "probabilities".
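x
In the numeric, logical, and factor methods, a data frame or matrix holding the covariates, with object holding the response. In the print methods, the object to be printed.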
target
the index of the column in object that holds the response; defaults to ncol(object), the last column.
fml 
A formula that describes the dimensions of the data that should be scaled into the unit ball. 
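...
additional arguments; these include op, which selects between the two algorithms described in Details.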
Details
The function dplr implements logistic regression using the differentially private methods by Chaudhuri, Monteleoni, and Sarwate. The interface is similar but not identical to that of lm, with the added possibility of supplying a data matrix or data.frame together with a target column index (defaults to ncol(data)).
The returned model has a convenience function, model$pred, that takes as input a data matrix or data frame to be classified.
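One way a convenience function like model$pred can work is as a closure that captures the fitted coefficients. The sketch below is a minimal base-R illustration, assuming a plain logistic model whose first coefficient is the intercept; make_pred() is a hypothetical helper, not part of PrivateLR, and it ignores details such as random projection and factor expansion.

```r
# Hypothetical helper (not PrivateLR API): build a prediction closure
# from a coefficient vector whose first entry is assumed to be the intercept.
make_pred <- function(coefs) {
  function(newdata) {
    X <- cbind(1, as.matrix(newdata))  # prepend an intercept column
    eta <- drop(X %*% coefs)           # linear predictor
    plogis(eta)                        # logistic link gives probabilities
  }
}

pred <- make_pred(c(0, 2, -1))
p <- pred(data.frame(a = c(0, 1), b = c(0, 1)))
p
```

Because the closure keeps its own copy of coefs, later changes to the model list do not affect it.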
The print function currently prints the summary.
The scaled function scales data such that covariate vectors lie within the unit ball. Note that the response variable is put as the last column of the returned data frame data. Also, the response column name might have changed, depending on the left-hand side of the formula given.
A randomized algorithm A, taking a dataset as input, is said to be ε-differentially private if
$$ \left| \log P(A(D) \in S) - \log P(A(D') \in S) \right| \le \varepsilon $$
for any pair of datasets D, D' that differ in exactly one element, and any set S. We now turn to the algorithms implemented by dplr.
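Because the inequality must hold for every set S, it suffices that the output densities under D and D' differ by at most a factor e^ε at every point; integrating that pointwise bound over S gives the set-level statement. The base-R check below illustrates this on a toy one-dimensional mechanism, assuming a statistic f whose value changes by at most delta when one data point is substituted, and noise with density proportional to exp(-|t|) (the one-dimensional case of the noise used by output perturbation). All names and numbers are illustrative, not taken from PrivateLR.

```r
# Toy check of the epsilon-DP inequality for a one-dimensional mechanism
# A(D) = f(D) + (delta / eps) * noise, where noise has density exp(-|t|) / 2.
# delta (the largest change of f under one substitution) is an assumption.
log_density <- function(x, center, scale) {
  -abs(x - center) / scale - log(2 * scale)  # log of the Laplace density
}

eps   <- 1
delta <- 0.5
f_D   <- 0.2           # f on dataset D
f_Dp  <- f_D + delta   # f on a neighbouring dataset D' (worst case)
grid  <- seq(-5, 5, by = 0.01)

log_ratio <- log_density(grid, f_D, delta / eps) -
  log_density(grid, f_Dp, delta / eps)
max_abs_ratio <- max(abs(log_ratio))
max_abs_ratio  # equals eps up to rounding, so the bound is tight here
```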
Let $\lVert v \rVert_2$ denote the L2 norm of a vector v, and let
$$ J(w, \lambda) = \mathrm{ALL}(w) + \frac{\lambda}{2} \lVert w \rVert_2^2, $$
where ALL(w) is the average logistic loss over the training data of size n and dimension d, with labels y and covariates x. L2-regularized logistic regression computes
$$ w^* = \arg\min_w J(w, \lambda) $$
for a given λ.
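The objective J and its minimizer w^* can be written down directly in base R. The sketch assumes labels coded as -1/+1 and covariates in the rows of a matrix X with no intercept column, and it uses stats::optim with an analytic gradient. It illustrates the formula only; it is not dplr's internal code, and the simulated data are made up.

```r
# Average logistic loss ALL(w) with labels y in {-1, +1}.
avg_logistic_loss <- function(w, X, y) mean(log1p(exp(-y * drop(X %*% w))))

# J(w, lambda) = ALL(w) + lambda / 2 * ||w||^2
J <- function(w, X, y, lambda) avg_logistic_loss(w, X, y) + lambda / 2 * sum(w^2)

# Gradient of J: the derivative of log(1 + exp(-m)) in m is -1 / (1 + exp(m)).
J_grad <- function(w, X, y, lambda) {
  m <- y * drop(X %*% w)
  drop(crossprod(X, -y / (1 + exp(m)))) / nrow(X) + lambda * w
}

set.seed(1)
n <- 200; d <- 3
X <- matrix(rnorm(n * d), n, d)
X <- X / max(sqrt(rowSums(X^2)))   # rows now have L2 norm <= 1
y <- ifelse(drop(X %*% c(1, -1, 0.5)) + rnorm(n, sd = 0.1) > 0, 1, -1)
lambda <- 0.1

fit <- optim(rep(0, d), J, J_grad, X = X, y = y, lambda = lambda,
             method = "BFGS", control = list(reltol = 1e-12))
w_star <- fit$par
```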
The function dplr implements two approaches to ε-differentially private L2-regularized logistic regression, selected by the argument op, which is passed through ... (see above).
The first is output perturbation, where we compute
$$ w' = w^* + \frac{2}{n \lambda \varepsilon}\, b, $$
where b is a d-dimensional real vector sampled with probability density proportional to $\exp(-\lVert b \rVert_2)$.
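The noise vector b can be sampled exactly: in polar coordinates its density is proportional to r^(d-1) exp(-r), so its length is Gamma(d, 1) distributed and its direction is uniform on the unit sphere. The scale 2/(nλε) matches the bound of 2/(nλ) that Chaudhuri et al. give on how far w^* can move when one data point with L2 norm at most 1 is substituted, which keeps the density ratio of w' under neighbouring datasets at most e^ε. The base-R sketch below assumes w^* is already available (here a made-up vector); sample_b() and output_perturbation() are illustrative names, not PrivateLR functions.

```r
# b with density proportional to exp(-||b||_2) in d dimensions.
sample_b <- function(d) {
  u <- rnorm(d)
  direction <- u / sqrt(sum(u^2))              # uniform on the unit sphere
  rgamma(1, shape = d, rate = 1) * direction   # length ~ Gamma(d, 1)
}

# w' = w* + 2 / (n * lambda * eps) * b
output_perturbation <- function(w_star, n, lambda, eps) {
  w_star + 2 / (n * lambda * eps) * sample_b(length(w_star))
}

set.seed(42)
norms <- replicate(5000, sqrt(sum(sample_b(3)^2)))
mean(norms)   # close to 3, the mean of a Gamma(3, 1) variable
w_priv <- output_perturbation(c(0.5, -1, 0.25), n = 1000, lambda = 0.01, eps = 1)
```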
The second is objective perturbation. Let
$$ F(w, \lambda, \varepsilon) = J(w, \lambda) + \frac{2}{\varepsilon n}\, b^\top w, $$
where n and b are as above. Let c = 0.25 and let $z = 2\,\mathrm{log1p}(c/(\lambda n)) = 2 \log(1 + c/(\lambda n))$. If $\varepsilon - z > 0$, we compute
$$ w' = \arg\min_w F(w, \lambda, \varepsilon - z); $$
otherwise we compute an adjusted-λ version
$$ w' = \arg\min_w F\!\left(w, \frac{c}{n\,(e^{\varepsilon/4} - 1)}, \frac{\varepsilon}{2}\right). $$
The logistic regression model coefficients w' are then ε-differentially private.
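The case split above only changes the two arguments passed to F, so it can be isolated in a small helper. The base-R sketch below follows the formulas as written, assuming labels coded as -1/+1 and covariate rows with L2 norm at most 1. objective_params(), objective_perturbation() and the simulated data are illustrative and not part of PrivateLR; an actual implementation may differ in details such as the optimizer.

```r
# Effective (lambda, eps) passed to F, following the case split above.
objective_params <- function(n, lambda, eps, c_const = 0.25) {
  z <- 2 * log1p(c_const / (lambda * n))
  if (eps - z > 0) {
    list(lambda = lambda, eps = eps - z)
  } else {
    list(lambda = c_const / (n * (exp(eps / 4) - 1)), eps = eps / 2)
  }
}

# b with density proportional to exp(-||b||_2): Gamma(d, 1) length, uniform direction.
sample_b <- function(d) {
  u <- rnorm(d)
  rgamma(1, shape = d, rate = 1) * u / sqrt(sum(u^2))
}

objective_perturbation <- function(X, y, lambda, eps) {
  n <- nrow(X)
  p <- objective_params(n, lambda, eps)
  b <- sample_b(ncol(X))
  # F(w, lambda', eps') = ALL(w) + lambda' / 2 * ||w||^2 + 2 / (eps' * n) * b'w
  F_obj <- function(w) {
    mean(log1p(exp(-y * drop(X %*% w)))) + p$lambda / 2 * sum(w^2) +
      2 / (p$eps * n) * sum(b * w)
  }
  optim(rep(0, ncol(X)), F_obj, method = "BFGS")$par
}

set.seed(7)
X <- matrix(runif(200, -1, 1), 100, 2) / sqrt(2)  # rows have L2 norm <= 1
y <- ifelse(X[, 1] + X[, 2] > 0, 1, -1)
w_priv <- objective_perturbation(X, y, lambda = 0.1, eps = 1)
```

For example, with n = 100 and λ = 0.01, z = 2 log(1.25) ≈ 0.446, so ε = 1 takes the first branch while ε = 0.3 falls back to the adjusted λ and ε/2.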
Value
The dplr function returns a list of class "dplr" with elements including:
par 
the coefficients of the logistic model. 
coefficients
same as par.
value, counts, convergence, message
these are as returned by optim.
CIndex
the area under the ROC curve (a.k.a. C-index) of the model on its training data.
eps 
the supplied privacy level. 
lambda
the regularization parameter used.
n
the number of data points.
d
the dimensionality of the data points.
pred
a convenience function such that model$pred(data) gives the same predicted probabilities as predict(model, data).
p.tr 
this is the classification probability threshold. 
did.rp 
TRUE if random projection was performed. 
rp.dim
if random projection was performed, this contains the number of dimensions projected onto. Only present if random projection was performed.
rp.p 
the projection matrix used for random projection. Only present if random projection was performed. 
scaled
TRUE if data was scaled by providing do.scale=TRUE.
status 
a text string indicating the status of the computations.
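The CIndex element can be read as the probability that a randomly chosen positive example receives a higher predicted probability than a randomly chosen negative one. A minimal base-R sketch of that pairwise definition follows, assuming labels coded 1 for the positive class and counting ties as one half; c_index() is an illustrative helper, and PrivateLR may compute the value differently.

```r
# Area under the ROC curve as the fraction of (positive, negative) pairs
# in which the positive example has the higher score; ties count 1/2.
c_index <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels != 1]
  wins <- outer(pos, neg, ">") + 0.5 * outer(pos, neg, "==")
  mean(wins)
}

auc <- c_index(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))
auc  # 0.75: three of the four pairs are ordered correctly
```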

The scaled function returns a list with the following elements:
data 
the scaled data frame 
scale 
the scaling factor used. 
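A simple way to bring covariate vectors into the unit ball is to divide every row by the largest row norm in the data, which also yields a single scaling factor like the scale element above. The sketch below is a hypothetical base-R illustration, not the algorithm used by scaled. Because the factor is computed from the data, it carries information about the data, which is one reason data-dependent scaling falls outside the privacy guarantee.

```r
# Hypothetical unit-ball scaling (not PrivateLR's scaled()): divide every
# covariate vector by the largest L2 row norm found in the data.
scale_to_unit_ball <- function(X) {
  X <- as.matrix(X)
  s <- max(sqrt(rowSums(X^2)))
  list(data = X / s, scale = s)
}

sc <- scale_to_unit_ball(iris[, 1:4])  # numeric covariates only
row_norms <- sqrt(rowSums(sc$data^2))
max(row_norms)  # 1 up to rounding: the longest row ends up on the unit sphere
```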
Warning
The privacy level is only guaranteed for the coefficients of the model, not for any of the other returned values, and only when the input data points (possibly after expansion of factors) have L2 norm ≤ 1. In particular, privacy is not guaranteed when prediction thresholds are estimated from the data (threshold methods 'youden' and 'topleft') or when the built-in scaling of data is used. Both of these are turned off by default.
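To see why such thresholds depend on the data, consider the usual definitions: a Youden-style rule picks the cut-off maximizing TPR - FPR, and a top-left rule picks the cut-off closest to the ROC corner (FPR = 0, TPR = 1). The base-R sketch below implements those conventional definitions; choose_threshold() is a hypothetical helper, and dplr's 'youden' and 'topleft' methods may differ in details such as the candidate cut-offs.

```r
# Hypothetical data-driven threshold selection over the observed scores.
choose_threshold <- function(probs, labels, method = c("youden", "topleft")) {
  method <- match.arg(method)
  cuts <- sort(unique(probs))
  tpr <- sapply(cuts, function(thr) mean(probs[labels == 1] >= thr))
  fpr <- sapply(cuts, function(thr) mean(probs[labels != 1] >= thr))
  score <- if (method == "youden") tpr - fpr else -sqrt((1 - tpr)^2 + fpr^2)
  cuts[which.max(score)]  # the chosen cut-off is one of the training scores
}

probs  <- c(0.1, 0.3, 0.45, 0.6, 0.7, 0.9)
labels <- c(0,   0,   1,    1,   0,   1)
choose_threshold(probs, labels, "youden")
choose_threshold(probs, labels, "topleft")
```

Since the selected cut-off is literally one of the training scores, it can reveal information about individual data points.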
Note
This implementation was in part supported by NIH NLM grant 7R01LM00727307 and NIH Roadmap for Medical Research grant U54 HL108460.
Author(s)
Staal A. Vinterbo <sav@ucsd.edu>
References
Chaudhuri, K., Monteleoni, C., and Sarwate, A. (2011). Differentially Private Empirical Risk Minimization. Journal of Machine Learning Research, 12, 1069-1109.
Examples
library(PrivateLR)
data(iris)
# the following two are equivalent
# and predict Species being any
# but the first factor level.
model <- dplr(iris)
model <- dplr(Species ~ ., iris)
# pick a particular factor level and privacy level 2
model <- dplr(I(Species != 'setosa') ~ ., iris, eps=2)
# The following is again equivalent to the two first
# examples. Note that we need to remove 'Species' from the
# covariate matrix/data frame, and
# that the class reported by summary will now
# not be 'Species' but 'dplr.class'.
model <- dplr(iris$Species, iris[,-5])
# two equivalent methods to get at the predicted
# probabilities
p <- model$pred(iris)
p <- predict(model, iris)
# print a summary of the model. Note that
# only the coefficients are guaranteed
# to be generated in an eps-differentially
# private manner.
summary(model)
