dpVariance
The dpVariance class evaluates a privacy-preserving variance of a vector of values. The class supports any vector type that can be represented numerically, meaning that it can handle the R types numeric, integer, and logical.
# import the library
library(PSIlence)

# example data
x1 <- c(3, 12, 20, 42, 33, 65, 70, 54, 33, 45)
x2 <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
data <- data.frame(x1, x2)

# example on a numeric variable
dpVarianceExample <- dpVariance$new(mechanism='mechanismLaplace', varType='numeric', variable='x1', epsilon=10, n=10, rng=c(0, 70))
dpVarianceExample$release(data)
print(dpVarianceExample$result)

# example on a logical variable
dpVarianceExample2 <- dpVariance$new(mechanism='mechanismLaplace', varType='logical', variable='x2', epsilon=0.1, n=10, rng=c(0, 1))
dpVarianceExample2$release(data)
print(dpVarianceExample2$result)
In typical usage, the dpVariance class has two methods: new and release. The new method does not touch any data; it only creates an object that can calculate a differentially private variance. Only the release method touches data, applying the functionality of the previously created object to it.
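The split between construction and release can be illustrated with a minimal base-R sketch. This is not PSIlence's implementation: makeVarianceQuery and rlaplace are hypothetical helpers written for this illustration, and the sensitivity bound diff(rng)^2 / n for the sample variance of values clamped to rng is an assumption.

```r
# Hypothetical helper: draw from Laplace(0, scale) by inverse-CDF sampling.
rlaplace <- function(n, scale) {
  u <- runif(n, min = -0.5, max = 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

# Hypothetical constructor: stores the query parameters and sees no data.
makeVarianceQuery <- function(variable, n, rng, epsilon) {
  # Assumed sensitivity bound for the sample variance of n values inside rng.
  sensitivity <- diff(rng)^2 / n
  release <- function(data) {
    # Only this function touches the data.
    x <- as.numeric(data[[variable]])
    x <- pmin(pmax(x, rng[1]), rng[2])  # clamp to the declared range
    var(x) + rlaplace(1, scale = sensitivity / epsilon)
  }
  list(sensitivity = sensitivity, release = release)
}

set.seed(1)
toy <- data.frame(x1 = c(3, 12, 20, 42, 33, 65, 70, 54, 33, 45))
query <- makeVarianceQuery("x1", n = 10, rng = c(0, 70), epsilon = 10)
noisyVariance <- query$release(toy)
```

Creating query performs no computation on toy; the data is read only when query$release(toy) is called, which mirrors the division of labor described above.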
mechanism
Character, the class name of the mechanism used to perturb the true estimate. Must be 'mechanismLaplace'.
varType
Character, the type of values in the data frame that will be passed to the mechanism. Should be one of 'numeric', 'integer', or 'logical'.
variable
Character, the name of the variable in the data for which to calculate the variance.
n
Integer, the number of observations in the vector.
rng
Numeric, a 2-tuple giving an a priori estimate of the lower and upper bounds of the vector.
epsilon
Numeric, the differential privacy parameter $\epsilon$, typically taking values between 0 and 1 and reflecting the privacy cost of the query. Optional, default NULL. If NULL, the user must specify a value for accuracy.
accuracy
Numeric, the accuracy of the query. Optional, default NULL. If NULL, the user must specify a value for epsilon. If epsilon is not NULL, any value given here is ignored and the accuracy is evaluated internally.
imputeRng
Numeric, a 2-tuple giving a range within which missing values of the vector are imputed. Optional, default NULL. If NULL, missing values are imputed using the range provided in rng. See Notes below for more information.
alpha
Numeric, the statistical significance level used in evaluating accuracy and privacy parameters. If the bootstrap is employed, alpha is also used to trim the release. Default 0.05.
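The interplay between epsilon, accuracy, and alpha can be sketched with the standard tail bound for Laplace noise: with scale b = sensitivity / epsilon, the noise exceeds b * log(1 / alpha) in absolute value with probability alpha. The function names and the sensitivity argument below are hypothetical, and whether PSIlence uses exactly this formula internally is an assumption.

```r
# Accuracy implied by a given epsilon, from the Laplace tail bound.
accuracyFromEpsilon <- function(sensitivity, epsilon, alpha = 0.05) {
  log(1 / alpha) * sensitivity / epsilon
}

# Epsilon required to reach a target accuracy (the inverse relation).
epsilonFromAccuracy <- function(sensitivity, accuracy, alpha = 0.05) {
  log(1 / alpha) * sensitivity / accuracy
}

# Illustrative sensitivity value; round trip from epsilon to accuracy and back.
acc <- accuracyFromEpsilon(sensitivity = 490, epsilon = 0.5)
eps <- epsilonFromAccuracy(sensitivity = 490, accuracy = acc)
```

Under this relation, supplying either epsilon or accuracy determines the other, which is consistent with only one of the two being required.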
The release method accepts a single argument.
data
Data frame containing numeric columns corresponding to the name specified in variable.
The release method makes a call to the mechanism, which generates a list of statistical summaries available on the result field.
result
List, contains the accuracy guarantee, privacy cost, and private release. Other elements reflecting variable post-processing of the release are also included.
The list in the result attribute has the following values.
release
Differentially private estimate of the variance.
variable
The variable in the data for which the differentially private variance was calculated.
std
Differentially private estimate of the standard deviation of the variable (the square root of the variance estimate).
accuracy
The accuracy guarantee of the release, given epsilon, if epsilon was entered. Otherwise, this is the accuracy value entered by the user.
epsilon
The privacy cost required to guarantee accuracy, if accuracy was entered. Otherwise, this is the epsilon value entered by the user.
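Because the release is a noisy estimate, it can come out negative, in which case its square root is undefined. One possible way to derive a standard deviation from it, sketched below, is to floor the release at zero first. stdFromRelease is a hypothetical helper, and the flooring step is an assumption; the library's own post-processing may differ.

```r
# Hypothetical post-processing: floor the noisy variance at zero, then take the root.
stdFromRelease <- function(release) {
  sqrt(max(release, 0))
}

stdPositive <- stdFromRelease(400)   # 20
stdNegative <- stdFromRelease(-3.2)  # 0
```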
Import the PSIlence library and attach the sample dataset:
library(PSIlence)
data(PUMS5extract10000)
Numeric Example
To calculate a private variance of a numeric vector with dpVariance, enter the mechanism (this will be the Laplace mechanism, 'mechanismLaplace'), the variable type ('numeric'), the variable of interest (the column name of the variable in the data frame), the number of observations in the data frame, the epsilon value (generally less than 1), and the range:
numericVariance <- dpVariance$new(mechanism='mechanismLaplace', varType='numeric', variable='income', n=10000, epsilon=0.1, rng=c(0, 750000))
numericVariance$release(PUMS5extract10000)
print(numericVariance$result)
Logical Example
To calculate the variance of a logical variable, pass the name of a logical column as variable and set varType to 'logical'. Note: you do not need to enter a range for a logical variable (because the range is known to be c(0, 1)).
logicalVariance <- dpVariance$new(mechanism='mechanismLaplace', varType='logical', variable='married', n=10000, epsilon=0.1, rng=c(0, 1))
logicalVariance$release(PUMS5extract10000)
print(logicalVariance$result)
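To see why the range of a logical variable is known in advance, note that R coerces TRUE and FALSE to 1 and 0. The base-R sketch below computes the ordinary (non-private) variance of a small logical vector; it only illustrates the coercion and is not part of the PSIlence API.

```r
x2 <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)

# Logical values become 0/1, so the bounds are always 0 and 1.
asNumeric <- as.numeric(x2)
bounds <- range(asNumeric)

# For 0/1 data the sample variance equals n / (n - 1) * p * (1 - p),
# where p is the share of TRUE values.
n <- length(asNumeric)
p <- mean(asNumeric)
sampleVar <- var(asNumeric)
closedForm <- n / (n - 1) * p * (1 - p)
```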
Missing values are imputed by drawing from a uniform distribution over the imputation range (imputeRng, or rng if imputeRng is NULL), so any value in that range is chosen with equal probability.
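A minimal base-R sketch of uniform imputation follows. imputeUniform is a hypothetical helper written for illustration, not a PSIlence function, and the income values are made up.

```r
# Hypothetical helper: replace each NA with an independent uniform draw
# from the imputation range; observed values are left unchanged.
imputeUniform <- function(x, imputeRng) {
  missing <- is.na(x)
  x[missing] <- runif(sum(missing), min = imputeRng[1], max = imputeRng[2])
  x
}

set.seed(42)
income <- c(52000, NA, 18000, NA, 75000)
filled <- imputeUniform(income, imputeRng = c(0, 100000))
```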