knitr::opts_knit$set( stop_on_error = 2L ) knitr::opts_chunk$set( fig.height = 7, fig.width = 7 )
dpCovariance
The dpCovariance
class evaluates a privacy-preserving covariance of a series of vectors of values. The class supports any vector type that can be represented numerically, meaning that it can handle the R types numeric
and integer
.
# import the library library(PSIlence) # example data x1 <- c(3, 12, 20, 42, 33, 65, 70, 54, 33, 45) x2 <- c(11, 42, 16, 20, 21, 86, 30, 50, 73, 94) data <- data.frame(x1, x2) # range of the example data # we do this here to make the ranges easier to pass into the call to dpCovariance range1 <- c(0,70) range2 <- c(0,100) ranges <- list(range1, range2) dpCovarianceExample <- dpCovariance$new(mechanism='mechanismLaplace', varType='numeric', n=10, epsilon=c(1,1,1), columns = c('x1', 'x2'), rng=ranges) dpCovarianceExample$release(data) print(dpCovarianceExample$result)
In typical usage, there are two methods to the dpCovariance
class: the new
method and the release
method. The new
method does not touch any data, it just creates an object that can calculate a differentially private covariance matrix. Only the release
method touches data, and applies the functionality of the previously created object to the data.
The new
method creates an object of the class, and accepts the following arguments:
mechanism
\ Character, the class name of the mechanism used to perturb the true estimate, must be 'mechanismLaplace'
.
varType
\ Character, the type of values in the data frame that will be passed to the mechanism. Should be one of 'numeric'
or 'integer'
.
n
\ Integer, the number of observations in the columns.
epsilon
\ Numeric, the differential privacy parameter $\epsilon$, typically taking values between 0 and 1 and reflecting the privacy cost of the query. The vector of epsilons match up with the induced covariance matrix by proceeding top-to-bottom down each column of the lower triangular matrix.
columns
\ Character, the columns of the data to include in the output covariance matrix
rng
\ Numeric, a matrix of 2-tuples with the lower and upper bounds for each of P variables in the data frame, dimensions Px2.
imputeRng
\ Numeric, a 2-tuple giving a range within which missing values of the vector are imputed. Optional, default NULL
. If NULL
, missing values are imputed using the range provided in rng
.
intercept
\ Logical, a vector of length one indicating whether an intercept should be added prior to calculaing the covarience. Optional, default FALSE
.
formula
\ Formula, a list of the regression equations to be performed on the covariance matrix.
delta
\ Numeric, the differential privacy parameter $\delta$, the likelihood of additional privacy loss beyond $\epsilon$. Typically takes values of $10^{-5}$ or less, should never be less than $1/n$
The release
method accepts a single argument.
data
\ Data frame containing numeric columns corresponding the names specified in columns
.The release
method makes a call to the mechanism, which generates a list of statistical summaries available on the result
field.
result
List, contains the covariance matrix and the list of the variables in the covariance matrix. Also includes other elements reflecting variable post-processing of the release.
The list in the result
attribute has the following values.
release
\ Differentially private estimate of the covariance matrix.variable
\ List of the variables in the covariance matrix (variables listed as strings)linearRegression
\ the function for the linear regression on the private estimate of the covariance matrixImport the PSIlence library and attach the sample datasets:
library(PSIlence) data(PUMS5extract10000)
To calculate a private covariance matrix of a set of numeric vectors with dpCovariance
, enter the mechanism (this will be the Laplace Mechanism, or 'mechanismLaplace'), the variable type ('numeric'), the columns of interest (the column names of the variables of interest in the dataframe), the number of observations in the dataframe, the epsilon value (generally less than 1), and the matrix of ranges of the chosen columns:
# before calculating the covariance, create a matrix of the column ranges, # where each row contains the minimum and maximum value of a column, # in the order of the columns as they are passed into the dpCovariance call income_range = c(0, 750000) education_range = c(1,16) age_range = c(0,120) ranges <- list(income_range, education_range, age_range) numeric_covariance <- dpCovariance$new(mechanism='mechanismLaplace', varType='numeric', columns=c('income', 'educ', 'age'), n=10000, epsilon=c(0.1,0.1,0.1,0.1,0.1,0.1), rng=ranges) numeric_covariance$release(PUMS5extract10000) print(numeric_covariance$result)
imputeRng
argument, the imputation strategy is to use a Uniform distribution to choose any value in the imputation range with equal probability.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.