IdentificationRisk: An Identification Risk Function
In RyanHornby/IdentificationRisk: Calculates Identification Risk for Synthetic Data

View source: R/IdentificationRiskContinuous.R

Description

Computes the identification risk for a dataset containing synthetic variables. Categorical variables are assumed to be stored as factors; continuous variables are compared using the radius `r`.

Usage

IdentificationRisk(
  origdata,
  syndata,
  known,
  syn,
  r,
  percentage = TRUE,
  euclideanDist = FALSE
)
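
As a rough illustration of how the arguments fit together, the sketch below builds a small toy dataset and calls the function with the signature shown above. The data, the column names, the helper `make_syn`, and the value `r = 0.1` are all invented for illustration. The call only runs if a function named `IdentificationRisk` is already available in the session (for example after attaching the package), and nothing is assumed about the structure of the returned object.

```r
# Toy data, invented for illustration only.
set.seed(1)
n <- 20
origdata <- data.frame(
  sex    = factor(sample(c("F", "M"), n, replace = TRUE)),
  region = factor(sample(c("N", "S", "E", "W"), n, replace = TRUE)),
  income = round(runif(n, 20000, 90000))
)

# Hypothetical stand-in for a synthesizer: only `income` is perturbed.
make_syn <- function(d) {
  d$income <- round(d$income * runif(nrow(d), 0.9, 1.1))
  d
}

# `syndata` is a list of synthetic data frames with the same columns.
syndata <- list(make_syn(origdata), make_syn(origdata))

# Guarded call: skipped when the function is not available.
if (exists("IdentificationRisk", mode = "function")) {
  risk <- IdentificationRisk(
    origdata      = origdata,
    syndata       = syndata,
    known         = c("sex", "region"),
    syn           = c("income"),
    r             = 0.1,   # assumed scale for a percentage radius; check the package
    percentage    = TRUE,
    euclideanDist = FALSE
  )
  str(risk)  # inspect whatever structure is returned
}
```

Here the categorical columns are stored as factors, as the description requires, and the single continuous column is the synthetic one, so `r` has only one radius to supply.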

Arguments

`origdata`	data frame containing the original data
`syndata`	list of synthetic data frames
`known`	vector of the names of the columns in the dataset that are assumed to be known
`syn`	vector of the names of the columns in the dataset that are synthetic
`r`	radius used when comparing continuous variables. The radius is interpreted as a percentage (the default) or as a fixed value, depending on `percentage`. Supply a single value to use the same radius for all continuous variables, or a vector to set one per variable, with the radii in the same order as those columns appear in the dataset.
`percentage`	`TRUE` for a percentage radius, `FALSE` for a constant (fixed) radius
`euclideanDist`	`TRUE` for a Euclidean distance radius, `FALSE` otherwise
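
The difference between a percentage radius and a fixed radius can be pictured with a small base R sketch. The helper `within_radius` is hypothetical and not part of the package. It assumes that a percentage radius means "within `r` times the original value" and that a fixed radius means "within `r` units", which is one plausible reading of the argument descriptions rather than confirmed package behavior.

```r
# Hypothetical helper, not part of the package.
within_radius <- function(orig, candidate, r, percentage = TRUE) {
  # Percentage mode scales the radius by the original value's magnitude;
  # fixed mode uses r directly as an absolute tolerance.
  tol <- if (percentage) r * abs(orig) else r
  abs(candidate - orig) <= tol
}

orig_income <- 50000
candidates  <- c(46000, 52000, 56000)

# Percentage radius: tolerance is 0.1 * 50000 = 5000.
pct_match <- within_radius(orig_income, candidates, r = 0.1)

# Fixed radius: tolerance is 3000 units regardless of the original value.
fixed_match <- within_radius(orig_income, candidates, r = 3000,
                             percentage = FALSE)

# Per-variable radii: one radius per continuous column, in column order.
orig_row <- c(income = 50000, age = 40)
cand_row <- c(income = 52000, age = 47)
per_var_match <- within_radius(orig_row, cand_row, r = c(0.1, 0.1))
```

Under these assumptions a percentage radius tolerates larger absolute differences for larger original values, while a fixed radius applies the same tolerance everywhere, which is why a vector of radii can be useful when continuous columns sit on very different scales.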