repSifter: DataSifter II Algorithm for Time-varying Data


View source: R/repSifter.R

Description

Creates an informative, privacy-preserving time-varying dataset that protects subjects' privacy while preserving the information contained in the original dataset.

Usage

repSifter(
  data,
  mispct,
  misw = "perij",
  lnames,
  timevar,
  ID,
  maxit = 10,
  crit = 0.05,
  cal.weights.method = "param"
)
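
As a minimal sketch of a call, the example below builds a small long-format data frame and passes it to repSifter(). The data and all column names (id, visit, age, sbp, chol) are hypothetical. It also assumes the package is installed under the name DataSifterII and that mispct is given as a proportion (0.2 for 20%); this page does not state whether a proportion or a whole-number percentage is expected, so check that before relying on it.

```r
# Toy long-format data: one row per subject per visit (all names are hypothetical).
set.seed(1)
toy <- data.frame(
  id    = rep(1:20, each = 4),                               # subject identifier
  visit = rep(1:4, times = 20),                              # time variable
  age   = rep(sample(30:70, 20, replace = TRUE), each = 4),  # time-invariant covariate
  sbp   = round(rnorm(80, mean = 120, sd = 10)),             # time-varying variable
  chol  = round(rnorm(80, mean = 200, sd = 25))              # time-varying variable
)

# Only attempt the call when the package is available.
if (requireNamespace("DataSifterII", quietly = TRUE)) {
  sifted <- DataSifterII::repSifter(
    data    = toy,
    mispct  = 0.2,               # assumption: proportion scale, see note above
    misw    = "perij",
    lnames  = c("sbp", "chol"),  # longitudinal variable names
    timevar = "visit",
    ID      = "id",
    maxit   = 10,
    crit    = 0.05,
    cal.weights.method = "param"
  )
}
```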

Arguments

data

A data frame containing the original data to be processed. The data must be in long format. Missingness is allowed in time-varying variables.

mispct

Percentage of artificial missingness to introduce for obfuscation. 20%-30% is recommended to preserve utility.

misw

Type of sampling weights, i.e. the level at which missingness is considered. "peri" applies weights at the subject level, so any subject with partially missing data is excluded from the complete cases. "perij" applies weights at the subject and time level, so only subjects with all time points missing are excluded from the complete cases.
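
The difference between the two settings can be illustrated in base R. This is only a sketch of the exclusion rule described above, not the package's internal code; the data frame and its column names are hypothetical.

```r
# Three subjects, three visits each; y is a time-varying variable with some NAs.
long <- data.frame(
  id    = rep(1:3, each = 3),
  visit = rep(1:3, times = 3),
  y     = c(5, 6, 7,   4, NA, 6,   NA, NA, NA)
)

row_missing <- is.na(long$y)

# Per subject: is any visit missing? are all visits missing?
any_missing <- tapply(row_missing, long$id, any)
all_missing <- tapply(row_missing, long$id, all)

# "peri": subjects with any missing visit drop out of the complete cases.
excluded_peri <- names(any_missing)[as.vector(any_missing)]

# "perij": only subjects with every visit missing drop out.
excluded_perij <- names(all_missing)[as.vector(all_missing)]

print(excluded_peri)
print(excluded_perij)
```

Here subject 2 (one missing visit) is excluded only under the "peri" rule, while subject 3 (all visits missing) is excluded under both.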

lnames

A vector of longitudinal variable names.

timevar

Name of the time variable or cluster variable.

ID

Name of the ID variable in the dataset.

maxit

Maximum number of iterations. The default is 10.

crit

Critical value for the stopping criterion. The default is 0.05, which stops the algorithm when the absolute deviance between the imputed and original values is within 5% of the original values.
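
One possible reading of this criterion is an element-wise relative tolerance check, sketched below. The helper within_tolerance() is hypothetical and is not part of the package; how repSifter() actually aggregates the deviance across values is not specified on this page.

```r
# Hypothetical helper: TRUE when every imputed value lies within
# crit * |original| of the corresponding original value.
within_tolerance <- function(imputed, original, crit = 0.05) {
  all(abs(imputed - original) <= crit * abs(original))
}

ok  <- within_tolerance(c(101, 198), c(100, 200))  # deviations 1 and 2, limits 5 and 10
bad <- within_tolerance(c(110, 198), c(100, 200))  # first deviation 10 exceeds limit 5

print(ok)
print(bad)
```

Under this reading, the algorithm would stop as soon as such a check passes, or after maxit iterations, whichever comes first.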

cal.weights.method

Method used to model missingness in the raw data when calculating inverse probability weights (IPW). If cal.weights.method = "param", the function uses logistic regression ("peri") or a GLMM ("perij") as the missingness model. If cal.weights.method = "nonparam", the function uses a random forest ("peri") as the missingness model.

Value


SOCR/DataSifterII documentation built on Dec. 15, 2021, 10:29 a.m.