DataRebuild: IPD reconstruction from IPD summaries only.


View source: R/ipd_rec.R

Description

'DataRebuild()' generates artificial data, that is, stochastic copies of the original IPD, using only empirical IPD distributional summaries as input.

Usage

DataRebuild(H, n, correlation.matrix, moments, x.mode,
  johnson.parameters = NULL, stochastic.integration = FALSE,
  data.rearrange = c("incomplete", "norta"), corrtype = c("rank.corr",
  "moment.corr", "normal.corr"), marg.model = c("gamma", "johnson"),
  variable.names = NULL, SBjohn.correction = FALSE, compute.eec = FALSE,
  checkdata = FALSE, tabulate.similar.data = FALSE,
  SI_k = 8000, input.sn.corr = NULL)

Arguments

H

integer number of independent IPD replicates to be generated.

n

integer number of independent IPD records, e.g. the number of rows (subjects) in the original IPD.

correlation.matrix

matrix of pairwise IPD correlation values.

moments

numeric array of IPD marginal moments up to the fourth order for all IPD variables (columns).

x.mode

logical vector: is each IPD marginal variable binary (TRUE) or not (FALSE)?

johnson.parameters

array of Johnson parameters for each IPD marginal variable. Depends on the CRAN-archived 'JohnsonDistribution' package. If NULL (default), the parameters are computed from the given 'moments'.

stochastic.integration

logical: should Monte Carlo integration be used to resolve the Gaussian copula inversion (NORTA transformation)? Defaults to FALSE, that is, numerical integration relying on package 'cubature' is tried first.

data.rearrange

method of IPD dependence reconstruction: based either on all pairwise IPD correlations ("norta") or on first-degree correlations only ("incomplete").

corrtype

type of the input IPD correlation matrix: Spearman ("rank.corr"), Pearson ("moment.corr"), or van der Waerden normal-scores ("normal.corr"). See Details.

marg.model

either "gamma" or "johnson", for modeling the non-binary IPD marginals. All binary marginals are modeled via a Bernoulli distribution, or via a Beta distribution if the Kruskal analytic conversion is used (see below).

variable.names

names of IPD marginal variables. If NULL (Default) automatic labels are generated.

SBjohn.correction

logical: should Johnson marginal values be corrected? Defaults to FALSE. If TRUE, wrongly sampled negative values are set to the minimum positive sampled value.

compute.eec

currently deprecated. Do not change the default value.

checkdata

logical: if TRUE, it compares the IPD summaries (marginal moments and pairwise correlations), averaged over the H IPD reconstructions, against the original IPD summary input values.

tabulate.similar.data

logical: if TRUE and checkdata = TRUE, it returns the full tabular comparison between the reconstructed and original IPD summaries.

SI_k

resampling size of the stochastic integration approach. Defaults to 8000.

NI_tol

error tolerance for numerical integration. Default 1e-02. Do not decrease it too much: 1e-05 is a reasonable smallest value.

NI_maxEval

maximum number of function evaluations during numerical integration. Default 500 (0 means an unlimited number of evaluations).

input.sn.corr

'correlation.matrix' converted into standard normal space (the Gaussian copula parameter; see Details). Default is NULL, in which case the solution is found internally. If a matrix is supplied instead, it overrides the internal computations and is used directly to generate the artificial data. This can be useful as a post hoc tuning procedure for data generation. See Details.

cp.finetune

logical. If the NORTA method is used and x.mode = TRUE for some variables, it iteratively fine-tunes the Kruskal analytic solution (corrtype = "rank.corr") of the copula parameter until the correlation bias of the generated artificial data is reduced. It can also be used together with argument 'assume.all.smooth = TRUE' (see below). Default FALSE.

rescale.smoothed.binary

logical: if the Kruskal analytic conversion was used and x.mode = TRUE for some variables, it rescales the smoothed binary variables back into integer format (typically needed). Default FALSE.

assume.all.smooth

logical. If the NORTA method is used, it treats the input Pearson correlation matrix as if it were already a valid Kruskal solution, which wrongly assumes that all variables are continuous when some are actually discrete. This is biased, but it can quickly yield results that can then be fine-tuned (see 'cp.finetune'). Default FALSE.
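The stochastic.integration and SI_k arguments concern how the copula parameter is solved without numerical integration. Below is a minimal sketch of that idea for a single pair of variables, assuming gamma marginals (rate 1) with known shapes and a target Pearson correlation: the output correlation is estimated by Monte Carlo with k draws (mirroring the SI_k default of 8000), and the normal-space correlation is found with uniroot(). The helpers norta_pearson_mc() and solve_copula_param_mc() are hypothetical, not part of gcipdr, whose internal algorithm may differ.

```r
# Hypothetical helper (not gcipdr): Monte Carlo estimate of the Pearson
# correlation between two gamma marginals obtained from a standard bivariate
# normal pair with correlation rho (the NORTA transformation).
norta_pearson_mc <- function(rho, z1, e, shape1, shape2) {
  z2 <- rho * z1 + sqrt(1 - rho^2) * e     # corr(z1, z2) = rho
  x1 <- qgamma(pnorm(z1), shape = shape1)  # normal -> uniform -> gamma
  x2 <- qgamma(pnorm(z2), shape = shape2)
  cor(x1, x2)                              # Pearson correlation of the output
}

# Hypothetical solver (not gcipdr): find the normal-space correlation that
# reproduces a target Pearson correlation. The draws z1 and e are generated
# once and reused (common random numbers), so the objective is continuous in rho.
solve_copula_param_mc <- function(target, shape1, shape2, k = 8000, seed = 1) {
  set.seed(seed)
  z1 <- rnorm(k)
  e  <- rnorm(k)
  f  <- function(rho) norta_pearson_mc(rho, z1, e, shape1, shape2) - target
  uniroot(f, lower = -0.999, upper = 0.999)$root
}

rho <- solve_copula_param_mc(target = 0.5, shape1 = 2, shape2 = 0.5)
rho
```

Reusing the same draws across evaluations is a design choice: resampling at every call would make the objective noisy, and the root search could fail. Because skewed marginals attenuate correlation, the solved rho is larger than the 0.5 target.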

Details

'DataRebuild()' is based on a Gaussian copula inversion technique, also known as the NORmal To Anything (NORTA) transformation. Inversion relies on converting the input empirical correlation matrix into standard normal space (the copula parameter solution). If data.rearrange = "norta", the conversion (an optimization) expects a Pearson correlation matrix as input, and corrtype = "moment.corr" is chosen automatically by default. Using "norta" with "rank.corr" instead performs the Kruskal analytic conversion (theoretically valid only if all marginals are continuous), whereas "normal.corr" returns the input matrix unchanged. If the optimization fails with numerical integration (the default), try stochastic integration (stochastic.integration = TRUE) instead.
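The Kruskal analytic conversion rests on a closed form for the bivariate normal: Spearman's correlation equals (6/pi) asin(rho/2), so a Spearman matrix maps to standard normal space elementwise via rho = 2 sin(pi rho_s / 6). The sketch below applies that conversion and then performs a NORTA draw (normal, then pnorm(), then qgamma()), assuming continuous gamma marginals with made-up parameters. kruskal_to_normal() and norta_generate() are hypothetical helpers, not gcipdr functions.

```r
# Kruskal's relation for a bivariate normal pair:
# rho_s = (6 / pi) * asin(rho / 2)  <=>  rho = 2 * sin(pi * rho_s / 6)
kruskal_to_normal <- function(rank_corr) {
  sn <- 2 * sin(pi * rank_corr / 6)
  diag(sn) <- 1
  sn
}

# Hypothetical NORTA generator (not gcipdr): draw standard normals with
# correlation sn_corr, map them to uniforms with pnorm(), then to gamma
# marginals with qgamma().
norta_generate <- function(n, sn_corr, shapes, rates) {
  p <- ncol(sn_corr)
  z <- matrix(rnorm(n * p), n, p) %*% chol(sn_corr)  # rows ~ N(0, sn_corr)
  u <- pnorm(z)
  sapply(seq_len(p), function(j) qgamma(u[, j], shape = shapes[j], rate = rates[j]))
}

spearman_target <- matrix(c(1,   0.4,  0.2,
                            0.4, 1,   -0.3,
                            0.2, -0.3, 1), nrow = 3)
sn <- kruskal_to_normal(spearman_target)
set.seed(42)
x <- norta_generate(n = 20000, sn_corr = sn, shapes = c(2, 5, 0.8), rates = c(1, 2, 1))
round(cor(x, method = "spearman"), 2)  # should be close to spearman_target
```

Because qgamma(pnorm(.)) is monotone, rank correlations carry over unchanged from the normal draws, which is why the conversion is exact for continuous marginals but not for binary ones. chol() stops with an error if the converted matrix is not positive definite.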

Value

An object of class 'similar.data'.

Note

This program currently assumes that, before the input IPD summaries were computed, every categorical IPD variable with m levels was converted into m-1 dummy (binary) variables. A future alternative could allow categorical marginals directly and model them with a Multinomial distribution. This program relies on the archived package 'JohnsonDistribution'.
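A minimal sketch of the m-1 dummy coding this note assumes, using base R's model.matrix() on a toy data set (the variables age and stage and their values are made up). With the default treatment contrasts, the first level becomes the reference category. A logical vector in the spirit of x.mode can then be derived by checking which columns take only the values 0 and 1.

```r
# Toy IPD with one categorical variable of m = 3 levels (made-up values)
ipd <- data.frame(age   = c(34, 51, 47, 62, 29),
                  stage = factor(c("I", "II", "III", "II", "I")))

# model.matrix() returns an intercept plus m - 1 = 2 dummy columns;
# dropping the intercept leaves the binary indicators stageII and stageIII
dummies <- model.matrix(~ stage, data = ipd)[, -1]
ipd_binary <- cbind(ipd["age"], dummies)

# Flag binary columns, as expected by the x.mode argument
x.mode <- sapply(ipd_binary, function(v) all(v %in% c(0, 1)))
x.mode
```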

See Also

[Return.key.IPD.summaries()] for the allowed input IPD summary format, [FitJohnsonDistribution()] from the archived package JohnsonDistribution, [adaptIntegrate()] from package cubature

Examples

## Not run: 
# correlation.matrix, moments and x.mode have no defaults and must also be supplied
DataRebuild(H = 100, n = 1000)

## End(Not run)

help("gcipdr")

bonorico/gcipdr documentation built on May 2, 2021, 8:12 p.m.