genCorGen: Create multivariate (correlated) data - for general...
In kgoldfeld/simstudy: Simulation of Study Data

View source: R/generate_correlated_data.R

genCorGen

R Documentation

Create multivariate (correlated) data - for general distributions

Description

Create multivariate (correlated) data - for general distributions

Usage

genCorGen(
  n,
  nvars,
  params1,
  params2 = NULL,
  dist,
  rho,
  corstr,
  corMatrix = NULL,
  wide = FALSE,
  cnames = NULL,
  method = "copula",
  idname = "id"
)

Arguments

`n`	Number of observations
`nvars`	Number of variables
`params1`	A single vector specifying the mean of the distribution. The vector is of length 1 if the mean is the same across all observations, otherwise the vector is of length nvars. In the case of the uniform distribution the vector specifies the minimum.
`params2`	A single vector specifying a possible second parameter for the distribution. For the normal distribution, this will be the variance; for the gamma distribution, this will be the dispersion; and for the uniform distribution, this will be the maximum. The vector is of length 1 if the mean is the same across all observations, otherwise the vector is of length nvars.
`dist`	A string indicating "binary", "poisson" or "gamma", "normal", or "uniform".
`rho`	Correlation coefficient, -1 <= rho <= 1. Use if corMatrix is not provided.
`corstr`	Correlation structure of the variance-covariance matrix defined by sigma and rho. Options include "cs" for a compound symmetry structure and "ar1" for an autoregressive structure.
`corMatrix`	Correlation matrix can be entered directly. It must be symmetrical and positive semi-definite. It is not a required field; if a matrix is not provided, then a structure and correlation coefficient rho must be specified.
`wide`	The layout of the returned file - if wide = TRUE, all new correlated variables will be returned in a single record, if wide = FALSE, each new variable will be its own record (i.e. the data will be in long form). Defaults to FALSE.
`cnames`	Explicit column names. A single string with names separated by commas. If no string is provided, the default names will be V#, where # represents the column.
`method`	Two methods are available to generate correlated data. (1) "copula" uses the multivariate Gaussian copula method that is applied to all other distributions; this applies to all available distributions. (2) "ep" uses an algorithm developed by Emrich and Piedmonte (1991).
`idname`	Character value that specifies the name of the id variable.

Value

data.table with added column(s) of correlated data

References

Emrich LJ, Piedmonte MR. A Method for Generating High-Dimensional Multivariate Binary Variates. The American Statistician 1991;45:302-4.

Examples

set.seed(23432)
lambda <- c(8, 10, 12)

genCorGen(100, nvars = 3, params1 = lambda, dist = "poisson", rho = .7, corstr = "cs")
genCorGen(100, nvars = 3, params1 = 5, dist = "poisson", rho = .7, corstr = "cs")
genCorGen(100, nvars = 3, params1 = lambda, dist = "poisson", rho = .7, corstr = "cs", wide = TRUE)
genCorGen(100, nvars = 3, params1 = 5, dist = "poisson", rho = .7, corstr = "cs", wide = TRUE)

genCorGen(100,
  nvars = 3, params1 = lambda, dist = "poisson", rho = .7, corstr = "cs",
  cnames = "new_var"
)
genCorGen(100,
  nvars = 3, params1 = lambda, dist = "poisson", rho = .7, corstr = "cs",
  wide = TRUE, cnames = "a, b, c"
)

kgoldfeld/simstudy documentation built on Aug. 2, 2024, 7:31 p.m.