View source: R/add_correlated_data.R
addCorGen | R Documentation |
Create multivariate (correlated) data - for general distributions
addCorGen(
  dtOld,
  nvars = NULL,
  idvar = "id",
  rho = NULL,
  corstr = NULL,
  corMatrix = NULL,
  dist,
  param1,
  param2 = NULL,
  cnames = NULL,
  method = "copula",
  ...
)
dtOld |
The data set that will be augmented. If the data set includes a single record per id, the new data table will be created as a "wide" data set. If the original data set includes multiple records per id, the new data set will be in "long" format. |
nvars |
The number of new variables to create for each id. This is only applicable when the data are generated from a data set that includes one record per id. |
idvar |
String. Name of the column that holds the individual-level id for the correlated data. |
rho |
Correlation coefficient, -1 <= rho <= 1. Use if corMatrix is not provided. |
corstr |
Correlation structure used to build the correlation matrix from rho. Options include "cs" for a compound symmetry structure and "ar1" for an autoregressive structure. |
corMatrix |
A correlation matrix can be entered directly. It must be symmetric and positive semi-definite. This argument is optional; if no matrix is provided, a correlation structure (corstr) and a correlation coefficient (rho) must be specified. |
dist |
A string indicating "normal", "binary", "poisson" or "gamma". |
param1 |
A string that represents the column in dtOld that contains the parameter for the mean of the distribution. In the case of the uniform distribution the column specifies the minimum. |
param2 |
A string that represents the column in dtOld that contains a possible second parameter for the distribution. For the normal distribution, this will be the variance; for the gamma distribution, this will be the dispersion; and for the uniform distribution, this will be the maximum. |
cnames |
Explicit column names. A single string with names separated by commas. If no string is provided, the default names will be V#, where # represents the column. |
method |
Two methods are available to generate correlated data. (1) "copula" uses the multivariate Gaussian copula method, which applies to all available distributions. (2) "ep" uses an algorithm developed by Emrich and Piedmonte (1991) for generating correlated binary variates. |
... |
May include additional arguments that have been deprecated and are no longer used. |
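To make the corstr, rho, and corMatrix arguments concrete, here is a minimal base-R sketch of what the "cs" and "ar1" structures look like as matrices, along with a check of the two conditions stated for corMatrix. The helpers buildCorMatrix and isValidCorMatrix are hypothetical names used only for illustration; they are not part of the package and may differ from how addCorGen builds or validates its matrix internally.

```r
# Hypothetical helper (not a package function): build a correlation matrix
# for a given structure and correlation coefficient.
buildCorMatrix <- function(nvars, rho, corstr = c("cs", "ar1")) {
  corstr <- match.arg(corstr)
  if (corstr == "cs") {
    # Compound symmetry: every off-diagonal entry equals rho
    R <- matrix(rho, nrow = nvars, ncol = nvars)
    diag(R) <- 1
  } else {
    # AR(1): correlation decays as rho^|i - j|
    idx <- seq_len(nvars)
    R <- rho^abs(outer(idx, idx, "-"))
  }
  R
}

# Hypothetical helper: check that a matrix is symmetric and
# positive semi-definite (all eigenvalues >= 0, up to a small tolerance).
isValidCorMatrix <- function(R, tol = 1e-8) {
  isSymmetric(R) &&
    all(eigen(R, symmetric = TRUE, only.values = TRUE)$values >= -tol)
}

Rcs  <- buildCorMatrix(3, 0.7, "cs")
Rar1 <- buildCorMatrix(3, 0.7, "ar1")

isValidCorMatrix(Rcs)
isValidCorMatrix(Rar1)

# Not every rho yields a valid matrix: compound symmetry with a strongly
# negative rho is not positive semi-definite when nvars = 3.
isValidCorMatrix(buildCorMatrix(3, -0.9, "cs"))
```

A matrix built this way could, in principle, be passed as corMatrix instead of supplying rho and corstr.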
Original data.table with added column(s) of correlated data
Emrich LJ, Piedmonte MR. A Method for Generating High-Dimensional Multivariate Binary Variates. The American Statistician 1991;45:302-4.
library(simstudy)

# Wide example: one record per id, so nvars new columns are added
def <- defData(varname = "xbase", formula = 5, variance = .4, dist = "gamma", id = "cid")
def <- defData(def, varname = "lambda", formula = ".5 + .1*xbase", dist = "nonrandom", link = "log")
dt <- genData(100, def)
addCorGen(
  dtOld = dt, idvar = "cid", nvars = 3, rho = .7, corstr = "cs",
  dist = "poisson", param1 = "lambda"
)
# Long example
def <- defData(varname = "xbase", formula = 5, variance = .4, dist = "gamma", id = "cid")
def2 <- defDataAdd(
  varname = "p", formula = "-3+.2*period + .3*xbase",
  dist = "nonrandom", link = "logit"
)
dt <- genData(100, def)
dtLong <- addPeriods(dt, idvars = "cid", nPeriods = 3)
dtLong <- addColumns(def2, dtLong)
addCorGen(
  dtOld = dtLong, idvar = "cid", nvars = NULL, rho = .7, corstr = "cs",
  dist = "binary", param1 = "p"
)
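For readers unfamiliar with the Gaussian copula idea behind method = "copula", the following base-R sketch shows the general technique for Poisson outcomes. It is an illustration under stated assumptions, not the package's implementation: the variable names, the fixed Poisson mean, and the use of a Cholesky factor to draw the correlated normals are all choices made for this example.

```r
set.seed(123)

n      <- 5000   # number of ids (assumed for illustration)
nvars  <- 3      # variables per id
rho    <- 0.7    # target correlation on the normal scale
lambda <- 4      # Poisson mean, playing the role of param1

# Compound symmetry correlation matrix
R <- matrix(rho, nrow = nvars, ncol = nvars)
diag(R) <- 1

# Step 1: draw standard normals and correlate them via the Cholesky factor.
# chol(R) returns upper-triangular U with t(U) %*% U = R, so the rows of Z
# have correlation matrix R.
Z <- matrix(rnorm(n * nvars), nrow = n) %*% chol(R)

# Step 2: map each normal to a uniform on (0, 1); dependence is preserved
U <- pnorm(Z)

# Step 3: apply the target distribution's quantile function
X <- matrix(qpois(U, lambda = lambda), nrow = n)
colnames(X) <- paste0("V", seq_len(nvars))

# Inspect marginal means and the observed correlation
round(colMeans(X), 2)
round(cor(X), 2)
```

The marginal means should be close to lambda. The observed correlation between the generated counts is usually somewhat lower than rho, because rho is applied on the underlying normal scale and the transformation to a discrete distribution attenuates it.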