gen_data | R Documentation |
Generates data of mixed types from the latent Gaussian copula model.
gen_data( n = 100, types = c("ter", "con"), rhos = 0.5, copulas = "no", XP = NULL, showplot = FALSE )
n
A positive integer indicating the sample size. The default value is 100.
types
A vector indicating the type of each variable, could be
rhos
A vector with the lower-triangular elements of the desired correlation matrix, e.g.
copulas
A vector indicating the copula transformation f for each of the p variables, e.g. U = f(Z). Each element can take value
XP
A list of length p indicating the proportion of zeros (for binary and truncated variables), and the proportions of zeros and ones (for ternary variables), for each of the variables. For a continuous variable, NA should be supplied. If
showplot
Logical indicator. If TRUE, generates a plot of the data when the number of variables p is no more than 3. The default value is FALSE.
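The arguments rhos, types and XP can be illustrated with a minimal base-R sketch of the latent Gaussian idea, with no copula transformation (copulas = "no"). This is not the package's internal code. It assumes that rhos fills the lower triangle of the correlation matrix in column-major order, that zero/one proportions map to standard normal quantile cutoffs, and that the truncated variable is shifted so its cutoff sits at zero. The names Sigma, Z, x_bin, x_ter and x_tru are hypothetical.

```r
# Illustrative sketch only; the package's own implementation may differ.
set.seed(1)
n <- 2000
p <- 3

# Assumption: rhos fills the lower triangle in column-major order.
rhos <- c(0.3, 0.4, 0.5)
Sigma <- diag(p)
Sigma[lower.tri(Sigma)] <- rhos
Sigma[upper.tri(Sigma)] <- t(Sigma)[upper.tri(Sigma)]  # mirror to make it symmetric

# Latent Gaussian sample: rows are draws from N(0, Sigma).
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Binary with 30% zeros: threshold at the 0.3 normal quantile.
x_bin <- as.integer(Z[, 1] > qnorm(0.3))

# Ternary with 20% zeros and 40% ones: two cutoffs from cumulative proportions.
cuts <- qnorm(cumsum(c(0.2, 0.4)))
x_ter <- findInterval(Z[, 2], cuts)  # values 0, 1 or 2

# Truncated with 50% zeros: values at or below the cutoff become 0.
x_tru <- pmax(Z[, 3] - qnorm(0.5), 0)

X <- cbind(x_bin, x_ter, x_tru)
colMeans(X == 0)  # empirical proportion of zeros in each column
```

With a finite sample the empirical proportions only approximate the requested ones, which is why the checks in the examples are best read as approximate.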
gen_data returns a list containing:
X: Generated data matrix (n by p) of observed variables.
plotX: Visualization of the data matrix X: a histogram if p = 1, a 2D scatter plot if p = 2, and a 3D scatter plot if p = 3. Returns NULL if showplot = FALSE.
Fan J., Liu H., Ning Y. and Zou H. (2017) "High dimensional semiparametric latent graphical model for mixed data" doi: 10.1111/rssb.12168.
Yoon G., Carroll R.J. and Gaynanova I. (2020) "Sparse semiparametric canonical correlation analysis for data of mixed types" doi: 10.1093/biomet/asaa007.
# Generate a single continuous variable with exponential transformation
# (always greater than 0). Set showplot = TRUE to also show the histogram.
simdata = gen_data(n = 100, copulas = "expo", types = "con", showplot = FALSE)
X = simdata$X; plotX = simdata$plotX

# Generate a pair of variables (ternary and continuous) with default proportions
# and without copula transformation.
simdata = gen_data()
X = simdata$X

# Generate 3 variables (binary, ternary and truncated).
# The corresponding copulas for the variables are "no" (no transformation),
# "cube" (cube transformation) and "cube" (cube transformation).
# The binary variable has 30% of zeros, the ternary variable has 20% of zeros
# and 40% of ones, and the truncated variable has 50% of zeros.
# Then show the 3D scatter plot (data points project on either 0 or 1 on Axis X1;
# on 0, 1 or 2 on Axis X2; on the positive domain on Axis X3).
simdata = gen_data(n = 100, rhos = c(.3, .4, .5),
                   copulas = c("no", "cube", "cube"),
                   types = c("bin", "ter", "tru"),
                   XP = list(.3, c(.2, .4), .5), showplot = TRUE)
X = simdata$X; plotX = simdata$plotX

# Check the proportion of zeros for the binary variable.
mean(simdata$X[ , 1] == 0)
# Check the proportion of zeros and ones for the ternary variable.
mean(simdata$X[ , 2] == 0); mean(simdata$X[ , 2] == 1)
# Check the proportion of zeros for the truncated variable.
mean(simdata$X[ , 3] == 0)