generateData: Generate Covariate-Dependent Data

View source: R/data.R

generateData    R Documentation

Generate Covariate-Dependent Data

Description

Generate a 1-dimensional extraneous covariate and p-dimensional Gaussian data whose precision matrix varies as a continuous function of the extraneous covariate. These data are distributed similarly to those used in the simulation study from (1).

Usage

generateData(p = 5, n1 = 60, n2 = 60, n3 = 60, Z = NULL, true_precision = NULL)

Arguments

p

positive integer; number of variables in the data matrix. 5 by default

n1

positive integer; number of observations in the first interval. 60 by default

n2

positive integer; number of observations in the second interval. 60 by default

n3

positive integer; number of observations in the third interval. 60 by default

Z

NULL or numeric vector; extraneous covariate values for each observation. If NULL, Z is generated from a uniform distribution on each of the three intervals. NULL by default

true_precision

NULL or list of matrices of dimension p \times p; true precision matrix for each observation. If NULL, the true precision matrices are generated as a function of Z. NULL by default

Value

Returns a list with the following values:

X

a (n1 + n2 + n3) \times p numeric matrix, where the i-th row is drawn from a p-dimensional Gaussian with mean 0 and precision matrix true_precision[[i]]

Z

a (n1 + n2 + n3) \times 1 numeric matrix, where the i-th entry is the extraneous covariate z_i for observation i

true_precision

list of n1 + n2 + n3 matrices of dimension p \times p; the i-th matrix is the precision matrix for the i-th observation

interval

vector of length n1 + n2 + n3; interval assignments for each of the observations, where the i-th entry is the interval assignment for the i-th observation
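Each row of X is described as a draw from a zero-mean Gaussian with precision matrix true_precision[[i]]. A minimal base-R sketch of one way to draw such rows, assuming every precision matrix is symmetric positive definite, samples through the Cholesky factor of the precision matrix so that no explicit inverse is needed. The helpers rmvnorm_prec and sample_rows are hypothetical illustrations, not covdepGE functions, and this is not necessarily how generateData itself samples.

```r
# Hypothetical helper (not part of covdepGE): draw one observation from a
# p-dimensional Gaussian with mean 0 and precision matrix Omega.
# If Omega = t(R) %*% R (chol() returns the upper-triangular R), then
# x = R^{-1} e with e ~ N(0, I) has covariance R^{-1} R^{-T} = Omega^{-1}.
rmvnorm_prec <- function(Omega) {
  R <- chol(Omega)           # upper-triangular Cholesky factor
  e <- rnorm(nrow(Omega))    # standard normal vector
  backsolve(R, e)            # solves R %*% x = e for x
}

# Hypothetical helper: one row per observation, each with its own
# precision matrix; returns an n x p matrix (assumes p >= 2).
sample_rows <- function(prec_list) {
  p <- nrow(prec_list[[1]])
  t(vapply(prec_list, rmvnorm_prec, numeric(p)))
}
```

Working from the Cholesky factor of the precision matrix avoids forming its inverse, which is both cheaper and numerically safer than calling solve() on each matrix.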

Extraneous Covariate

If Z = NULL, then the generation of Z is as follows:

The first n1 observations have z_i from a uniform distribution on the interval (-3, -1) (the first interval).

Observations n1 + 1 to n1 + n2 have z_i from a uniform distribution on the interval (-1, 1) (the second interval).

Observations n1 + n2 + 1 to n1 + n2 + n3 have z_i from a uniform distribution on the interval (1, 3) (the third interval).
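The default covariate scheme can be sketched in base R as follows. generate_Z is a hypothetical helper, not a covdepGE function; it returns Z as an n \times 1 matrix together with the interval labels to mirror the Z and interval return values. It does not sort Z within an interval, since the text does not say whether generateData does.

```r
# Hypothetical helper (not part of covdepGE): draw the extraneous covariate
# from Uniform(-3, -1), Uniform(-1, 1) and Uniform(1, 3) for the three
# intervals, in that order, and record each observation's interval.
generate_Z <- function(n1 = 60, n2 = 60, n3 = 60) {
  Z <- c(runif(n1, -3, -1), runif(n2, -1, 1), runif(n3, 1, 3))
  interval <- rep(1:3, times = c(n1, n2, n3))
  list(Z = matrix(Z, ncol = 1), interval = interval)
}
```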

Precision Matrices

If true_precision = NULL, then the generation of the true precision matrices is as follows:

All precision matrices have 2 on the diagonal and 1 in the (2, 3)/(3, 2) positions.

Observations in the first interval have a 1 in the (1, 2)/(2, 1) positions, while observations in the third interval have a 1 in the (1, 3)/(3, 1) positions.

Observations in the second interval have two entries that vary as a linear function of their extraneous covariate. Let \beta = 1/2. Then, the (1, 2)/(2, 1) entries for the i-th observation in the second interval are \beta\cdot(1 - z_i), while the (1, 3)/(3, 1) entries are \beta\cdot(1 + z_i).
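Written out for an observation in the second interval, the leading 3 \times 3 block of the precision matrix is shown below. The remaining diagonal entries are 2, and the remaining off-diagonal entries are taken to be 0; the zeros are implied by the description rather than stated outright.

```latex
\Omega(z_i)_{1:3,\,1:3} =
\begin{pmatrix}
2 & \beta(1 - z_i) & \beta(1 + z_i) \\
\beta(1 - z_i) & 2 & 1 \\
\beta(1 + z_i) & 1 & 2
\end{pmatrix},
\qquad \beta = \tfrac{1}{2}, \quad z_i \in (-1, 1)
```

For example, at z_i = 0 both varying entries equal 1/2; at z_i = -1 the (1, 2) entry is 1 and the (1, 3) entry is 0 (the first-interval pattern); at z_i = 1 they are 0 and 1 (the third-interval pattern).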

Thus, as z_i approaches -1 from the right, the associated precision matrix becomes more similar to the matrix for observations in the first interval. Similarly, as z_i approaches 1 from the left, the matrix becomes more similar to the matrix for observations in the third interval.
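The default precision-matrix scheme can be sketched as a base-R function of one observation's covariate and interval. precision_for is a hypothetical helper, not part of covdepGE; it assumes p >= 3 and that every off-diagonal entry not mentioned above is 0. Evaluating it at the interval boundaries reproduces the continuity described above.

```r
# Hypothetical helper (not part of covdepGE): p x p precision matrix for a
# single observation with covariate z in the given interval (1, 2 or 3).
# Assumes p >= 3 and zeros for all unmentioned off-diagonal entries.
precision_for <- function(z, interval, p = 5, beta = 1 / 2) {
  Omega <- diag(2, p)                  # 2 on the diagonal
  Omega[2, 3] <- Omega[3, 2] <- 1      # shared by all intervals
  if (interval == 1) {
    Omega[1, 2] <- Omega[2, 1] <- 1
  } else if (interval == 3) {
    Omega[1, 3] <- Omega[3, 1] <- 1
  } else {
    # second interval: entries vary linearly with z
    Omega[1, 2] <- Omega[2, 1] <- beta * (1 - z)
    Omega[1, 3] <- Omega[3, 1] <- beta * (1 + z)
  }
  Omega
}

# Continuity at the boundaries: z = -1 matches interval 1, z = 1 matches interval 3
all.equal(precision_for(-1, 2), precision_for(-2, 1))
all.equal(precision_for(1, 2), precision_for(2, 3))
```

For a full data set, Map(precision_for, as.vector(Z), interval) would give one matrix per observation under these assumptions.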

Examples

## Not run: 
library(covdepGE)
library(ggplot2)

# get the data
set.seed(12)
data <- generateData()
X <- data$X
Z <- data$Z
interval <- data$interval
prec <- data$true_precision

# get overall and within interval sample sizes
n <- nrow(X)
n1 <- sum(interval == 1)
n2 <- sum(interval == 2)
n3 <- sum(interval == 3)

# visualize the distribution of the extraneous covariate
ggplot(data.frame(Z = Z, interval = as.factor(interval))) +
  geom_histogram(aes(Z, fill = interval), color = "black", bins = n %/% 5)

# visualize the true precision matrices in each of the intervals

# interval 1
matViz(prec[[1]], incl_val = TRUE) +
  ggtitle(paste0("True precision matrix, interval 1, observations 1,...,", n1))

# interval 2 (varies continuously with Z)
cat("\nInterval 2, observations ", n1 + 1, ",...,", n1 + n2, sep = "")
int2_mats <- prec[interval == 2]
int2_inds <- c(5, n2 %/% 2, n2 - 5)
lapply(int2_inds, function(j) matViz(int2_mats[[j]], incl_val = TRUE) +
         ggtitle(paste("True precision matrix, interval 2, observation", j + n1)))

# interval 3
matViz(prec[[length(prec)]], incl_val = TRUE) +
  ggtitle(paste0("True precision matrix, interval 3, observations ",
                 n1 + n2 + 1, ",...,", n1 + n2 + n3))

# fit the model and visualize the estimated graphs
(out <- covdepGE(X, Z))
plot(out)

# visualize the posterior inclusion probabilities for variables (1, 2) and (1, 3)
inclusionCurve(out, 1, 2)
inclusionCurve(out, 1, 3)

## End(Not run)

JacobHelwig/covdepGE documentation built on April 11, 2024, 7:22 a.m.