generateData: Generate Covariate-Dependent Data
In JacobHelwig/covdepGE: Covariate Dependent Graph Estimation

View source: R/data.R

generateData

R Documentation

Generate Covariate-Dependent Data

Description

Generate a 1-dimensional extraneous covariate and p-dimensional Gaussian data with a precision matrix that varies as a continuous function of the extraneous covariate. The data are distributed similarly to those used in the simulation study from (1).

Usage

generateData(p = 5, n1 = 60, n2 = 60, n3 = 60, Z = NULL, true_precision = NULL)

Arguments

`p`	positive integer; number of variables in the data matrix. `5` by default
`n1`	positive integer; number of observations in the first interval. `60` by default
`n2`	positive integer; number of observations in the second interval. `60` by default
`n3`	positive integer; number of observations in the third interval. `60` by default
`Z`	`NULL` or numeric vector; extraneous covariate values for each observation. If `NULL`, `Z` will be generated from a uniform distribution on each of the intervals. `NULL` by default
`true_precision`	`NULL` or list of matrices of dimension `p \times p`; true precision matrix for each observation. If `NULL`, the true precision matrices will be generated dependent on `Z`. `NULL` by default

Value

Returns a list with the following values:

`X`	a `(n1 + n2 + n3) \times p` numeric matrix, where the `i`-th row is drawn from a `p`-dimensional Gaussian with mean `0` and precision matrix `true_precision[[i]]`
`Z`	a `(n1 + n2 + n3) \times 1` numeric matrix, where the `i`-th entry is the extraneous covariate `z_i` for observation `i`
`true_precision`	list of `n1 + n2 + n3` matrices of dimension `p \times p`; the `i`-th matrix is the precision matrix for the `i`-th observation
`interval`	vector of length `n1 + n2 + n3`; interval assignments for each of the observations, where the `i`-th entry is the interval assignment for the `i`-th observation
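
The relationship between a row of `X` and its precision matrix can be illustrated with a short base R sketch. This is only one standard way to draw from a zero-mean Gaussian with a given precision matrix (invert it, then use a Cholesky factor of the covariance); it is not taken from the package source, and the names `prec_i`, `sigma_i`, `chol_factor`, and `x_i` are chosen here for illustration.

```r
set.seed(1)
p <- 5

# illustrative precision matrix: 2 on the diagonal, 1 in the (2, 3)/(3, 2) positions
prec_i <- diag(2, p)
prec_i[2, 3] <- prec_i[3, 2] <- 1

# the covariance matrix is the inverse of the precision matrix
sigma_i <- solve(prec_i)

# chol() returns an upper-triangular factor with
# t(chol_factor) %*% chol_factor equal to sigma_i, so
# rnorm(p) %*% chol_factor has mean 0 and covariance sigma_i
chol_factor <- chol(sigma_i)
x_i <- drop(rnorm(p) %*% chol_factor)
```

Repeating the last two steps with `true_precision[[i]]` in place of `prec_i` for each `i` would give a matrix with the same distribution as the one described for `X`, under the assumptions stated above.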

Extraneous Covariate

If Z = NULL, then the generation of Z is as follows:

The first n1 observations have z_i drawn from a uniform distribution on the interval (-3, -1) (the first interval).

Observations n1 + 1 to n1 + n2 have z_i drawn from a uniform distribution on the interval (-1, 1) (the second interval).

Observations n1 + n2 + 1 to n1 + n2 + n3 have z_i drawn from a uniform distribution on the interval (1, 3) (the third interval).
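
The three-interval scheme above can be sketched in base R as follows. This is an illustration of the description, not the package's own code; the variable names simply mirror the arguments and return values of generateData.

```r
# sample sizes for the three intervals (the defaults of generateData)
n1 <- 60
n2 <- 60
n3 <- 60

set.seed(1)

# draw z_i uniformly on (-3, -1), (-1, 1), and (1, 3), in that order
Z <- c(runif(n1, -3, -1), runif(n2, -1, 1), runif(n3, 1, 3))

# interval assignment for each observation
interval <- rep(1:3, times = c(n1, n2, n3))

# range of the covariate within each interval
tapply(Z, interval, range)
```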

Precision Matrices

If true_precision = NULL, then the generation of the true precision matrices is as follows:

All precision matrices have 2 on the diagonal and 1 in the (2, 3)/(3, 2) positions.

Observations in the first interval have a 1 in the (1, 2)/(2, 1) positions, while observations in the third interval have a 1 in the (1, 3)/(3, 1) positions.

Observations in the second interval have two entries that vary as a linear function of their extraneous covariate. Let \beta = 1/2. Then, the (1, 2)/(2, 1) positions for the i-th observation in the second interval are \beta\cdot(1 - z_i), while the (1, 3)/(3, 1) entries are \beta\cdot(1 + z_i).

Thus, as z_i approaches -1 from the right, the associated precision matrix becomes more similar to the matrix for observations in the first interval. Similarly, as z_i approaches 1 from the left, the matrix becomes more similar to the matrix for observations in the third interval.
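
The construction above can be written out as a small base R function. `precisionAt` is a hypothetical helper introduced here for illustration; it is not part of covdepGE, and it assumes `p` is at least 3 and that all entries not mentioned in the description are 0.

```r
# hypothetical helper (not part of covdepGE): precision matrix for one
# covariate value z, following the description above
precisionAt <- function(z, p = 5, beta = 1 / 2) {
  stopifnot(p >= 3, z > -3, z < 3)

  # entries shared by all intervals
  prec <- diag(2, p)
  prec[2, 3] <- prec[3, 2] <- 1

  if (z <= -1) {
    # first interval: 1 in the (1, 2)/(2, 1) positions
    w12 <- 1
    w13 <- 0
  } else if (z >= 1) {
    # third interval: 1 in the (1, 3)/(3, 1) positions
    w12 <- 0
    w13 <- 1
  } else {
    # second interval: entries vary linearly with z
    w12 <- beta * (1 - z)
    w13 <- beta * (1 + z)
  }

  prec[1, 2] <- prec[2, 1] <- w12
  prec[1, 3] <- prec[3, 1] <- w13
  prec
}

# upper-left 3 x 3 block in the first interval and at the midpoint z = 0
precisionAt(-2)[1:3, 1:3]
precisionAt(0)[1:3, 1:3]
```

Evaluating the second-interval formulas at z_i = -1 gives 1 and 0, and at z_i = 1 gives 0 and 1, which is why the matrices join continuously at the interval boundaries.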

Examples

## Not run: 
library(covdepGE)
library(ggplot2)

# get the data
set.seed(12)
data <- generateData()
X <- data$X
Z <- data$Z
interval <- data$interval
prec <- data$true_precision

# get overall and within interval sample sizes
n <- nrow(X)
n1 <- sum(interval == 1)
n2 <- sum(interval == 2)
n3 <- sum(interval == 3)

# visualize the distribution of the extraneous covariate
ggplot(data.frame(Z = Z, interval = as.factor(interval))) +
  geom_histogram(aes(Z, fill = interval), color = "black", bins = n %/% 5)

# visualize the true precision matrices in each of the intervals

# interval 1
matViz(prec[[1]], incl_val = TRUE) +
  ggtitle(paste0("True precision matrix, interval 1, observations 1,...,", n1))

# interval 2 (varies continuously with Z)
cat("\nInterval 2, observations ", n1 + 1, ",...,", n1 + n2, sep = "")
int2_mats <- prec[interval == 2]
int2_inds <- c(5, n2 %/% 2, n2 - 5)
lapply(int2_inds, function(j) matViz(int2_mats[[j]], incl_val = TRUE) +
         ggtitle(paste("True precision matrix, interval 2, observation", j + n1)))

# interval 3
matViz(prec[[length(prec)]], incl_val = TRUE) +
  ggtitle(paste0("True precision matrix, interval 3, observations ",
                 n1 + n2 + 1, ",...,", n1 + n2 + n3))

# fit the model and visualize the estimated graphs
(out <- covdepGE(X, Z))
plot(out)

# visualize the posterior inclusion probabilities for variables (1, 2) and (1, 3)
inclusionCurve(out, 1, 2)
inclusionCurve(out, 1, 3)

## End(Not run)

JacobHelwig/covdepGE documentation built on May 31, 2024, 12:13 a.m.