rccSim: Simulation of Data for RCCM

View source: R/rccSim.R

rccSim    R Documentation

Simulation of Data for RCCM

Description

This function simulates data from the Random Covariance Clustering Model (RCCM). Data are generated hierarchically: cluster-level networks and precision matrices first, then subject-level networks and precision matrices.

Usage

rccSim(
  G = 2,
  clustSize = c(67, 37),
  p = 10,
  n = 177,
  overlap = 0.5,
  rho = 0.1,
  esd = 0.05,
  type = "hub",
  eprob = 0.5
)

Arguments

G

Positive integer. Number of groups or clusters.

clustSize

Positive integer or vector of positive integers. Number of subjects in each cluster.

p

Positive integer. Number of variables for each subject.

n

Positive integer. Number of observations for each subject on each variable.

overlap

Positive number between 0 and 1. Approximate proportion of overlapping edges across cluster-level networks.

rho

Positive number between 0 and 1. Approximate proportion of differential edges for subjects compared to their corresponding cluster-level network.

esd

Non-negative number. Standard deviation of the mean-zero noise added to each generated subject-level matrix so that it varies from the corresponding cluster-level matrix.

type

Graph type. Options are "hub" or "random".

eprob

Probability of two nodes having an edge between them. Only applicable if type = "random".

Details

For hub-type graphs, G cluster-level networks are first generated, each with floor(√p) hubs and thus E = p - floor(√p) edges. For random graphs, G cluster-level networks are generated such that each pair of nodes is connected with the probability specified by eprob, yielding approximately E = (p choose 2) × eprob edges. Cluster-level networks are forced to share s = floor(overlap × E) edges, so overlap represents the approximate proportion of edges that are common across the cluster-level networks.
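The edge counts described above can be checked with a few lines of arithmetic. The following is a minimal sketch that only restates the formulas from this section; the helper name edge_counts is hypothetical and is not part of the package.

```r
# Hypothetical helper: restates the edge-count formulas from the Details text.
edge_counts <- function(p, overlap, type = c("hub", "random"), eprob = 0.5) {
  type <- match.arg(type)
  E <- if (type == "hub") {
    p - floor(sqrt(p))        # E = p - floor(sqrt(p)) for hub graphs
  } else {
    choose(p, 2) * eprob      # approximate (expected) edge count for random graphs
  }
  c(E = E, shared = floor(overlap * E))   # s = floor(overlap x E)
}

edge_counts(p = 10, overlap = 0.5, type = "hub")      # E = 7,    shared = 3
edge_counts(p = 10, overlap = 0.5, type = "random")   # E = 22.5, shared = 11
```

For random graphs E is only an expected value, so the realized number of edges in any one simulated network may differ.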

Then, the K subjects are randomly assigned to the G clusters, and each subject-level network is generated by randomly selecting floor(rho × E) node pairs and adding or removing the edge between them relative to the corresponding cluster-level network. Non-zero entries of all precision matrices are drawn from a uniform distribution with support on [-1, -0.50] ∪ [0.50, 1], and the matrices are adjusted until they are positive definite.
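The subject-level step can be illustrated in base R. This is a sketch under stated assumptions, not the package's implementation: the helper names flip_edges and draw_precision are hypothetical, the single-hub network is a toy example, and the positive-definiteness adjustment shown here (inflating the diagonal until the matrix is strictly diagonally dominant) is one common choice, since this page does not say which adjustment rccSim uses.

```r
set.seed(1)
p <- 6

# Toy cluster-level network (assumption): node 1 is a hub joined to every other node
g0 <- matrix(0, p, p)
g0[1, -1] <- 1
g0[-1, 1] <- 1

# Hypothetical helper: add an edge where absent, remove it where present,
# for n_flip randomly chosen node pairs
flip_edges <- function(g, n_flip) {
  pairs <- which(upper.tri(g))                       # linear indices of node pairs
  chosen <- pairs[sample.int(length(pairs), n_flip)]
  g[chosen] <- 1 - g[chosen]
  g[lower.tri(g)] <- t(g)[lower.tri(g)]              # mirror to keep the network symmetric
  g
}

# Hypothetical helper: non-zero entries from Uniform([-1, -0.5] U [0.5, 1])
draw_precision <- function(g) {
  omega <- matrix(0, nrow(g), ncol(g))
  idx <- which(upper.tri(g) & g == 1)
  signs <- sample(c(-1, 1), length(idx), replace = TRUE)
  omega[idx] <- signs * runif(length(idx), min = 0.5, max = 1)
  omega <- omega + t(omega)
  # Assumed adjustment: a strictly diagonally dominant symmetric matrix
  # with a positive diagonal is positive definite
  diag(omega) <- rowSums(abs(omega)) + 0.1
  omega
}

gk <- flip_edges(g0, n_flip = 2)
omega_k <- draw_precision(gk)
min(eigen(omega_k, symmetric = TRUE)$values) > 0     # checks positive definiteness
```

Diagonal inflation guarantees positive definiteness in a single pass, at the cost of shrinking the implied partial correlations; an iterative adjustment would behave differently.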

Value

A list of length 5 containing:

  1. List of K multivariate Gaussian data sets, each of dimension n_k x p (simDat).

  2. p x p x G array of the G cluster-level networks (g0s).

  3. p x p x G array of the G cluster-level precision matrices (Omega0s).

  4. p x p x K array of the K subject-level precision matrices (Omegaks).

  5. Vector of length K containing the cluster membership of each subject (zgks).
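To show how a data set in simDat relates to a subject-level precision matrix, the sketch below draws Gaussian observations from a given precision matrix using base R only. The 3 x 3 matrix is a made-up example, and a zero mean vector is assumed because this page does not state the mean used by rccSim.

```r
set.seed(2)
n <- 100

# Made-up precision matrix (positive definite by diagonal dominance)
omega <- matrix(c( 2.0, -0.6,  0.0,
                  -0.6,  2.0,  0.7,
                   0.0,  0.7,  2.0), nrow = 3, byrow = TRUE)

sigma <- solve(omega)                          # covariance = inverse of the precision matrix
z <- matrix(rnorm(n * nrow(omega)), nrow = n)  # n x p standard normal draws
x <- z %*% chol(sigma)                         # rows are draws from N(0, sigma) (zero mean assumed)

dim(x)                                         # n x p, here 100 x 3
```

Because chol() returns the upper-triangular factor R with t(R) %*% R equal to sigma, right-multiplying the standard normal draws by R gives rows with covariance sigma.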

Author(s)

Andrew DiLernia

Examples

# Generate data with 2 clusters with 12 and 10 subjects respectively,
# 15 variables for each subject, 100 observations for each variable for each subject,
# the groups sharing about 50% of network connections, and 10% of differential connections
# within each group
myData <- rccSim(G = 2, clustSize = c(12, 10), p = 15, n = 100, overlap = 0.50, rho = 0.10)

# View list of simulated data
View(myData)


dilernia/rccm documentation built on Sept. 25, 2022, 9:40 a.m.