rccSim | R Documentation |
This function simulates data based on the Random Covariance Clustering Model (RCCM). Data are generated hierarchically: group-level networks and precision matrices first, then subject-level networks and precision matrices.
rccSim( G = 2, clustSize = c(67, 37), p = 10, n = 177, overlap = 0.5, rho = 0.1, esd = 0.05, type = "hub", eprob = 0.5 )
G |
Positive integer. Number of groups or clusters. |
clustSize |
Positive integer or vector of positive integers. Number of subjects in each cluster. |
p |
Positive integer. Number of variables for each subject. |
n |
Positive integer. Number of observations for each subject on each variable. |
overlap |
Positive number between 0 and 1. Approximate proportion of overlapping edges across cluster-level networks. |
rho |
Positive number between 0 and 1. Approximate proportion of differential edges for subjects compared to their corresponding cluster-level network. |
esd |
Standard deviation of mean 0 noise added to generated subject-level matrices for variation from the corresponding group-level matrix. |
type |
Graph type. Options are "hub" or "random". |
eprob |
Probability of two nodes having an edge between them. Only applicable if type = "random". |
For hub type graphs, G cluster-level networks are first generated, each with floor(√p) hubs and thus E = p - floor(√p) edges. For random graphs, G cluster-level networks are generated such that each pair of nodes is connected with the probability specified by eprob, yielding approximately E = (p choose 2) x eprob edges. Cluster-level networks are forced to share s = floor(overlap x E) edges, so overlap represents the approximate proportion of edges that are common across the cluster-level networks.
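The edge counts described above can be worked out directly. The following is a minimal sketch of that arithmetic only; the helper `edge_counts` is a hypothetical name introduced for illustration and is not part of the package.

```r
# Illustrative helper (hypothetical, not part of the package):
# computes E and the number of shared edges s from the formulas in the text.
edge_counts <- function(p, overlap, type = c("hub", "random"), eprob = 0.5) {
  type <- match.arg(type)
  E <- if (type == "hub") {
    p - floor(sqrt(p))       # hub graphs: p minus the number of hubs
  } else {
    choose(p, 2) * eprob     # random graphs: expected number of edges
  }
  list(E = E, shared = floor(overlap * E))
}

# Default-like settings: p = 10, overlap = 0.5
hub_counts  <- edge_counts(p = 10, overlap = 0.5, type = "hub")
rand_counts <- edge_counts(p = 10, overlap = 0.5, type = "random", eprob = 0.5)

print(hub_counts)   # E = 7, shared = 3
print(rand_counts)  # E = 22.5, shared = 11
```

With p = 10 there are floor(√10) = 3 hubs, so a hub graph has 7 edges, of which floor(0.5 x 7) = 3 are shared across clusters.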
Then, the K subjects are randomly assigned to the G clusters, and each subject-level network is generated by randomly selecting floor(rho x E) node pairs at which to add or remove an edge relative to the corresponding cluster-level network. Non-zero entries for all precision matrices are generated from a uniform distribution with support on the interval [-1, -0.50] U [0.50, 1], and are adjusted until positive definite matrices are obtained.
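To make the precision-matrix step concrete, here is a minimal sketch of drawing non-zero entries from [-1, -0.50] U [0.50, 1] on a given network. The documentation does not say how matrices are adjusted to be positive definite; the diagonal-dominance step below is an assumption chosen for illustration, and the adjacency matrix `adj` and helper `draw_entry` are hypothetical.

```r
set.seed(1)
p <- 5

# Hypothetical network: one hub (node 1) connected to every other node
adj <- matrix(0, p, p)
adj[1, -1] <- 1
adj[-1, 1] <- 1

# Uniform magnitude on [0.5, 1] with a random sign gives support
# on [-1, -0.5] U [0.5, 1]
draw_entry <- function(m) runif(m, 0.5, 1) * sample(c(-1, 1), m, replace = TRUE)

# Fill the upper triangle at edge positions, then symmetrize
omega <- matrix(0, p, p)
upper <- upper.tri(adj) & adj == 1
omega[upper] <- draw_entry(sum(upper))
omega <- omega + t(omega)

# Assumed adjustment (not necessarily the package's method): make the
# diagonal strictly dominate each row, which guarantees positive definiteness
diag(omega) <- rowSums(abs(omega)) + 0.1

min_eig <- min(eigen(omega, symmetric = TRUE, only.values = TRUE)$values)
print(min_eig > 0)
```

Diagonal dominance is used here because it guarantees a positive definite result in one step; an iterative adjustment, as the wording "adjusted until" suggests, would be an equally valid reading.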
A list of length 5 containing:
list of K multivariate-Gaussian data sets each of dimension n_k x p (simDat).
p x p x G array of the G cluster-level networks (g0s).
p x p x G array of the G cluster-level precision matrices (Omega0s).
p x p x K array of the K subject-level precision matrices (Omegaks).
vector of length K containing cluster memberships for each subject (zgks).
Andrew DiLernia
# Generate data with 2 clusters with 12 and 10 subjects respectively,
# 15 variables for each subject, 100 observations for each variable for each subject,
# the groups sharing about 50% of network connections, and 10% of differential connections
# within each group
myData <- rccSim(G = 2, clustSize = c(12, 10), p = 15, n = 100, overlap = 0.50, rho = 0.10)

# View list of simulated data
View(myData)