CSSCASimulation: Simulate the data according to the CSSCA model

View source: R/CSSCASimulation.R

CSSCASimulation R Documentation

Description

Simulates clustered multi-block data according to the CSSCA model.

Usage

CSSCASimulation(ncluster, memcluster, nblock, ncom, ndistinct, nvar,
  psparse = 0, pnoise = 0, pcombase = 0, pfixzero = 0, meancov,
  pmean)

Arguments

ncluster

The number of clusters to simulate.

memcluster

A vector giving the number of entries in each cluster. The vector should be of length ncluster, with the nth element giving the number of entries in the nth cluster. It may also be a single integer, in which case all clusters are assumed to have the same number of entries.

ncom

An integer giving the number of common components.

ndistinct

A vector of length nblock, with the ith element giving the number of distinctive components assumed for the ith data block. It may also be a single integer, in which case all blocks are assumed to have the same number of distinctive components.

nvar

A vector of length nblock, with the ith element giving the number of variables assumed for the ith data block. It may also be a single integer, in which case all blocks are assumed to have the same number of variables.

psparse

A number in the range [0,1] that gives the sparsity level (i.e. the proportion of zero elements in the loading matrix). The default is 0.

pcombase

A number in the range [0,1] that gives the proportion of the "common" (i.e. identical) part in the loading matrices of the clusters. The cluster-specific part is then (1 - pcombase). It is one of the parameters that control the similarity between loading matrices. The default is 0.

pfixzero

A number in the range [0,1] that gives the proportion of zero loadings that share the same positions across all clusters. It is one of the parameters that control the similarity between loading matrices. The default is 0.

meancov

Possible values: "mean" = includes only the mean structure, "cov" = includes only the covariance structure, and "both" = includes both the mean structure and the covariance structure.

nblock

A positive integer giving the number of blocks (i.e. the number of data sources).

pnoise

A number in the range [0,1] that gives the proportion of noise structure to be added to the final data. The default is 0.

pmean

A number in the range [0,1] that gives the proportion of mean structure.
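
Several of the arguments above accept either a vector or a single integer. The following base R sketch illustrates what that convention implies for the sizes involved. The helper `expand_arg` is hypothetical and is not part of the CSSCA package; it only mirrors the behaviour described in the argument text.

```r
# Hypothetical helper (not part of CSSCA): expand a single value to a vector
# of length n, or check that a supplied vector already has length n.
expand_arg <- function(x, n) {
  if (length(x) == 1) {
    rep(x, n)
  } else {
    stopifnot(length(x) == n)
    x
  }
}

ncluster <- 3
nblock <- 2

memcluster <- expand_arg(50, ncluster)      # same size for every cluster
ndistinct  <- expand_arg(1, nblock)         # same count for every block
nvar       <- expand_arg(c(15, 9), nblock)  # already one value per block

total_obs <- sum(memcluster)  # total number of simulated entries
total_var <- sum(nvar)        # total number of variables over all blocks
```

With these values, passing `50` or `c(50, 50, 50)` as memcluster would describe the same design, under the assumption that the function expands scalars in this way.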

Value

A list of six elements. The first element is a list that contains the generated final data per block; the second element is the concatenated version of the final data (the block-wise data combined into one single dataset); the third element is the data in which clusters differ only in covariance structure (i.e. before the mean structure and noise structure are added); the fourth element is a list of cluster-specific score matrices; the fifth element is a list of cluster-specific loading matrices; the last element is a vector giving the cluster assignment (the nth element of the vector is the cluster assignment of the nth observation).
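
The relationship between the per-block data, the concatenated data, and the cluster assignment vector can be sketched in base R. This is an illustration only: it uses zero-filled placeholder matrices with the dimensions from the Examples section, assumes that blocks are concatenated column-wise, and assumes that observations are ordered by cluster. The documentation states none of these details, and the variable names are hypothetical.

```r
# Placeholder blocks (zeros, not simulated data): 150 observations,
# 15 variables in block 1 and 9 variables in block 2.
n_obs <- 150
block_data <- list(matrix(0, n_obs, 15), matrix(0, n_obs, 9))

# Assumed concatenation: bind the blocks side by side, so rows stay
# observations and columns run over the variables of all blocks.
concatenated <- do.call(cbind, block_data)

# Assumed layout of the assignment vector: one cluster label per
# observation, with observations ordered by cluster.
assignment <- rep(seq_len(3), times = c(50, 50, 50))

dim(concatenated)   # rows = observations, columns = all variables
table(assignment)   # number of observations per cluster
```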

Examples

   n_cluster <- 3
   mem_cluster <- c(50,50,50) # 50 entries in each cluster
   n_block <- 2
   n_com <- 2
   n_distinct <- c(1,1) # 1 distinctive component in each block
   n_var <- c(15,9)
   p_sparse <- 0.5
   p_noise <- 0.3
   p_combase <- 0.5 # moderate similarity
   p_fixzero <- 0.5 # moderate similarity
   mean_v  <- 0.1 # covariance structure dominates
   ## Not run:
   CSSCASimulation(n_cluster, mem_cluster, n_block, n_com, n_distinct, n_var, p_sparse,
     p_noise, p_combase, p_fixzero, "both", mean_v)
   ## End(Not run)


syuanuvt/CSSCA documentation built on Nov. 28, 2022, 7:58 p.m.