MainCSSCA: The main function of the CSSCA method. A multi-start...

View source: R/MainCSSCA.R

MainCSSCA    R Documentation

The main function of the CSSCA method. A multi-start procedure is used extensively (instead of the simplified multi-start algorithm used in the function cssca_quick_cpp).

Description

The main function of the CSSCA method. A multi-start procedure is used extensively (instead of the simplified multi-start algorithm used in the function cssca_quick_cpp).

Usage

MainCSSCA(all_data, nvar, nblock, ncom, ndistinct, ncluster, nobservations,
  psparse, feed, cutoff.prop = 1/6, n_replace, n_replicate = 3,
  rate = 1/10)

Arguments

all_data

A matrix with the concatenated data (the data blocks aggregated by rows, i.e. entries). The CSSCA method is performed on this matrix.

nvar

A vector of length nblock whose ith element indicates the number of variables assumed for the ith data block. It can also be a single integer, in which case all blocks are assumed to have the same number of variables.

ncom

An integer indicating the number of components.

ndistinct

A vector of length nblock whose ith element indicates the number of distinctive components assumed for the ith data block. It can also be a single integer, in which case all blocks are assumed to have the same number of distinctive components.

ncluster

The number of clusters.

psparse

A number in the range [0,1] indicating the sparsity level (i.e. the proportion of zero elements in the loading matrix).

feed

A vector (i.e. partition vector) to serve as rational starts (or semi-rational starts)

cutoff.prop

A cutoff value below which

n_replace

The number of observations whose cluster memberships are changed to create the semi-rational starts.

n_replicate

The number of replicates for a fixed n_replace (e.g. when n_replace = 1, the algorithm generates n_replicate semi-rational starts, each created by randomly changing the cluster membership of one observation).

rate

A number in the range [0,1] indicating the retention rate after the first iterations.

nblock

A positive integer indicating the number of blocks (i.e. the number of data sources).

nobservations

The number of entries included in the dataset.
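The descriptions of feed, n_replace and n_replicate above suggest how semi-rational starts are derived from a rational start. A minimal sketch of that idea in base R follows. The helper make_semi_rational_starts is hypothetical and not part of the CSSCA package; in particular, drawing the new cluster uniformly from the other clusters is an assumption, and the package's own procedure may differ.

```r
# Hypothetical helper (NOT a CSSCA function): perturb a partition vector by
# moving n_replace randomly chosen observations to a different cluster,
# repeated n_replicate times. Assumes ncluster >= 2.
make_semi_rational_starts <- function(feed, ncluster, n_replace, n_replicate = 3) {
  lapply(seq_len(n_replicate), function(r) {
    start <- feed
    # observations whose membership will change
    idx <- sample.int(length(feed), n_replace)
    for (i in idx) {
      # candidate clusters: every cluster except the current one
      others <- setdiff(seq_len(ncluster), start[i])
      start[i] <- others[sample.int(length(others), 1)]
    }
    start
  })
}

set.seed(1)
feed <- rep(1:3, each = 50)  # rational start: 3 clusters of 50 entries
starts <- make_semi_rational_starts(feed, ncluster = 3,
                                    n_replace = 5, n_replicate = 3)

# number of entries that differ from the rational start, per replicate
n_changed <- sapply(starts, function(s) sum(s != feed))
```

Because every selected observation is forced into a different cluster, each generated start differs from feed in exactly n_replace positions.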

Value

A list of five elements. The first element is a vector that gives the partition of the entries: its nth element is the cluster assignment of the nth entry. The second element is a numeric value, the optimal (minimal) loss function value obtained across the many starts. The third element is a list of cluster-specific loading matrices. The fourth element is a list of cluster-specific score matrices.

Examples

n_cluster <- 3
mem_cluster <- c(50,50,50) # 50 entries in each cluster
n_obs <- sum(mem_cluster)
n_block <- 2
n_com <- 2
n_distinct <- c(1,1) # 1 distinctive component in each block
n_var <- c(15,9)
p_sparse <- 0.5
p_noise <- 0.3
p_combase <- 0.5 # moderate similarity
p_fixzero <- 0.5 # moderate similarity
mean_v  <- 0.1
## Not run: 
# extract the data from the simulation
sim <- CSSCASimulation(n_cluster, mem_cluster, n_block, n_com, n_distinct, n_var, p_sparse,
                       p_noise, p_combase, p_fixzero, "both", mean_v)
target_data <- sim$concatnated_data
# feed the data with the original cluster assignment and estimate with the CSSCA method
results <- MainCSSCA(target_data, n_var, n_block, n_com, n_distinct, n_cluster, n_obs,
                     p_sparse, sim$cluster_assign, n_replace = 5)
## End(Not run)
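The example above needs the simulation function from the package. The sketch below uses only base R to illustrate the data layout that the all_data and nvar arguments appear to describe: blocks that share the same entries (rows) placed side by side, so that the matrix has nobservations rows and sum(nvar) columns. This layout is inferred from the argument descriptions and is an assumption, and the random blocks are placeholders rather than CSSCA-structured data.

```r
set.seed(123)
n_obs <- 150
n_block <- 2
n_var <- c(15, 9)  # variables per block, as in the example above

# placeholder blocks: same entries (rows), different variables (columns)
block1 <- matrix(rnorm(n_obs * n_var[1]), nrow = n_obs)
block2 <- matrix(rnorm(n_obs * n_var[2]), nrow = n_obs)

# concatenate the blocks so that each row is still one entry
all_data <- cbind(block1, block2)

# column indices that belong to each block, derived from n_var
ends <- cumsum(n_var)
starts <- ends - n_var + 1
block_cols <- Map(seq, starts, ends)

# a single integer for nvar is shorthand for equally sized blocks
n_var_scalar <- 12
n_var_expanded <- rep(n_var_scalar, n_block)
```

With this layout, all_data[, block_cols[[k]]] recovers the kth block.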


syuanuvt/CSSCA documentation built on Nov. 28, 2022, 7:58 p.m.