LNMMM: Function for the Clustering Algorithm

View source: R/LNMMM.R

LNMMM    R Documentation

Function for the Clustering Algorithm

Description

This function is the parallelized version of the main clustering algorithm.

Usage

LNMMM(
  data,
  run,
  Gmax,
  initial = "kmeans",
  runtime = TRUE,
  threshold,
  verb,
  maxiter = NA,
  nrep = NA,
  niter = NA,
  sim = FALSE
)

Arguments

data

Input data. If sim == TRUE, data should be a list of multiple datasets, where each dataset is a list containing the counts W and true_lab (the true class labels). If the true labels are unknown, set true_lab to NAs. If sim == FALSE, data should be a single dataset in the same format as each dataset in the simulation list.
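As a minimal sketch of the layout described above (not taken from the package): the element names W and true_lab come from this page, while the orientation of W (one row per observation, one column per category) and the toy counts are assumptions made only for illustration.

```r
# Toy count matrix; rows = observations is an assumption, not documented here.
set.seed(1)
W <- matrix(rpois(6 * 3, lambda = 20), nrow = 6, ncol = 3)

# A single dataset (sim = FALSE): a list holding W and true_lab.
# true_lab is filled with NAs when the true labels are unknown.
single_dataset <- list(W = W, true_lab = rep(NA, nrow(W)))

# Simulated input (sim = TRUE): a list of such datasets.
sim_datasets <- list(
  list(W = W, true_lab = c(1, 1, 1, 2, 2, 2)),
  list(W = W, true_lab = c(1, 1, 2, 2, 2, 2))
)

# Number of datasets in the list, which is what run becomes when sim = TRUE.
length(sim_datasets)
```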

run

Specify only when sim == FALSE. When sim == TRUE, run is set automatically to the number of datasets contained in the simulation dataset list.

Gmax

Maximum number of components to fit.

initial

Method for initializing z_ig. Possible values are "kmeans", "random", and "small_EM". Default is "kmeans".

runtime

Logical; whether to output the running time of the whole procedure. Default is TRUE.

threshold

Threshold for Aitken's stopping criterion for convergence.
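For readers unfamiliar with the criterion, the following is a minimal sketch of one common form of an Aitken-acceleration stopping rule applied to a sequence of log-likelihood values. The function name aitken_converged and its arguments are hypothetical, and the exact variant implemented inside LNMMM may differ.

```r
# Sketch of an Aitken-based stopping rule (hypothetical helper, not package code).
# loglik: numeric vector of (approximate) log-likelihoods, one per iteration.
# Returns TRUE when the estimated remaining gain is below `threshold`.
aitken_converged <- function(loglik, threshold) {
  k <- length(loglik)
  if (k < 3) return(FALSE)                  # three consecutive values are needed
  d_new <- loglik[k] - loglik[k - 1]        # latest increment
  d_old <- loglik[k - 1] - loglik[k - 2]    # previous increment
  if (d_old == 0) return(d_new == 0)        # avoid dividing by zero
  a <- d_new / d_old                        # Aitken acceleration factor
  l_inf <- loglik[k - 1] + d_new / (1 - a)  # estimate of the limiting value
  gap <- l_inf - loglik[k]                  # estimated distance to the limit
  is.finite(gap) && abs(gap) < threshold
}

# Toy sequence that converges geometrically to -50.
ll <- -50 - 10 * 0.5^(1:20)
early <- aitken_converged(ll[1:3], threshold = 1e-4)
late <- aitken_converged(ll, threshold = 1e-4)
```

Comparing the estimated limit with the current value, rather than comparing successive values directly, guards against stopping too early when the log-likelihood is still increasing slowly.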

verb

Logical; whether to print the key steps of the algorithm and the approximated log-likelihood at each iteration.

maxiter

Maximum number of iterations. If specified, the algorithm stops when either the convergence criterion falls below the threshold or maxiter is reached. If not specified, the algorithm stops only when the convergence criterion is met.

nrep

Default is NA. Only needed if "small_EM" is specified for initial. Number of random starts for the small EM initialization.

niter

Default is NA. Only needed if "small_EM" is specified for initial. Number of iterations for each random start in the small EM initialization.

sim

Indicator of whether this is simulated data. Simulated data must be supplied as a list of multiple datasets (indexed by "run"), where each dataset is a list containing W and true_lab. Default is FALSE.

Value

A list containing the results for all datasets (runs) and all numbers of components (from 1 to Gmax). The results include, as matrices, the BIC, the ICL, the ARI (if the true labels are not NAs), and the run time in seconds (if runtime == TRUE), together with the best G selected by BIC/ICL for each dataset and the corresponding ARIs. The results inherited from LNM.clust for each G and each dataset are also stored.
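To illustrate what selecting the best G from a matrix of criterion values looks like, here is a minimal sketch on made-up numbers. The matrix layout (one row per run, one column per G), the object names, and the "larger is better" sign convention are all assumptions; check the returned object and flip to which.min if the package uses the opposite convention.

```r
# Hypothetical BIC matrix: one row per dataset (run), one column per G = 1..4.
# The values are invented for illustration only.
bic <- matrix(
  c(-1250, -1180, -1195, -1210,
    -1300, -1240, -1225, -1260),
  nrow = 2, byrow = TRUE,
  dimnames = list(paste0("run", 1:2), paste0("G", 1:4))
)

# For each run, pick the G with the largest BIC (assumed sign convention).
best_G <- apply(bic, 1, which.max)
best_G
```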

Examples

# Generate data
Data.temp <- generate_data(
  G = 2,
  num_observation = c(50, 50),
  K = 2,
  true_mu = list(c(0, 1, 0), c(-2, -5, 0)),
  true_Sig = list(
    rbind(cbind(diag(1, 2), 0), 0),
    rbind(cbind(diag(1, 2), 0), 0)
  ),
  seed.no = 1234,
  M = 10000,
  truelab = TRUE
)

# Fit models with 1 to 5 components on the single dataset
LNMMM(
  data = Data.temp,
  run = 1,
  Gmax = 5,
  initial = "kmeans",
  runtime = TRUE,
  threshold = 1e-4,
  verb = TRUE,
  sim = FALSE
)

yuanfang90/LNMVGA documentation built on Jan. 29, 2024, 8:24 a.m.