LNM.clust: Function for the Clustering Algorithm with specific number of...

View source: R/LNM.clust.R

LNM.clust    R Documentation

Function for the Clustering Algorithm with a Specific Number of Components G_act

Description

This function is part of the main clustering function for the proposed algorithm. It fits the proposed LNM_MM model for a specific number of components G_act. To fit a range of values, e.g. G = 1:5, use the parallelized function or call this function in a loop over G.
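The loop over G mentioned above could be sketched as follows. This is only an illustration: `fit_over_G` is a hypothetical helper, not part of the package, and it assumes the fitting function accepts `G_act` as a named argument (as LNM.clust does in the Usage section). Errors for a single G are captured so that one failed fit does not stop the loop.

```r
# Hypothetical helper (not part of LNMVGA): fit one model per candidate G.
# `fit_fun` would be LNM.clust in practice; all other arguments go through `...`.
fit_over_G <- function(fit_fun, G_values, ...) {
  fits <- lapply(G_values, function(g) {
    # Keep the error object instead of aborting the whole loop
    tryCatch(fit_fun(G_act = g, ...), error = function(e) e)
  })
  names(fits) <- paste0("G", G_values)
  fits
}

# Intended use (requires the package and a dataset such as Data.temp):
# fits <- fit_over_G(LNM.clust, 1:5, data = Data.temp, run = 1,
#                    threshold = 1e-4)
```

Each element of the returned list is either the fitted object for that G or the captured error, so failed fits can be filtered with `inherits(x, "error")` before model selection.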

Usage

LNM.clust(
  data,
  run,
  G_act,
  initial = "kmeans",
  runtime = TRUE,
  threshold,
  verb = FALSE,
  maxiter = NA,
  nrep = NA,
  niter = NA,
  sim = FALSE
)

Arguments

data

Input data. If sim==TRUE, data should be a list of multiple datasets (indexed by "run"), where each dataset is a list containing the counts W and true_lab (the true class labels). If there are no true labels, set true_lab to NAs. If the data are not simulated, data should be a single dataset in the same format as each dataset of the simulation.

run

Keeps track of the run number. For simulations this can be the index of the simulated dataset; in other cases, the algorithm can be run several times with random initialization to pick the run with the highest BIC/ICL. To run once on a single dataset, specify run=1.

G_act

The number of components to fit in the current run.

initial

Method for initializing z_ig. Possible values are "kmeans", "random", and "small_EM". Default is "kmeans".

runtime

Logical; whether to output the running time of the whole procedure.

threshold

Threshold for Aitken's stopping criterion for convergence.

verb

Logical; whether to print the key steps of the algorithm and the approximated log-likelihood at each iteration.

maxiter

Maximum number of iterations. If specified, the algorithm stops when either the convergence criterion falls below the threshold or maxiter is reached. If not specified, the algorithm stops only when the convergence criterion is met.

nrep

Default is NA. Only needed if "small_EM" is specified for initial. Number of random starts for the small EM initialization.

niter

Default is NA. Only needed if "small_EM" is specified for initial. Number of iterations for each random start in the small EM initialization.

sim

Indicator of whether the data are simulated. Simulated data must be supplied as a list of multiple datasets (indexed by "run"), where each dataset is a list of W and true_lab. Default is FALSE.
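The two input layouts described for data and sim can be illustrated with a small hand-built example. This is a sketch under assumptions not stated in the documentation: W is taken to be a matrix of counts with one row per observation, and the sizes used here are arbitrary.

```r
set.seed(1)
n_obs  <- 6   # arbitrary number of observations (assumption)
n_taxa <- 3   # arbitrary number of count categories (assumption)

# Assumed layout: counts matrix with observations in rows
W <- matrix(rpois(n_obs * n_taxa, lambda = 20), nrow = n_obs)

# Single dataset (sim = FALSE): a list with W and true_lab;
# true_lab is set to NAs because no true labels are known
single_data <- list(W = W, true_lab = rep(NA, n_obs))

# Simulation input (sim = TRUE): a list of such datasets, indexed by run
sim_data <- list(single_data, single_data)
```

With sim = TRUE, run = 2 would then refer to the second dataset in `sim_data`.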

Value

A list containing the parameters at convergence. pi_g = estimated class size composition; z = soft class membership (posterior probabilities); mu = estimated mean parameter for the latent variable; Sigma = estimated variance parameter for the latent variable. The remaining elements are internal parameters that can be used to check model fit and are passed to the overall algorithm for model selection.
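Soft memberships are often converted to hard cluster labels after fitting. The sketch below uses a made-up z matrix and assumes z has one row per observation and one column per component, which the documentation does not state explicitly. The column means are shown because that is the usual mixing-proportion estimate in a mixture-model EM; whether pi_g is computed exactly this way in the package is not confirmed here.

```r
# Made-up posterior probabilities: 3 observations, 2 components (assumed layout)
z <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.6, 0.4), ncol = 2, byrow = TRUE)

# Hard label = component with the largest posterior probability in each row
hard_lab <- apply(z, 1, which.max)

# Usual EM-style estimate of the mixing proportions from z
pi_hat <- colMeans(z)
```

When true labels are available, `table(hard_lab, true_lab)` gives a quick cross-tabulation of estimated against true classes.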

Examples

# Generate data
Data.temp <- generate_data(G = 2, num_observation = c(50, 50), K = 2,
  true_mu = list(c(0, 1, 0), c(-2, -5, 0)),
  true_Sig = list(rbind(cbind(diag(1, 2), 0), 0), rbind(cbind(diag(1, 2), 0), 0)),
  seed.no = 1234, M = 10000, truelab = TRUE)

LNM.clust(data = Data.temp, run = 1, G_act = 2, initial = "small_EM",
  runtime = TRUE, threshold = 1e-4, verb = TRUE, nrep = 30, niter = 50,
  sim = FALSE)
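The threshold argument refers to Aitken's stopping criterion. For intuition, one common textbook form of that criterion is sketched below: it estimates the asymptotic log-likelihood from the last three values and stops when that estimate is within the threshold of the previous log-likelihood. `aitken_converged` is a hypothetical function, and the exact variant used inside LNM.clust may differ.

```r
# Hypothetical helper (not part of LNMVGA): one common form of
# Aitken's acceleration-based stopping rule.
# `loglik` is the vector of (approximated) log-likelihoods so far.
aitken_converged <- function(loglik, threshold) {
  k <- length(loglik)
  if (k < 3) return(FALSE)                 # needs three values
  d_new <- loglik[k] - loglik[k - 1]       # latest increase
  d_old <- loglik[k - 1] - loglik[k - 2]   # previous increase
  if (d_new == 0) return(TRUE)             # no change at all
  if (d_old == 0) return(FALSE)            # acceleration undefined
  a <- d_new / d_old                       # Aitken acceleration
  l_inf <- loglik[k - 1] + d_new / (1 - a) # asymptotic estimate
  gap <- l_inf - loglik[k - 1]
  is.finite(gap) && gap >= 0 && gap < threshold
}

# Toy log-likelihood sequence that converges geometrically to -100
ll <- -100 - 10 * 0.5^(1:30)
aitken_converged(ll[1:3], threshold = 1e-4)  # early iterations
aitken_converged(ll, threshold = 1e-4)       # late iterations
```

A smaller threshold demands that the log-likelihood be closer to its estimated limit before stopping, at the cost of more iterations.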

yuanfang90/LNMVGA documentation built on Jan. 29, 2024, 8:24 a.m.