LNM.clust: Function for the Clustering Algorithm with specific number of...

View source: R/LNM.clust.R

LNM.clust    R Documentation

Function for the Clustering Algorithm with a Specific Number of Components G_act


This function is part of the main clustering function for the proposed algorithm. It fits the proposed LNM_MM model for a specific number of components G_act. To fit models for G in 1:5, use the parallelized function or call this function inside a loop over G.
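The loop over G described above might be sketched as follows. Because the full call signature of LNM.clust is not shown here, `fit_one_G` is a hypothetical stand-in that returns a made-up BIC; in practice its body would be the call to LNM.clust for that G. The selection rule (largest BIC wins) follows the "highest BIC/ICL" convention used in this page.

```r
# Hypothetical stand-in for a single-G fit. Replace the body with a call to
# LNM.clust for the given number of components.
fit_one_G <- function(G) {
  list(G = G, BIC = -1000 - 10 * (G - 2)^2)  # fake BIC, for illustration only
}

G_grid <- 1:5
fits <- lapply(G_grid, fit_one_G)                     # one fit per candidate G
bic  <- vapply(fits, function(f) f$BIC, numeric(1))   # collect the criterion
best <- fits[[which.max(bic)]]                        # assumes larger BIC is better
best$G
```

Swapping `lapply` for a parallel apply is one way to run the candidate values of G concurrently.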


  initial = "kmeans",
  runtime = TRUE,
  verb = FALSE,
  maxiter = NA,
  nrep = NA,
  niter = NA,
  sim = FALSE



Input data. If sim==TRUE, data should be a list of multiple datasets (indexed by "run"), where each dataset is itself a list containing the counts W and true_lab (the true class labels). If the true labels are unknown, set true_lab to NA. If the data are not simulated, data should have the same format as a single dataset of the simulation.
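The two input shapes described above can be illustrated with a small sketch. The helper `make_dataset` and the layout of W (one row per observation, one column per category) are assumptions made for illustration, not something this page specifies.

```r
set.seed(1)

# One dataset: a count matrix W plus true labels (NA when unknown).
# Assumption: W has one row per observation and one column per category.
make_dataset <- function(n = 20, n_cat = 3, total = 1000, labelled = TRUE) {
  W <- t(rmultinom(n, size = total, prob = rep(1 / n_cat, n_cat)))
  true_lab <- if (labelled) rep(1:2, length.out = n) else rep(NA, n)
  list(W = W, true_lab = true_lab)
}

single_data <- make_dataset(labelled = FALSE)             # shape for sim = FALSE
sim_data    <- lapply(1:3, function(run) make_dataset())  # shape for sim = TRUE

str(single_data, max.level = 1)
length(sim_data)
```

With this layout, `sim_data[[run]]` has the same structure as `single_data`, which matches the description of the non-simulation case.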


Run number, used to keep track of datasets. For a simulation, this can be the index of the simulated dataset; in other cases, the algorithm can be run several times with random initialization, and the run with the highest BIC/ICL selected. To run once on a single dataset, specify run=1.


The actual number of components to fit in the current run (G_act).


Method for initializing z_ig. Possible values are "kmeans", "random", and "small_EM". Default is "kmeans".
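As a rough illustration of what a "kmeans" initialization of z_ig could look like, the sketch below clusters log-ratio transformed counts and converts the hard labels into an indicator matrix. The transformation, the toy data, and the names `alr` and `z_init` are assumptions; the package's internal initialization may differ.

```r
set.seed(42)

# Toy counts: two groups with different category proportions (illustrative data).
W <- rbind(t(rmultinom(15, 500, c(0.6, 0.3, 0.1))),
           t(rmultinom(15, 500, c(0.1, 0.3, 0.6))))

G <- 2
# Assumption: cluster additive log-ratios with the last category as reference;
# 1 is added to every count to avoid log(0).
alr <- log((W[, -ncol(W)] + 1) / (W[, ncol(W)] + 1))
km  <- kmeans(alr, centers = G, nstart = 10)

# Turn the hard k-means labels into an n x G indicator matrix of initial z_ig.
z_init <- diag(G)[km$cluster, , drop = FALSE]
head(z_init)
```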


Logical; whether to output the running time of the whole procedure.


Threshold for Aitken's stopping criterion for convergence.
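For orientation, a commonly used form of Aitken's stopping criterion is sketched below: the last three log-likelihood values give an estimate of the asymptotic log-likelihood, and the algorithm stops when that estimate is within the threshold of the current value. Whether the package uses exactly this variant is an assumption, and `aitken_converged` is a hypothetical helper name.

```r
# Aitken-accelerated stopping rule (a common form; the exact variant used by
# the package is assumed, not confirmed).
# loglik: numeric vector of log-likelihoods from successive iterations.
aitken_converged <- function(loglik, threshold = 1e-6) {
  k <- length(loglik)
  if (k < 3) return(FALSE)                   # need three values to estimate the rate
  l_prev2 <- loglik[k - 2]
  l_prev  <- loglik[k - 1]
  l_cur   <- loglik[k]
  denom <- l_prev - l_prev2
  if (denom == 0) return(abs(l_cur - l_prev) < threshold)
  a <- (l_cur - l_prev) / denom              # Aitken acceleration factor
  l_inf <- l_prev + (l_cur - l_prev) / (1 - a)  # estimated asymptotic log-likelihood
  gap <- l_inf - l_cur
  gap >= 0 && gap < threshold
}

# Log-likelihoods approaching -100 geometrically.
aitken_converged(-100 - 0.5^(1:5))    # early iterations
aitken_converged(-100 - 0.5^(1:30))   # late iterations
```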


Logical; whether to print the key steps of the algorithm and the approximated log-likelihood at each iteration.


Maximum number of iterations. If specified, the algorithm stops when either the convergence threshold is met or maxiter is reached, whichever comes first. If not specified, the algorithm is governed by the convergence criterion alone.


Default is NA. Only needed if "small_EM" is specified for initial. Number of random starts for the small EM initialization.


Default is NA. Only needed if "small_EM" is specified for initial. Number of iterations for each random start in the small EM initialization.


Logical; whether the input is simulated data. Simulated data must be supplied as a list of multiple datasets (indexed by "run"), where each dataset is a list of W and true_lab. Default is FALSE.


A list containing the parameters at convergence. pi_g = estimated class size composition; z = soft class membership posterior probabilities; mu = estimated mean parameter of the latent variable; Sigma = estimated variance parameter of the latent variable. The remaining elements are internal parameters that can be used to check model fit and are passed to the overall algorithm for model selection.
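A small sketch of how the soft memberships z might be used after fitting is given below. The matrix here is toy data standing in for the z element of the returned list, and the layout (rows as observations, columns as components) is an assumption.

```r
# Toy posterior membership matrix: rows = observations, columns = components.
z <- rbind(c(0.9, 0.1),
           c(0.2, 0.8),
           c(0.6, 0.4))

hard_lab <- apply(z, 1, which.max)  # maximum a posteriori label per observation
pi_hat   <- colMeans(z)             # average membership, comparable to pi_g
table(hard_lab)
```

The hard labels can then be compared against true_lab when true labels are available.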


# generate data using
Data.temp <- generate_data(G = 2, num_observation = c(50, 50), K = 2,
                           true_mu = list(c(0, 1, 0), c(-2, -5, 0)),
                           true_Sig = list(rbind(cbind(diag(1, 2), 0), 0),
                                           rbind(cbind(diag(1, 2), 0), 0)),
                           seed.no = 1234, M = 10000, truelab = TRUE)


