LNM.clust | R Documentation |
This function is part of the main clustering function for the proposed algorithm. It fits the proposed LNM_MM model for a specific number of components, G_act. To fit several values of G (for example, 1:5), use the parallelized function or call this function in a loop over G.
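The loop over G mentioned above could be sketched as follows. The helper name `fit_over_G` and its `fit_fun` argument are hypothetical and not part of the package; the fitting function is passed in so that the sketch does not assume anything about the package beyond the argument names shown in the usage.

```r
# Hypothetical helper (not part of the package): fit one model per
# candidate number of components and collect the results in a named list.
fit_over_G <- function(data, G_values, fit_fun, ...) {
  fits <- lapply(G_values, function(g) {
    # Argument names data, run and G_act are taken from the usage of LNM.clust
    fit_fun(data = data, run = 1, G_act = g, ...)
  })
  names(fits) <- paste0("G", G_values)
  fits
}

# Possible usage, assuming the package is loaded and Data.temp exists:
# fits <- fit_over_G(Data.temp, 1:5, LNM.clust, threshold = 1e-4)
```

The returned list holds one fit per value of G, which can then be compared with whichever model selection criterion is preferred.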
LNM.clust(
data,
run,
G_act,
initial = "kmeans",
runtime = TRUE,
threshold,
verb = FALSE,
maxiter = NA,
nrep = NA,
niter = NA,
sim = FALSE
)
data |
Input data. If sim==TRUE, data should be a list of multiple datasets (indexed by "run"), where each dataset is a list containing the counts W and true_lab (the true class labels). If there are no true labels, set true_lab to NAs. If the data are not simulated, data should have the same format as a single dataset in the simulation case. |
run |
Tracks the run number across datasets. For simulations, this can be the index of the simulated dataset; in other cases, the function can be run several times with random initialization to pick the run with the highest BIC/ICL. To run once on a single dataset, specify run=1. |
G_act |
The number of components G to fit in the current run. |
initial |
Method for initializing z_ig. Possible values are "kmeans", "random", and "small_EM". Default is "kmeans". |
runtime |
Logical; whether to output the running time of the whole procedure. |
threshold |
Threshold for Aitken's stopping criterion for convergence. |
verb |
Logical; whether to print the key steps of the algorithm and the approximate log-likelihood at each iteration. |
maxiter |
Maximum number of iterations. If specified, the algorithm stops when either the convergence criterion falls below the threshold or maxiter is reached. If not specified, the algorithm stops only when the convergence criterion is met. |
nrep |
Default is NA. Only needed if "small_EM" is specified for initial. Number of random starts for the small EM initialization. |
niter |
Default is NA. Only needed if "small_EM" is specified for initial. Number of iterations for each random start in the small EM initialization. |
sim |
Indicator of whether the data are simulated. Simulated data must be supplied as a list of multiple datasets (indexed by "run"), where each dataset is a list of W and true_lab. Default is FALSE. |
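To illustrate what the `threshold` argument controls, here is a minimal sketch of one common form of Aitken's stopping criterion. It is an assumption that the package uses this exact variant; the function name `aitken_converged` is hypothetical and only meant to show the idea.

```r
# Hypothetical sketch of an Aitken-type convergence check.
# loglik: vector of log-likelihood values, one per iteration so far.
aitken_converged <- function(loglik, threshold) {
  k <- length(loglik)
  if (k < 3) return(FALSE)                  # needs three consecutive values
  l_prev2 <- loglik[k - 2]
  l_prev  <- loglik[k - 1]
  l_curr  <- loglik[k]
  if (l_curr == l_prev) return(TRUE)        # no change at all
  # Aitken acceleration factor
  a <- (l_curr - l_prev) / (l_prev - l_prev2)
  # Estimated asymptotic log-likelihood
  l_inf <- l_prev + (l_curr - l_prev) / (1 - a)
  gap <- l_inf - l_prev
  # Converged when the estimated remaining gain is small and non-negative
  is.finite(gap) && gap >= 0 && gap < threshold
}

# A log-likelihood sequence that approaches -100 geometrically
ll <- -100 - 10 * 0.5^(1:25)
aitken_converged(ll[1:3], threshold = 1e-4)
aitken_converged(ll, threshold = 1e-4)
```

The check compares an extrapolated limit of the log-likelihood with a recent value rather than only looking at the last increment, which guards against stopping too early when progress is slow but steady.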
A list containing the parameter estimates when the algorithm converges. pi_g = estimated class size composition; z = soft class membership posterior probability; mu = estimated mean parameter for the latent variable; Sigma = estimated variance parameter for the latent variable. The other elements are internal parameters that can be used to check model fit and are passed to the overall algorithm for model selection.
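The soft memberships in z can be turned into hard cluster labels by taking the most probable component for each observation. The sketch below uses a small made-up matrix and assumes that rows of z are observations and columns are components; the layout returned by the function should be checked before applying this.

```r
# Toy posterior membership matrix (made-up values for illustration):
# 3 observations in rows, 2 components in columns
z <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.6, 0.4), ncol = 2, byrow = TRUE)

# Hard label = index of the component with the largest posterior probability
hard_lab <- apply(z, 1, which.max)

# Cluster sizes, keeping components that received no observations
cluster_sizes <- table(factor(hard_lab, levels = seq_len(ncol(z))))
```

When true labels are available, the hard labels can be cross-tabulated against them with `table()` to inspect the agreement.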
# generate data
Data.temp <- generate_data(G = 2, num_observation = c(50,50), K = 2, true_mu = list(c(0,1,0),c(-2,-5,0)), true_Sig = list(rbind(cbind(diag(1,2),0),0),rbind(cbind(diag(1,2),0),0)), seed.no = 1234, M = 10000, truelab = TRUE)
LNM.clust(data=Data.temp,run=1,G_act=2,initial="small_EM",runtime=TRUE,threshold=1e-4,verb=TRUE,nrep=30,niter=50,sim=FALSE)
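For data that are not simulated, the input described for `data` can be assembled by hand. The following is a minimal sketch with made-up counts; it assumes that W is a count matrix with observations in rows and categories in columns, which is not stated explicitly above. The object name `my_data` is arbitrary.

```r
set.seed(1)

# Made-up example: 20 observations, 3 categories, 1000 counts each
n <- 20
probs <- c(0.5, 0.3, 0.2)

# rmultinom returns categories x observations, so transpose to get
# one row per observation (assumed orientation)
W <- t(rmultinom(n, size = 1000, prob = probs))

# No true labels are available, so true_lab is set to NAs
my_data <- list(W = W, true_lab = rep(NA, n))

# Possible call, assuming the package is loaded:
# LNM.clust(data = my_data, run = 1, G_act = 2, threshold = 1e-4, sim = FALSE)
```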