Description
Fits mixtures of multivariate modified t-factor analyzers to the given data via the alternating expectation-conditional maximization (AECM) algorithm. Fitting is done under either a clustering (default) or a classification paradigm (by supplying either a training index or the percentage of data taken to be known), in serial or in parallel.
Arguments
x: A numeric matrix or data frame.
Gs: An integer or integer vector giving the number(s) of groups to fit. Default is 1:4.
Qs: An integer or integer vector giving the number(s) of factors to fit. Default is 1:2.
clas: Integer between 0 and 100 giving the percentage of data taken to be known; simulates a classification scenario. Additional options may be provided in future updates.
init: A list of initializing classification of the form that
scale: Logical indicating whether or not the function should scale the data. Default is
models: A character string or character vector giving the models to fit. See Details for the available choices.
dfstart: The initial value for the degrees of freedom. Default is 50.
dfupdate: Character string (
known: A vector of known classifications, numeric or character; optional for clustering, required for classification. Must have the same length as the number of rows in the data set.
gauss: Logical indicating whether the algorithm should use the Gaussian distribution. Currently equivalent to setting
eps: Tolerance value for the convergence criterion of the AECM algorithm.
parallel.cores: Logical or integer specifying the number of computing cores to use for coarse-grain parallelization of the algorithm. If
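The page does not spell out which convergence criterion eps controls. A rule often used with EM-type algorithms for mixture models is based on Aitken's acceleration: estimate the asymptotic log-likelihood from three consecutive values and stop once it is within eps of the current value. The sketch below assumes that rule; the helper aitken_converged() is hypothetical and is not part of the package.

```r
# Hypothetical helper (assumption): Aitken-acceleration stopping rule.
# ll: log-likelihoods from successive iterations, oldest first.
aitken_converged <- function(ll, eps = 0.01) {
  k <- length(ll)
  if (k < 3) return(FALSE)             # need three consecutive values
  l0 <- ll[k - 2]; l1 <- ll[k - 1]; l2 <- ll[k]
  if (l1 == l0) return(TRUE)           # no change at all: treat as converged
  a <- (l2 - l1) / (l1 - l0)           # Aitken acceleration
  if (a >= 1) return(FALSE)            # not in a linearly converging regime yet
  l_inf <- l1 + (l2 - l1) / (1 - a)    # estimated asymptotic log-likelihood
  abs(l_inf - l1) < eps                # compare with the previous value
}

# A log-likelihood sequence converging geometrically to -80
ll <- -80 - 10 * 0.5^(0:12)
aitken_converged(ll[1:3], eps = 0.01)  # still far from the limit
aitken_converged(ll, eps = 0.01)       # within tolerance
```

Variants of this rule compare the asymptotic estimate with the latest rather than the previous log-likelihood; the choice only shifts the stopping point by one iteration.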
Details
Model specification (via the models argument) follows a nomenclature based on the factor analyzer decomposition of the component covariance matrix:
Σ_g = Λ_g Λ_g' + ω_g δ_g,
where Λ_g is the p × q matrix of factor loadings (p variables, q factors), ω_g is a positive scalar, and δ_g is a diagonal matrix with |δ_g| = 1. Each model name has four letters describing the constraints on these parameters. The first letter is "C" (constrained to be equal across groups) or "U" (unconstrained), corresponding to setting Λ_g = Λ or not, respectively. The second letter has the same choices, giving ω_g = ω or not. The third letter may be "C", "U", or "I" (constrained to be the identity matrix), applying the corresponding constraint to δ_g. The fourth and final letter refers to the degrees of freedom and may again be "C" or "U".
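To make the decomposition concrete, the following base-R sketch builds one component covariance matrix from toy values. The dimensions and numbers are hypothetical, and the rescaling of δ_g to unit determinant assumes the modified factor analyzer parameterization; this is an illustration, not code from the package.

```r
# Toy sizes (hypothetical): p = 4 variables, q = 1 factor
p <- 4
q <- 1
Lambda <- matrix(c(0.9, 0.8, 0.7, 0.6), nrow = p, ncol = q)  # p x q loadings
omega  <- 0.5                                               # positive scalar
d      <- c(2, 1, 0.5, 1.5)                                 # arbitrary positive values
delta  <- diag(d / prod(d)^(1 / p))  # diagonal, rescaled so that det(delta) = 1

# Third letter "C" or "U": Sigma = Lambda Lambda' + omega * delta
Sigma_delta <- Lambda %*% t(Lambda) + omega * delta
# Third letter "I": delta is replaced by the identity matrix
Sigma_identity <- Lambda %*% t(Lambda) + omega * diag(p)
```

The "C"/"U" choices only matter once there are several groups: "C" on the first letter, for instance, would mean every group reuses the same Lambda.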
Any number of models can be selected and run by supplying a vector to models. The complete list of possible names is: "UUUU", "UUUC", "UCCU", "UCCC", "UUIU", "UUIC", "UCIU", "UCIC", "CUUU", "CUUC", "CCCU", "CCCC", "CUIU", "CUIC", "CCIU", "CCIC", "CUCU", "CUCC", "UUCU", "UUCC", "UCUU", "UCUC", "CCUU", "CCUC".
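The 24 names are exactly the combinations of the allowed letters at each position (2 × 2 × 3 × 2 = 24). This base-R sketch generates them from the constraint choices, which is a convenient way to check a models vector for typos; the variable names are illustrative.

```r
# Allowed letters at each position of a model name
grid <- expand.grid(lambda = c("C", "U"),       # Lambda_g = Lambda (C) or not (U)
                    omega  = c("C", "U"),       # omega_g = omega (C) or not (U)
                    delta  = c("C", "U", "I"),  # delta_g constrained, unconstrained, identity
                    dof    = c("C", "U"),       # degrees of freedom constrained or not
                    stringsAsFactors = FALSE)
model_names <- paste0(grid$lambda, grid$omega, grid$delta, grid$dof)
length(model_names)  # 24
```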
More commonly, a subset can be selected with one of the following character strings:
- "all": runs all 24 MMtFA models (default);
- "dfunconstrained": runs the 12 models with unconstrained degrees of freedom;
- "dfconstrained": runs the 12 models with constrained degrees of freedom.
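The shortcut strings select models by their fourth letter. The sketch below shows that mapping in base R; expand_models() is a hypothetical helper written for illustration, not the package's internal code.

```r
# Hypothetical helper mirroring the documented shortcut strings
expand_models <- function(models) {
  grid <- expand.grid(lambda = c("C", "U"), omega = c("C", "U"),
                      delta = c("C", "U", "I"), dof = c("C", "U"),
                      stringsAsFactors = FALSE)
  all_models <- paste0(grid$lambda, grid$omega, grid$delta, grid$dof)
  dof_letter <- substr(all_models, 4, 4)  # fourth letter: degrees of freedom
  if (length(models) == 1 && models == "all") return(all_models)
  if (length(models) == 1 && models == "dfunconstrained") return(all_models[dof_letter == "U"])
  if (length(models) == 1 && models == "dfconstrained") return(all_models[dof_letter == "C"])
  stopifnot(all(models %in% all_models))  # explicit names must be valid
  models
}

expand_models("dfconstrained")  # the 12 names ending in "C"
```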
Also note that for G=1, several models are equivalent (for example, UUUU and CCCC). Thus, for G=1 only one model from each set of equivalent models will be run.
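Why the models coincide: with one group, constraining a parameter to be equal across groups ("C") imposes nothing beyond leaving it group-specific ("U"), so only the "I" choice for δ_g changes the fit. The sketch below groups the names under that reasoning; which representative the package actually fits for G=1 is not documented here.

```r
grid <- expand.grid(lambda = c("C", "U"), omega = c("C", "U"),
                    delta = c("C", "U", "I"), dof = c("C", "U"),
                    stringsAsFactors = FALSE)
all_models <- paste0(grid$lambda, grid$omega, grid$delta, grid$dof)

# For G = 1, "C" and "U" coincide, so mapping every C to U
# gives one canonical label per equivalence class
canonical <- gsub("C", "U", all_models, fixed = TRUE)
classes <- split(all_models, canonical)
lengths(classes)  # UUIU: 8 names, UUUU: 16 names
```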
Value
x: Data used for clustering/classification.
classification: Vector of group classifications as determined by the BIC.
bic: BIC of the best fitted model.
modelname: Name of the best model according to the BIC.
allbic: Matrix of BIC values according to model and G. A value of -Inf is returned when the model did not converge.
bestmodel: Character string giving details of the best model (BIC).
G: Number of components (groups) chosen by the BIC.
tab: Classification table for BIC-selected model (only available when
fuzzy: The fuzzy clustering matrix for the model selected by the BIC.
logl: The log-likelihood corresponding to the model with the best BIC.
iter: The number of iterations until convergence for the model selected by the BIC.
parameters: List containing the fitted parameters:
iclresults: List containing all the previous outputs, except
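Returning -Inf for non-converged fits implies that larger BIC values are better, which matches the form BIC = 2 log L − m log n (m free parameters, n observations); that form is assumed here. The base-R sketch below picks the best model and G from a matrix shaped like allbic, then derives a hard classification from a fuzzy matrix by maximum a posteriori (MAP) assignment and cross-tabulates it against known labels. The matrix layout (rows = models, columns = G), all numbers, and the way tab is built are hypothetical toy choices, not the package's output or code.

```r
# BIC in the "larger is better" form
bic <- function(logl, npar, n) 2 * logl - npar * log(n)

# Toy allbic-style matrix; -Inf marks a fit that did not converge
allbic <- matrix(c(-520.3, -498.1, -Inf,
                   -515.7, -490.4, -493.2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("UUUU", "CUCU"), c("G=1", "G=2", "G=3")))
idx <- arrayInd(which.max(allbic), dim(allbic))  # row and column of the largest BIC
best_model <- rownames(allbic)[idx[1, 1]]        # "CUCU"
best_G <- colnames(allbic)[idx[1, 2]]            # "G=2"

# Toy fuzzy matrix: posterior group memberships, one row per observation
fuzzy <- matrix(c(0.90, 0.10,
                  0.20, 0.80,
                  0.55, 0.45), ncol = 2, byrow = TRUE)
classification <- max.col(fuzzy, ties.method = "first")  # MAP: 1 2 1
known <- c("a", "b", "b")                                # toy true labels
table(known, classification)
```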
Author(s)
Jeffrey L. Andrews, Paul D. McNicholas, and Mathieu Chalifour
References
Andrews JL and McNicholas PD (2011a), 'Extending mixtures of multivariate t-factor analyzers'. Statistics and Computing 21(3), 361–373.
Andrews JL and McNicholas PD (2011b), 'Mixtures of modified t-factor analyzers for model-based clustering, classification, and discriminant analysis'. Journal of Statistical Planning and Inference 141(4), 1479–1486.
See Also
The MMtFA package manual.
Examples
library(mmtfa)
### Note that only one model is run for each example
### in order to reduce computation time

# Clustering iris data with hard random start
tirisr <- mmtfa(iris[, -5], models = "UUUU", Gs = 1:3, Qs = 1, init = "hard")

# Clustering iris data with hierarchical starting values
initial_list <- list()
clustree <- hclust(dist(iris[, -5]))
for (i in 1:3) {
  initial_list[[i]] <- cutree(clustree, i)  # i-group partition from the tree
}
tirish <- mmtfa(iris[, -5], models = "CUCU", Gs = 1:3, Qs = 1, init = initial_list)

# Classification with the iris data set via percentage of data taken to have known membership
tirisc <- mmtfa(iris[, -5], Qs = 1, models = "CUIU", init = "uniform", clas = 50, known = iris[, 5])
tirisc$tab