vsccmanly: Variable Selection for Skewed Clustering and Classification

Description

Performs variable selection under a clustering framework. Accounts for mixtures of non-Gaussian distributions through the Manly transformation, as implemented in the 'ManlyMix' package.

Usage

vsccmanly(x, G = 2:9, numstart = 100, selection = "backward",
          forcereduction = FALSE, initstart = "k-means", seedval = 2354)

Arguments

x

Data frame or matrix on which to perform variable selection.

G

Vector giving the numbers of groups to consider during initialization and/or post-selection analysis. Default is 2:9.

numstart

Number of random starts. Default is 100.

selection

Direction of transformation parameter selection: 'forward' or 'backward'. The user may instead choose 'none' to fit a full Manly mixture. Default is 'backward'.

forcereduction

Logical indicating whether the full data set is kept as a candidate (FALSE, the default) or excluded (TRUE) when selecting the 'best' variable subset via total model uncertainty.

initstart

Method for obtaining initial starting values (options are 'k-means' or 'hierarchical'). Default is 'k-means'.

seedval

Seed value used for the k-means initialization. Default is 2354.
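
The option strings documented for selection and initstart can be checked before starting a long run. The following is a minimal sketch in base R: check_vsccmanly_args is a hypothetical helper that is not part of vscc, and the whole-number check on G is an assumption about what counts as a sensible group count.

```r
# Hypothetical helper (not part of vscc): validate option values before a long run.
check_vsccmanly_args <- function(G = 2:9, selection = "backward",
                                 initstart = "k-means") {
  # match.arg() stops with an error if the value is not among the choices
  # (it also accepts unambiguous abbreviations such as "for").
  selection <- match.arg(selection, c("forward", "backward", "none"))
  initstart <- match.arg(initstart, c("k-means", "hierarchical"))
  # Assumption: group counts should be positive whole numbers.
  stopifnot(is.numeric(G), length(G) >= 1, all(G >= 1), all(G == round(G)))
  list(G = as.integer(G), selection = selection, initstart = initstart)
}

# Passes silently and returns the cleaned values
args_ok <- check_vsccmanly_args(G = 2:4, selection = "forward")

# An unsupported option is reported as an error instead of reaching the fit
bad <- try(check_vsccmanly_args(selection = "sideways"), silent = TRUE)
```

The cleaned values could then be passed on to vsccmanly() together with the data.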

Value

selected

A list containing the subsets of variables selected for each relationship. Each set is indexed by the exponent of the relationship. For instance, vscc_object$selected[[3]] corresponds to the variable subset selected by the cubic relationship.

wss

The within-group variance associated with each variable from the full data set.

topselected

The best variable subset according to the total model uncertainty.

initialrun

Results from the initial model, prior to variable selection; an object of class ManlyMix.

bestmodel

Results from the best model on the selected variable subset; an object of class ManlyMix.

variables

Variables used to fit the final model.

chosenrelation

Numeric indication of the relationship chosen according to total model uncertainty. The number corresponds to the exponent in the relationship: for instance, a value of '4' indicates the quartic relationship. If the value "Full dataset" is given, then the unreduced data set provides the best model uncertainty; this outcome can be avoided by specifying forcereduction=TRUE in the function call.

uncertainty

Total model uncertainty associated with the best relationship.

allmodelfit

List containing the results (ManlyMix objects) from the post-selection analysis on each variable subset. The list index corresponds to the exponent in the relationship. For instance, vscc_object$allmodelfit[[1]] gives the results from the analysis on the variables selected by the linear relationship.
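
Because chosenrelation can hold either an exponent or the string "Full dataset", code that reports on a result needs to handle both cases. The following is a minimal sketch in base R, assuming a result shaped like the list described in this section. summarize_vscc and the mock objects are hypothetical and not part of vscc, and the mock values are made up for illustration.

```r
# Hypothetical helper (not part of vscc): one-line summary of a result list.
summarize_vscc <- function(fit) {
  rel <- fit$chosenrelation
  if (identical(rel, "Full dataset")) {
    # The unreduced data gave the best total model uncertainty
    label <- "full data set (no reduction)"
  } else {
    # Map the exponent to a name; fall back to "degree-k" for larger exponents
    known <- c("linear", "quadratic", "cubic", "quartic")
    k <- as.integer(rel)
    label <- if (k <= length(known)) known[k] else paste0("degree-", k)
    label <- paste(label, "relationship")
  }
  sprintf("Chosen: %s; total uncertainty: %.3f", label, fit$uncertainty)
}

# Mock results with made-up values, standing in for real vsccmanly() output
mock_fit  <- list(chosenrelation = 4, uncertainty = 12.5)
mock_full <- list(chosenrelation = "Full dataset", uncertainty = 20)

summarize_vscc(mock_fit)
summarize_vscc(mock_full)
```

With a real fit, the same helper would be called on the object returned by vsccmanly().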

Author(s)

Jeffrey L. Andrews, Mackenzie R. Neal, Paul D. McNicholas

References

See citation("vscc") for the variable selection references.

See Also

vscc

Examples

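## Not run: 
data(ais)
# Use columns 3 to 13 as the candidate variables
X <- ais[, 3:13]
# Forward selection on the scaled data; forcereduction = TRUE rules out the full data set
aisfor <- vsccmanly(as.data.frame(scale(X)), G = 2:9, selection = "forward",
                    forcereduction = TRUE, initstart = "k-means", seedval = 2354)
aisfor$variables  # Show selected variables
table(ais[, 1], aisfor$bestmodel$id)  # Clustering results on reduced data set

## End(Not run)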

vscc documentation built on Oct. 18, 2023, 1:16 a.m.