vsccmanly: Variable Selection for Skewed Clustering and Classification
In vscc: Variable Selection for Clustering and Classification



Description

Performs variable selection under a clustering framework. Mixtures of non-Gaussian (skewed) distributions are accommodated through the Manly transformation, as implemented in the 'ManlyMix' package.
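For readers unfamiliar with it, the Manly transformation maps a value x to (exp(λx) - 1)/λ when λ ≠ 0 and leaves x unchanged when λ = 0. A Manly mixture models each component as Gaussian after this transformation is applied to every variable, with transformation parameters that may differ between components. The helper `manly_transform()` below is hypothetical and written only for illustration; it is not an exported function of vscc or ManlyMix.

```r
# Hypothetical helper (not part of vscc or ManlyMix): the univariate
# Manly exponential transformation of a numeric vector x.
manly_transform <- function(x, lambda) {
  if (lambda == 0) {
    x                               # lambda = 0: identity (Gaussian case)
  } else {
    (exp(lambda * x) - 1) / lambda  # lambda != 0: exponential transform
  }
}

x <- c(-1, 0, 0.5, 2)
manly_transform(x, lambda = 0)    # values are unchanged
manly_transform(x, lambda = 0.8)  # right tail is stretched
```

As λ approaches 0 the transformation approaches the identity, which is why λ = 0 corresponds to an untransformed Gaussian component.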

Usage

vsccmanly(x, G = 2:9, numstart = 100, selection = "backward",
          forcereduction = FALSE, initstart = "k-means", seedval = 2354)

Arguments

`x`	Data frame or matrix on which to perform variable selection.
`G`	Vector giving the numbers of groups to consider during initialization and/or post-selection analysis. Default is 2:9.
`numstart`	Number of random starts.
`selection`	Direction of transformation-parameter selection: 'forward' or 'backward'. Alternatively, 'none' fits a full Manly mixture without selecting transformation parameters.
`forcereduction`	Logical. If FALSE, the full data set is also considered as a candidate when selecting the ‘best’ variable subset via total model uncertainty; if TRUE, a reduced variable subset is always chosen.
`initstart`	Method for generating initial starting values ('k-means' or 'hierarchical').
`seedval`	Seed value used for the k-means initialization.
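Several arguments and return values refer to total model uncertainty. As we understand the VSCC references, it is the sum over observations of one minus the largest posterior membership probability, so a value of 0 means every observation is assigned with certainty and lower values indicate a more clear-cut clustering. The hypothetical helper `total_uncertainty()` below computes this quantity from an n x G matrix of membership probabilities; both the function and the toy matrix are illustrative assumptions, not part of vscc.

```r
# Hypothetical helper: total model uncertainty from an n x G matrix z of
# posterior membership probabilities (each row sums to 1).
total_uncertainty <- function(z) {
  # for each observation, 1 minus its largest membership probability
  sum(1 - apply(z, 1, max))
}

# Toy membership matrix for 3 observations and 2 groups
z <- rbind(c(0.90, 0.10),
           c(0.60, 0.40),
           c(0.05, 0.95))
total_uncertainty(z)  # 0.10 + 0.40 + 0.05 = 0.55
```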

Value

`selected`	A list containing the subset of variables selected for each relationship. Each subset is indexed by the exponent in the relationship. For instance, `vscc_object$selected[[3]]` corresponds to the variable subset selected by the cubic relationship.
`wss`	The within-group variance associated with each variable from the full data set.
`topselected`	The best variable subset according to total model uncertainty.
`initialrun`	Results from the initial model, prior to variable selection; an object of class `ManlyMix`.
`bestmodel`	Results from the best model on the selected variable subset; an object of class `ManlyMix`.
`variables`	Variables used to fit the final model.
`chosenrelation`	Numeric indication of the relationship chosen according to total model uncertainty. The number corresponds to the exponent in the relationship: for instance, a value of '4' indicates the quartic relationship. If the value `"Full dataset"` is returned, the unreduced data give the lowest total model uncertainty; this outcome can be avoided by specifying `forcereduction=TRUE` in the function call.
`uncertainty`	Total model uncertainty associated with the best relationship.
`allmodelfit`	List containing the results (`ManlyMix` objects) from the post-selection analysis on each variable subset. The index corresponds to the exponent in the relationship. For instance, `vscc_object$allmodelfit[[1]]` gives the results from the analysis on the variables selected by the linear relationship.
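The `selected`, `chosenrelation` and `allmodelfit` entries are indexed by the exponent m of a VSCC relationship. As we understand the VSCC criterion from the references listed by citation("vscc"), variables are ranked by their within-group variance W_j, and a candidate is retained only if its absolute correlation with every previously retained variable is at most 1 - W_j^m; on standardized data (W_j roughly between 0 and 1) a larger m gives a looser bound. The hypothetical helper `vscc_select()` below is a simplified sketch under these assumptions, not the package's code: it uses hard cluster labels instead of the posterior membership probabilities of a fitted model, and plain Pearson correlations.

```r
# Hypothetical sketch of the VSCC selection rule for one exponent m.
# Not the vscc implementation: uses hard labels and Pearson correlations.
vscc_select <- function(x, cls, m) {
  x <- as.matrix(x)
  n <- nrow(x)
  # W_j: within-group variance of each variable, pooled over all groups
  w <- apply(x, 2, function(v) sum((v - ave(v, cls))^2) / n)
  ord <- order(w)       # consider variables from lowest W_j upward
  selected <- ord[1]    # the lowest-W_j variable is always kept
  for (j in ord[-1]) {
    r <- abs(cor(x[, j], x[, selected]))
    # keep j only if weakly enough correlated with every kept variable
    if (all(r <= 1 - w[j]^m)) selected <- c(selected, j)
  }
  selected              # column indices, in order of selection
}

# Toy data: 'a' separates the two groups, 'b' is noise,
# and 'c' is a near-copy of 'a'.
set.seed(1)
cls <- rep(1:2, each = 50)
a <- rnorm(100, mean = 3 * cls)
x <- scale(cbind(a = a, b = rnorm(100), c = a + rnorm(100, sd = 0.05)))
lapply(1:5, function(m) colnames(x)[vscc_select(x, cls, m)])
```

Because 'a' and 'c' are almost perfectly correlated, at most one of them should survive the rule. Here the labels are supplied directly; vsccmanly instead works from a fitted initial model such as the one reported in `initialrun`.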

Author(s)

Jeffrey L. Andrews, Mackenzie R. Neal, Paul D. McNicholas

References

See citation("vscc") for the variable selection references.

Examples

## Not run: 
data(ais)
X <- ais[, 3:13]
aisfor <- vsccmanly(as.data.frame(scale(X)), G = 2:9, selection = "forward",
                    forcereduction = TRUE, initstart = "k-means", seedval = 2354)
aisfor$variables                      # Show selected variables
table(ais[, 1], aisfor$bestmodel$id)  # Clustering results on reduced data set

## End(Not run)

vscc documentation built on Oct. 18, 2023, 1:16 a.m.