MultiCHull: Convex-Hull-Based Model Selection for multiple Samples

MultiCHull R Documentation

Convex-Hull-Based Model Selection for multiple Samples

Description

Applies the CHull function to multiple samples of fit values at once, such as bootstrap samples.

Usage

MultiCHull(data, bound = "lower", PercentageFit = 0.01)

## S3 method for class 'MultiCHull'
plot(x, col = NULL, pch = NULL, whichticks = NULL, las = 2, ...)

## S3 method for class 'MultiCHull'
print(x, ...)

## S3 method for class 'MultiCHull'
summary(object, ...)

Arguments

data

Data frame with complexity values in the first column and fit values in the remaining columns, one column per sample

bound

Boundary of the convex hull to inspect: "upper" or "lower"

PercentageFit

Required proportion of increase in fit of a more complex model

x

An object of the type produced by MultiCHull

col

Vector of colors used for plots

pch

Vector of pch symbols

whichticks

Model names of ticks that should be displayed

las

Orientation of tick mark labels

...

Additional arguments

object

An object of the type produced by MultiCHull

Value

st

Dataframe with scree test values

tab

Table which indicates the selected model in each sample

frq

Table which indicates how often each model is selected

Origdata

Original dataframe

Bound

Boundary of convex hull that was requested

PercentageFit

Requested proportion of increase in fit of a more complex model

Details

MultiCHull function

MultiCHull applies the CHull procedure to multiple samples of fit values. To this end, the input argument data is a data frame with complexity values in the first column and fit values in the remaining columns, one column per sample. The samples can be, for example, bootstrap samples, fit values obtained with different random starts, or values of different fit measures. In some samples no optimal solution may be found; this generates a warning that includes the sample number.

The data frame st contains, per sample, the scree test values of the solutions that were found on the upper or lower boundary of the hull (see also CHull). In each sample, the least and most complex models receive a value of 0. The remaining models, for which no scree test value is available, have an NA value. tab is also a data frame; it indicates, per sample, the top three models (marked 1, 2, and 3), and all other models have an NA value. Finally, frq shows how often each model is selected as the optimal model.
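The relation between tab and frq described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's own code: the small tab-like data frame, its row and column names, and the variable most_selected are all hypothetical, and the actual layout of the frq element returned by MultiCHull may differ.

```r
# Hypothetical 'tab'-like data frame (not produced by MultiCHull):
# rows are models, columns are samples.
# 1, 2 and 3 mark the best, second-best and third-best model; NA marks the rest.
tab <- data.frame(
  sample1 = c(NA, 1, 2, 3, NA),
  sample2 = c(NA, 2, 1, NA, 3),
  sample3 = c(3, 1, NA, 2, NA),
  row.names = paste0("model", 1:5)
)

# Count, per model, the number of samples in which it was ranked first.
# Comparing NA with 1 gives NA, so na.rm = TRUE leaves those cells out.
frq <- rowSums(tab == 1, na.rm = TRUE)
print(frq)

# Model(s) selected most often; a tie yields more than one name.
most_selected <- names(frq)[frq == max(frq)]
print(most_selected)
```

Counting only the entries equal to 1 mirrors the description of frq as the frequency of being selected as the optimal model; the ranks 2 and 3 are ignored in this sketch.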

Plot function

Applying plot() to the output of MultiCHull yields a plot with the models on the x-axis, ordered by increasing complexity. By default, all model names are shown as perpendicular tick labels on the x-axis, but one can choose to display specific model names only (e.g., whichticks = c("model13", "model20")). The tick labels can be made horizontal by setting las to 0.

Solid lines (shown only when there are 20 or fewer samples) indicate the scree test values per sample, and symbols indicate the top three models per sample. The symbols can be adjusted with pch and the colors with col. The model (or models, in case of a tie) selected most often across samples is indicated with a horizontal line.

See Also

CHull

Examples

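library(multichull)

# Column 1: complexity of 14 models; column 2: their fit values
data <- cbind(c(305,456,460,607,612,615,758,764,768,770,909,916,921,924),
              c(152,89,79,71,57,57,64,49,47,47,60,41,39,39))

# Append 20 noisy copies of column 2 (normal noise, sd = 2.5) as extra samples
set.seed(1)  # arbitrary seed, added only to make the example reproducible
test <- array(rnorm(14*20, sd = 2.5), c(14, 20))
for (i in 1:20){
  data <- cbind(data, data[,2] + test[,i])
}

output <- MultiCHull(data)  # defaults: bound = "lower", PercentageFit = 0.01
summary(output)
plot(output)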


multichull documentation built on Oct. 26, 2023, 5:08 p.m.