TMC: Compare model selection criteria


Description

Computes convex combinations of model selection criteria. The function is very customizable, allowing the user to specify what type of model is to be tested, which criteria are to be used, and many other options described below.

Usage

TMC(num.Iter = 50, data.Size = 100, make.Data = gen.Data, make.Params = gen.Params, model.List, weight.Vector = rep(1, times = length(model.List)), msc.List, fit.Model = fit.Models, stepSize = 0.05, sumstats = list("Median Rank" = median), huge = FALSE, var.Frame = data.frame(), par.Sigma = 1, data.Sigma = 1, barebones = FALSE, allow.Negs = FALSE, thresholds = c(1, 2, 3, 5, 10), test.Size = 0, scale.Frame = TRUE, use.Ranks = TRUE, ...)

Arguments

num.Iter

The number of iterations. This will be the total number of times that the entire loop described in the Details section will be executed.

data.Size

For time-series (and possibly other extended types), the size of each simulated data set.

make.Data

The name (not quoted) of a function used to simulate data. Must take the result of make.Params as its only argument. A sensible default is gen.Data for time series and regression.

make.Params

The name (not quoted) of a function used to simulate parameters. Must take a single model as its only argument. A sensible default is gen.Params for time series and regression.

model.List

A list of candidate models. The true model will be chosen from this list in each iteration, and the MSC values of every model in this list will then be calculated, from which the rank of the true model is computed. Utility functions for constructing such model lists are make.Model.List.Reg and make.Model.List.TS.

weight.Vector

A numeric vector, the same length as model.List, of the weights (probabilities) of each model. Used to choose the true model at each iteration. Need not be scaled, but must be nonnegative. To construct a vector of weights for individual models based on a prior distribution on the number of terms (or complexity) of the underlying model, use weightsGivenSize. Another possible utility function, which weights models of only a specified size, is weight.Only.N.
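
As a minimal sketch of how unscaled weights translate into selection probabilities, the following uses base R only. The objects `models` and `w` are hypothetical stand-ins for model.List and weight.Vector, and this is not necessarily how TMC draws the true model internally.

```r
# Hypothetical stand-ins for model.List and weight.Vector.
models <- list("y ~ x1", "y ~ x2", "y ~ x1 + x2")
w <- c(2, 1, 1)  # unscaled, nonnegative weights

set.seed(1)
# sample() accepts unnormalized weights in `prob`.
idx <- sample(seq_along(models), size = 1000, replace = TRUE, prob = w)
freq <- table(factor(idx, levels = seq_along(models))) / length(idx)

# Implied selection probabilities after scaling: 0.5, 0.25, 0.25.
probs <- w / sum(w)
```

The empirical frequencies in `freq` should be close to `probs`, which is why the weights need not sum to one.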

msc.List

A list of model selection criterion functions. The length must be more than 1, but should not be much larger than 3 to avoid computational overflow. The recommended number of MSCs is 3. Each function must take a fitted model object (produced by fit.Model) as its only argument. Commonly used functions include AIC, BIC, and for time series models, holdout.Mean for mean absolute deviation on a holdout sample, and holdout.Med for the median absolute deviation. This list, however, is by no means exhaustive and new MSC functions can easily be written – see details below.

fit.Model

The function used to fit the models defined by model.List. Whenever possible, we recommend that this be a built-in R function, e.g., lm or arima.

stepSize

The mesh of the grid of convex combinations. Bear in mind the number of convex combinations will be roughly proportional to (1/stepSize)^length(msc.List), so don't make stepSize too small, especially if msc.List is longer than 3!
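
To get a feel for how quickly the grid grows, here is an illustrative base R sketch that enumerates all weight vectors on a grid with a given step. It assumes three criteria and is not TMC's internal code; the names `k`, `n_steps`, and `weights` are introduced only for this example.

```r
# Enumerate convex-combination weights on a grid (illustrative only).
stepSize <- 0.05
k <- 3                          # number of criteria (assumed)
n_steps <- round(1 / stepSize)  # work in integer steps to avoid rounding error

# All integer weights for the first k - 1 criteria...
first <- as.matrix(expand.grid(rep(list(0:n_steps), k - 1)))
first <- first[rowSums(first) <= n_steps, , drop = FALSE]
# ...and the last weight is whatever remains, so each row sums to 1.
weights <- cbind(first, n_steps - rowSums(first)) / n_steps

n_combos <- nrow(weights)
```

With stepSize = 0.05 and three criteria this gives 231 combinations; halving the step to 0.025 gives 861.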

sumstats

The summary functions of the distributions of ranks. Used for graphical displays of the final msc object. Note that the average and also all the summary functions generated by thresholds (see below) are automatically included in the final object, so there is no need to put them in this list.

huge

Must be set to TRUE if the matrix of convex combinations will be larger than roughly 500000. This is a safeguard against unexpectedly long calculations.

var.Frame

For models with covariates, this should be the data.frame containing them. For other models, it is ignored.

par.Sigma

This argument may be passed to make.Params, and is the standard deviation used in gen.Params.lmFormula, for example.

data.Sigma

An optional argument to be passed to make.Data.

barebones

For large computations, we recommend that this be set to TRUE. It discards the individual ranks at each iteration and updates only the summary functions, in order to reduce space requirements. If barebones is TRUE, the summary functions are restricted to pre-defined functions that can be updated dynamically, such as mean, and cannot include functions that require the whole sample, such as median.

allow.Negs

If TRUE, the matrix of convex combinations will be expanded to include linear combinations with negative weights. Greatly increases computation, and is rarely helpful.

thresholds

Must be a numeric vector. Included as a simple way to generate summary functions: for each element k of this vector, the summary function P(Rank > k) will be computed and included in the final object. Note that if barebones is set to TRUE, the elements of thresholds are the ONLY summary functions the user can specify. This restriction ensures that the barebones routine does not need to keep track of all the ranks from individual iterations, but can instead retain only the updated summary function values.
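
The following sketch shows why threshold summaries suit a barebones run: P(Rank > k) can be maintained as a running count without storing every rank. The ranks are made up for illustration, and the code is an assumption about the general idea rather than the package's actual routine.

```r
set.seed(2)
ranks <- sample(1:8, size = 200, replace = TRUE)  # made-up ranks of the true model
thresholds <- c(1, 2, 3, 5, 10)

# Batch version: needs every rank to be kept.
batch <- sapply(thresholds, function(k) mean(ranks > k))

# Running version: keeps only one count per threshold.
counts <- numeric(length(thresholds))
for (r in ranks) counts <- counts + (r > thresholds)
running <- counts / length(ranks)
```

Both versions give the same estimates, but the running version needs only `length(thresholds)` numbers of storage per convex combination.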

test.Size

The size of the subset of each sample to be used as a holdout sample. Ordinarily, this is set to 0, but for certain MSCs, namely those whose names begin with "holdout", it needs to be set to a nonzero number to be useful. A common rule of thumb is to set the size to roughly ten percent of the total sample size. Note, however, that whenever this argument is nonzero, the function will slow down considerably, since it is then forced to fit all models twice (once with the full sample, and once with only the "training" sample, which excludes the holdout sample).
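
A rough base R sketch of what a holdout criterion involves, and why it doubles the fitting cost. It assumes the last test_size observations are held out, which may differ from how TMC splits the sample; all object names here are hypothetical.

```r
set.seed(3)
n <- 100
test_size <- 10  # roughly ten percent of the sample
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

# Hold out the last observations (an assumption made for this sketch).
train <- dat[seq_len(n - test_size), ]
test <- dat[(n - test_size + 1):n, ]

full_fit <- lm(y ~ x, data = dat)     # fit 1: full sample
train_fit <- lm(y ~ x, data = train)  # fit 2: training sample only

# Holdout criteria: mean and median absolute prediction error.
abs_err <- abs(test$y - predict(train_fit, newdata = test))
holdout_mean <- mean(abs_err)
holdout_med <- median(abs_err)
```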

scale.Frame

Logical indicating whether var.Frame should be scaled first. If true, each column will be centered by its mean and divided by its standard deviation.

use.Ranks

Logical. If TRUE, then in each iteration, the msc values for each criterion will be scaled by taking ranks. If FALSE, then they will be scaled by standardizing instead.
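
To illustrate the two scalings, here is a small sketch with made-up criterion values for four models. It assumes that "standardizing" means subtracting the mean and dividing by the standard deviation.

```r
# Made-up criterion values for four candidate models.
aic_vals <- c(102.3, 98.7, 110.4, 99.1)
bic_vals <- c(108.9, 107.2, 121.0, 104.5)

# use.Ranks = TRUE: each criterion is replaced by its ranks.
by_rank <- cbind(AIC = rank(aic_vals), BIC = rank(bic_vals))

# use.Ranks = FALSE: each criterion is centered and scaled instead.
by_std <- cbind(AIC = as.numeric(scale(aic_vals)),
                BIC = as.numeric(scale(bic_vals)))
```

Either way, the criteria end up on a comparable scale before they are combined; ranks discard the size of the gaps between models, while standardized values keep it.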

...

Other arguments to be passed to other functions.

Details

The basic algorithm is as follows:

  1. Choose a true model (true.Model) from model.List by drawing a random entry according to the weights (if present) given in weight.Vector.

  2. Simulate parameters for that model by calling make.Params with argument true.Model as given above.

  3. Simulate data from true.Model, using the parameters from step 2, by calling make.Data.

  4. Fit all models in model.List to the simulated data set.

  5. Calculate the model selection criteria in msc.List for each fitted model, and take the ranks of these values (within each individual MSC).

  6. For each convex combination in the grid implicitly defined by stepSize, calculate the convex combination of the ranks from the different MSCs for each model in model.List.

  7. Among these values, calculate the rank of true.Model.

After these steps have been iterated num.Iter times, the summary functions specified in sumstats, as well as the average and threshold functions defined by thresholds, are computed for each convex combination.
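
The steps above can be sketched in base R for a toy regression setting. This is an illustration of the algorithm as described, not the package's implementation: it uses two criteria (AIC and BIC), three hypothetical candidate models, standard normal parameters and noise, and a coarse grid.

```r
set.seed(42)
n <- 50
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

rhs <- c("x1", "x2", "x1 + x2")         # hypothetical candidate models
model_w <- rep(1, length(rhs))          # plays the role of weight.Vector
criteria <- list(AIC = AIC, BIC = BIC)  # plays the role of msc.List

w_grid <- seq(0, 1, by = 0.25)          # weight on AIC; BIC gets 1 - w
num_iter <- 20
true_rank <- matrix(NA_real_, nrow = num_iter, ncol = length(w_grid))

for (it in seq_len(num_iter)) {
  # 1. Choose the true model.
  true_idx <- sample(seq_along(rhs), 1, prob = model_w)
  # 2. Simulate its parameters.
  mm <- model.matrix(as.formula(paste("~", rhs[true_idx])), data = X)
  beta <- rnorm(ncol(mm))
  # 3. Simulate data from the true model.
  dat <- cbind(X, y = as.numeric(mm %*% beta) + rnorm(n))
  # 4. Fit every candidate model.
  fits <- lapply(rhs, function(r) lm(as.formula(paste("y ~", r)), data = dat))
  # 5. Evaluate each criterion and rank within criterion.
  rank_mat <- sapply(criteria, function(f) rank(sapply(fits, f)))
  # 6-7. Combine the ranks at each grid point and rank the true model.
  for (j in seq_along(w_grid)) {
    combo <- w_grid[j] * rank_mat[, "AIC"] + (1 - w_grid[j]) * rank_mat[, "BIC"]
    true_rank[it, j] <- rank(combo)[true_idx]
  }
}

# Summaries per convex combination, analogous to the average and thresholds.
mean_rank <- colMeans(true_rank)
p_rank_gt_1 <- colMeans(true_rank > 1)
```

Each column of `true_rank` corresponds to one convex combination, so comparing `mean_rank` across columns shows which weighting tends to rank the true model best in this toy setting.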

New model selection functions, or additional methods for existing ones, can easily be written. The object initially passed to each such function will be of class "fmo", a class used internally in TMC. An fmo object will contain at least the components

full

the fitted model object resulting from applying fit.Model to the full data set generated by gen.Data

train

the fitted model object resulting from applying fit.Model to only the training part of the data set (that is, the data set less any observations held out for msc functions involving a holdout sample.) If test.Size = 0, this is NULL.

test.Frame

the matrix of covariates associated with the holdout sample, if any. If test.Size = 0, this is NULL.

test.Vector

the actual vector of observations held out as a test sample. If test.Size = 0, this is NULL.

S2

an unbiased estimate of residual variance in regression models, included only for convenience in calculating Cp to avoid recalculating for every criterion.

Thus, to write a new model selection criterion function, one should create a generic function with a method for class "fmo", and further methods for whatever classes of model objects for which one can actually compute the criterion directly. The method for class "fmo" is typically very simple, and usually involves calling another method of the same function on some part of the fmo object itself, typically the full component for ordinary model selection criteria or the train component for criteria involving a holdout sample. For example, see PRESS.
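
A minimal sketch of this pattern, assuming only what is described above. The criterion `AICc` (a small-sample corrected AIC) is a hypothetical example, not a function documented here, and the fmo object is mocked by hand because the real one is built inside TMC.

```r
# Hypothetical criterion: small-sample corrected AIC.
AICc <- function(obj, ...) UseMethod("AICc")

# Method for the internal wrapper class: delegate to the full-data fit.
AICc.fmo <- function(obj, ...) AICc(obj$full, ...)

# Method for a model class where the criterion can be computed directly.
AICc.lm <- function(obj, ...) {
  k <- attr(logLik(obj), "df")  # estimated parameters, including the variance
  n <- nobs(obj)
  AIC(obj) + 2 * k * (k + 1) / (n - k - 1)
}

# Mock "fmo" object for illustration only; TMC constructs the real one.
fit <- lm(dist ~ speed, data = cars)
mock_fmo <- structure(
  list(full = fit, train = NULL, test.Frame = NULL, test.Vector = NULL),
  class = "fmo"
)
val <- AICc(mock_fmo)
```

A holdout-based criterion would follow the same shape, except that its fmo method would work from the train, test.Frame, and test.Vector components instead of full.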

gen.Data, gen.Params, and fit.Models are intended to be sensible defaults, but they need not be the only functions used for simulating parameters, simulating data, and fitting models. New methods can easily be written for all three functions. The recommended approach is to create a new class, create a list of model specifications (e.g., model formulae or order specifications) of this new class, and then write methods for gen.Params, etc. for this new class.

Value

An object of class msc, or an object of class barebones, which inherits from msc, if barebones is TRUE. Contains the following components:

call

The matched call

Sum.Stats

A data.frame, with each row representing a convex combination of MSCs. The first 3 columns give the weights corresponding to the combination, and the remaining columns give the values of all summary statistics corresponding to the combination.

var.Frame

For models containing covariates, a data.frame containing them.

error.Iterations

Iteration numbers in which the attempt to fit the true model to the simulated data set resulted in an error, thus making it impossible to compute a rank.

num.Errors

The length of error.Iterations.

time.Taken

The total length of time to complete the call.

simulated.Models

The formula corresponding to the true model chosen in each iteration.

simulation.Attempts

The number of attempts needed, during each iteration, to simulate data successfully. Mainly useful for diagnostic purposes when simulation of time series results in non-stationary data.

In addition, if barebones is FALSE, the following components will also be included:

ranks.Mat

A matrix containing the ranks corresponding to each combination for every iteration. One can use this, for example, to calculate the values of new summary functions.
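
As a sketch of computing new summary functions after the fact, the following uses a mock matrix in place of ranks.Mat. It assumes rows are iterations and columns are convex combinations; check the orientation of the actual component before applying this.

```r
set.seed(4)
# Mock stand-in for ranks.Mat: 50 iterations (rows) by 4 combinations (columns).
mock_ranks <- matrix(sample(1:7, 50 * 4, replace = TRUE), nrow = 50, ncol = 4)

# 90th percentile of the true model's rank, per combination.
q90 <- apply(mock_ranks, 2, quantile, probs = 0.9)

# Proportion of iterations in which the true model was ranked first.
p_top <- colMeans(mock_ranks == 1)
```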

simulated.Data

A list of data vectors simulated at each iteration.

simulated.Parameters

A list of vectors containing the simulated parameters from each iteration.

simulated.Models

A list of the actual models chosen (from the prior given by weight.Vector). Each will be an element of model.List.

The object also includes several other components taken directly from the call, for convenience in later processing.

Author(s)

Andrew K. Smith

References

A more complete description of the algorithm used, as well as a discussion of its properties and illustrations of its potential utility, can be found at http://www.isye.gatech.edu/~asmith/combmsc.pdf.

Examples

# Regression example
vars <- rnorm(60)
dim(vars)<- c(20,3)
vars <- data.frame(vars)

result <- TMC(num.Iter = 3, model.List = make.Model.List.Reg(vars), msc.List = list(BIC, AIC, PRESS), var.Frame = vars)

# Time Series Example
modList <- make.Model.List.TS(c(1,0,1,0,0,1))

result2 <- TMC(num.Iter = 3, model.List = modList,
               msc.List = list(BIC, holdout.Mean, AIC), test.Size = 10)

CombMSC documentation built on May 2, 2019, 2:32 p.m.