TMC: Compare model selection criteria



Description

Computes convex combinations of model selection criteria (MSCs). The function is highly customizable: the user can specify the type of model to be tested, the criteria to be used, and many other options described below.


Usage

TMC(num.Iter = 50, data.Size = 100, make.Data = gen.Data,
    make.Params = gen.Params, model.List,
    weight.Vector = rep(1, times = length(model.List)), msc.List,
    fit.Model = fit.Models, stepSize = 0.05,
    sumstats = list("Median Rank" = median), huge = FALSE,
    var.Frame = data.frame(), par.Sigma = 1, data.Sigma = 1,
    barebones = FALSE, allow.Negs = FALSE,
    thresholds = c(1, 2, 3, 5, 10), test.Size = 0, scale.Frame = TRUE,
    use.Ranks = TRUE, ...)



Arguments

num.Iter: The number of iterations. This is the total number of times that the entire loop described in the Details section will be executed.


data.Size: For time series (and possibly other extended types), the size of each simulated data set.


make.Data: The name (not quoted) of a function used to simulate data. It must take the result of make.Params as its only argument. A sensible default is gen.Data for time series and regression.


make.Params: The name (not quoted) of a function used to simulate parameters. It must take a single model as its only argument. A sensible default is gen.Params for time series and regression.


model.List: A list of candidate models. The true model will be chosen from this list in each iteration, and the MSC values of every model in this list will then be calculated, from which the rank of the true model is computed. Utility functions for constructing such model lists are make.Model.List.Reg and make.Model.List.TS.


weight.Vector: A numeric vector, the same length as model.List, of the weights (probabilities) of each model. Used to choose the true model at each iteration. The weights need not be scaled, but must be nonnegative. To construct a vector of weights for individual models based on a prior distribution on the number of terms (or complexity) of the underlying model, use weightsGivenSize. Another possible utility function, which weights models of only a specified size, is weight.Only.N.


msc.List: A list of model selection criterion functions. The length must be more than 1, but should not be much larger than 3, to avoid computational overflow. The recommended number of MSCs is 3. Each function must take a fitted model object (produced by fit.Model) as its only argument. Commonly used functions include AIC and BIC and, for time series models, holdout.Mean for the mean absolute deviation on a holdout sample and holdout.Med for the median absolute deviation. This list is by no means exhaustive, and new MSC functions can easily be written; see Details below.


fit.Model: The function used to fit the models defined by model.List. Whenever possible, we recommend that this be a built-in R function, e.g., lm or arima.


stepSize: The mesh size (spacing) of the grid of convex combinations. Bear in mind that the number of convex combinations will be roughly proportional to (1/stepSize)^length(msc.List), so avoid making stepSize too small, especially if msc.List is longer than 3.


sumstats: The summary functions of the distributions of ranks. These are used for graphical displays of the final msc object. Note that the average and all the summary functions generated by thresholds (see below) are automatically included in the final object, so there is no need to put them in this list.


huge: Must be set to TRUE if the matrix of convex combinations will be larger than roughly 500000. This safeguard prevents unexpectedly long calculations.


var.Frame: For models with covariates, this should be the data.frame containing them. For other models, it is ignored.


par.Sigma: This argument may be passed to make.Params. It is, for example, the standard deviation used in gen.Params.lmFormula.


data.Sigma: An optional argument to be passed to make.Data.


barebones: For large computations, we recommend this be set to TRUE. The individual ranks from each iteration are then discarded and only the summary functions are updated, in order to reduce space requirements. If barebones is TRUE, sumstats is restricted to pre-defined functions which can be updated dynamically, such as mean, and cannot include functions which require the whole sample, such as median.


allow.Negs: If TRUE, the matrix of convex combinations will be expanded to include linear combinations with negative weights. This greatly increases computation and is rarely helpful.


thresholds: Must be a numeric vector. Included as a simple way to generate summary functions: for each element k of this vector, the summary function P(Rank > k) will be computed and included in the final object. Note that if barebones is set to TRUE, the elements of thresholds are the ONLY summary functions the user can specify. This is enforced so that the barebones routine does not need to keep track of all the ranks from individual iterations, but instead can retain only the updated summary function values.


test.Size: The size of the subset of each sample to be used as a holdout sample. Ordinarily, this is set to 0, but for certain MSCs, namely those whose names begin with "holdout", it needs to be set to a nonzero number to be useful. A common rule of thumb is to set the size to roughly ten percent of the total sample size. Note, however, that whenever this argument is nonzero, the function will slow down considerably, since it is then forced to fit all models twice: once with the full sample, and once with only the "training" sample, which excludes the holdout sample.


scale.Frame: Logical indicating whether var.Frame should be scaled first. If TRUE, each column will be centered by its mean and divided by its standard deviation.


use.Ranks: Logical. If TRUE, then in each iteration, the MSC values for each criterion will be scaled by taking ranks. If FALSE, they will be scaled by standardizing instead.


...: Other arguments to be passed to other functions.
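
To give a feel for how stepSize controls the size of the grid, here is a minimal sketch in base R. It assumes the grid consists of all nonnegative weight vectors on a mesh of the given spacing that sum to 1; convex.grid is a hypothetical helper written for this illustration, not a CombMSC function, and the package may build its grid differently.

```r
# Hypothetical helper (not part of CombMSC): enumerate all weight vectors on
# a mesh with spacing `step` whose entries are nonnegative and sum to 1.
convex.grid <- function(n.msc, step = 0.05) {
  n.steps <- round(1 / step)  # number of mesh intervals along each axis
  # Integer grid for the first n.msc - 1 weights; the last weight is implied.
  free <- expand.grid(rep(list(0:n.steps), n.msc - 1))
  free <- free[rowSums(free) <= n.steps, , drop = FALSE]
  grid <- cbind(free, n.steps - rowSums(free)) / n.steps
  names(grid) <- paste0("w", seq_len(n.msc))
  rownames(grid) <- NULL
  grid
}

grid3 <- convex.grid(3, step = 0.05)
nrow(grid3)            # 231 combinations for three criteria
range(rowSums(grid3))  # every row sums to 1, up to floating point error
```

Under this assumption the number of rows is choose(n.steps + n.msc - 1, n.msc - 1), which grows quickly as step shrinks or as more criteria are added.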


Details

The basic algorithm is as follows:

  1. Choose a true model from model.List, by simulating a random entry using the weights (if present) given in weight.Vector.

  2. Simulate parameters for that model by calling make.Params with argument true.Model as given above.

  3. Simulate data from true.Model, with the parameters generated above, by calling the function make.Data.

  4. Fit all models in model.List to the simulated data set.

  5. Calculate the model selection criteria in msc.List for each fitted model, and take the ranks of these values within each individual MSC (or standardize them, if use.Ranks is FALSE).

  6. For each convex combination in the grid implicitly defined by stepSize, calculate the convex combination of the ranks (or standardized values) of the different MSCs for each model in model.List.

  7. Among these values, calculate the rank of true.Model.

After these steps have been iterated num.Iter times, the summary functions specified in sumstats, as well as the average and threshold functions defined by thresholds, are computed for each convex combination.
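
The loop above can be illustrated with a small self-contained sketch in base R. This is not the package's implementation: the candidate formulas, the simulation of parameters and data, the names model.list, msc.list, weights, combo and one.iteration, and the use of a single convex combination are all assumptions made for the illustration.

```r
set.seed(1)
n <- 50
vars <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
model.list <- list(y ~ X1, y ~ X2, y ~ X1 + X2)  # stands in for model.List
msc.list <- list(AIC = AIC, BIC = BIC)           # stands in for msc.List
weights <- c(1, 1, 2)   # unscaled prior weights, as in weight.Vector
combo <- c(0.5, 0.5)    # one convex combination of the two criteria

one.iteration <- function() {
  # 1. Choose the true model; sample() rescales `prob` internally.
  true.idx <- sample(seq_along(model.list), 1, prob = weights)
  # 2. Simulate parameters (intercept and slopes) for the true model.
  X <- model.matrix(model.list[[true.idx]][-2], vars)  # [-2] drops the response
  beta <- rnorm(ncol(X))
  # 3. Simulate data from the true model.
  dat <- cbind(vars, y = drop(X %*% beta) + rnorm(n))
  # 4. Fit all candidate models to the simulated data.
  fits <- lapply(model.list, lm, data = dat)
  # 5. Compute each criterion for each fit and rank within each criterion.
  msc.ranks <- sapply(msc.list, function(f) rank(sapply(fits, f)))
  # 6. Convex combination of the ranks for each model.
  combined <- drop(msc.ranks %*% combo)
  # 7. Rank of the true model among the combined values (1 is best).
  rank(combined, ties.method = "min")[true.idx]
}

true.ranks <- replicate(200, one.iteration())
mean(true.ranks)      # average rank of the true model
mean(true.ranks > 1)  # estimate of P(Rank > 1)
```

In the full procedure, steps 6 and 7 are repeated for every convex combination in the grid, so each iteration yields one rank per combination rather than a single number.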

New model selection functions, or additional methods for existing ones, can easily be written. The object initially passed to each such function will be of class "fmo", a class used internally in TMC. An fmo object will contain at least the following components:


full: the fitted model object resulting from applying fit.Model to the full data set generated by gen.Data


train: the fitted model object resulting from applying fit.Model to only the training part of the data set (that is, the data set less any observations held out for MSC functions involving a holdout sample). If test.Size = 0, this is NULL.


the matrix of covariates associated with the holdout sample, if any. If test.Size = 0, this is NULL.


the actual vector of observations held out as a test sample. If test.Size = 0, this is NULL.


an unbiased estimate of residual variance in regression models, included only for convenience in calculating Cp to avoid recalculating for every criterion.

Thus, to write a new model selection criterion function, one should create a generic function with a method for class "fmo", and further methods for whatever classes of model objects the criterion can actually be computed on directly. The method for class "fmo" is typically very simple: it usually calls another method of the same function on some part of the fmo object itself, typically the full component for ordinary model selection criteria or the train component for criteria involving a holdout sample. For an example, see PRESS.
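
As a rough sketch of this pattern, the following defines a criterion with a working method for lm fits and a thin "fmo" method that delegates to the full component. The criterion negAdjR2 and the hand-built object fake.fmo are hypothetical and exist only for this illustration; a real fmo object is created internally by TMC and contains more components than shown here.

```r
# Hypothetical criterion: negative adjusted R-squared (smaller is better).
negAdjR2 <- function(obj, ...) UseMethod("negAdjR2")

# Method that does the real work, for ordinary lm fits.
negAdjR2.lm <- function(obj, ...) -summary(obj)$adj.r.squared

# Method for class "fmo": delegate to the fit on the full data set.
negAdjR2.fmo <- function(obj, ...) negAdjR2(obj$full, ...)

# Stand-in for the object TMC would pass, built by hand for illustration only.
fit <- lm(dist ~ speed, data = cars)
fake.fmo <- structure(list(full = fit, train = NULL), class = "fmo")

negAdjR2(fake.fmo)  # dispatches to negAdjR2.fmo, then to negAdjR2.lm
```

A criterion based on a holdout sample would follow the same shape, with the "fmo" method working from the train component and the held-out observations instead.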

gen.Data, gen.Params, and fit.Models are intended to be sensible defaults, but they need not be the only functions one uses for simulating parameters, simulating data, and fitting models. New methods can easily be written for all three functions. The recommended approach is to create a new class, create a list of model specifications (e.g., model formulae or order specifications) of this new class, and then write methods for gen.Params, etc. for this new class.
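
The following sketch shows what such an extension might look like for gen.Params. The class "arSpec", its constructor new.arSpec, and the coefficient range are hypothetical choices for this illustration, and the stand-in generic is defined only so that the code runs without CombMSC loaded; the argument list of the package's own generic is not documented here and may differ.

```r
# Stand-in generic so the sketch runs on its own; CombMSC supplies the real one.
if (!exists("gen.Params")) {
  gen.Params <- function(model, ...) UseMethod("gen.Params")
}

# Hypothetical class "arSpec": a model specification holding an AR order p.
new.arSpec <- function(p) structure(list(p = p), class = "arSpec")

# Method for the new class: draw p small AR coefficients. For p up to 3 the
# absolute values sum to less than 1, which is sufficient for stationarity.
gen.Params.arSpec <- function(model, ...) {
  runif(model$p, min = -0.3, max = 0.3)
}

spec.list <- lapply(1:3, new.arSpec)  # list of specifications of the new class
params <- gen.Params(spec.list[[2]])
length(params)  # one coefficient per AR term
```

Matching methods for the data-simulation and model-fitting generics would be written the same way, after which a list such as spec.list could play the role of model.List.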


Value

An object of class msc or, if barebones is TRUE, an object of class barebones, which inherits from msc. It contains the following components:


The matched call


A data.frame, with each row representing a convex combination of MSCs. The first 3 columns give the weights corresponding to the combination, and the remaining columns give the values of all summary statistics corresponding to the combination.


For models containing covariates, a data.frame containing them.


error.Iterations: Iteration numbers in which the attempt to fit the true model to the simulated data set resulted in an error, thus making it impossible to compute a rank.


The length of error.Iterations.


The total length of time to complete the call.


The formula corresponding to the true model chosen in each iteration.


The number of attempts needed, during each iteration, to simulate data successfully. Mainly useful for diagnostic purposes when simulation of time series results in non-stationary data.

In addition, if barebones is FALSE, the following components will also be included:


A matrix containing the ranks corresponding to each combination for every iteration. One can use this, for example, to calculate the values of new summary functions.


A list of data vectors simulated at each iteration.


A list of vectors containing the simulated parameters from each iteration.


A list of the actual models chosen (from the prior given by weight.Vector). Each will be an element of model.List.

Plus several other components which are taken directly from the call, for convenience in later processing.
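
When the individual ranks are retained, new summary functions can be computed after the fact. The sketch below uses a simulated matrix in place of the stored ranks; its layout (one row per convex combination, one column per iteration) and the names ranks, exceed and q90 are assumptions for this illustration, so check the returned object before applying the same code to it.

```r
set.seed(2)
# Hypothetical stand-in for the stored ranks: 4 combinations x 25 iterations.
ranks <- matrix(sample(1:8, 100, replace = TRUE), nrow = 4)
thresholds <- c(1, 2, 3, 5, 10)

# P(Rank > k) for each combination (rows) and each threshold k (columns).
exceed <- sapply(thresholds, function(k) rowMeans(ranks > k))
colnames(exceed) <- paste0("P(Rank > ", thresholds, ")")

# A summary function that needs the whole sample: the 90th percentile of the
# true model's rank, computed separately for each combination.
q90 <- apply(ranks, 1, quantile, probs = 0.9)
```

Quantities such as q90 need every rank from every iteration, which is why they are unavailable when barebones is TRUE.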


Author(s)

Andrew K. Smith


References

A more complete description of the algorithm used, as well as a discussion of its properties and illustrations of its potential utility, can be found at


Examples

library(CombMSC)

# Regression example: 20 observations of 3 standard normal covariates
vars <- data.frame(matrix(rnorm(60), nrow = 20, ncol = 3))

result <- TMC(num.Iter = 3, model.List = make.Model.List.Reg(vars),
              msc.List = list(BIC, AIC, PRESS), var.Frame = vars)

# Time series example; test.Size = 10 holds out 10 observations,
# which holdout.Mean needs in order to be useful
modList <- make.Model.List.TS(c(1, 0, 1, 0, 0, 1))

result2 <- TMC(num.Iter = 3, model.List = modList,
               msc.List = list(BIC, holdout.Mean, AIC), test.Size = 10)

CombMSC documentation built on May 2, 2019, 2:32 p.m.