cv.shim: Cross-validation for shim


View source: R/models.R


Performs k-fold cross-validation for shim and determines the optimal pair of tuning parameters (λ_β and λ_γ).


cv.shim(x, y, main.effect.names, interaction.names, weights,
  lambda.beta = NULL, lambda.gamma = NULL, nlambda.gamma = 10,
  nlambda.beta = 10, nlambda = 100, parallel = TRUE,
  type.measure = c("mse"), nfolds = 10, ...)



x: Design matrix of dimension n x q, where n is the number of subjects and q is the total number of variables; each row is an observation vector. It must include all main effects and interactions, with column names corresponding to the names of the main effects (e.g. x1, x2, E) and their interactions (e.g. x1:E, x2:E). Columns do not need to be standardized beforehand: scaling to mean 0 and variance 1 is done internally by the shim function.
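One way to build a design matrix with this layout is base R's model.matrix. The sketch below is only an illustration on simulated data: the variable names x1, x2 and E follow the examples in the description above, and the data are made up.

```r
# Simulated data; the variable names x1, x2 and E are illustrative only.
set.seed(1)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), E = rnorm(n))

# Main effects plus the x1:E and x2:E interactions, with the intercept
# column dropped (- 1) so that x holds only the q = 5 variables.
x <- model.matrix(~ x1 + x2 + E + x1:E + x2:E - 1, data = dat)

colnames(x)
# "x1" "x2" "E" "x1:E" "x2:E"
dim(x)
# 50 5
```

model.matrix names interaction columns with a colon, which matches the naming convention required here.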


y: Response variable. For family="gaussian", this should be a one-column matrix or a numeric vector. For family="binomial", a vector response can be numeric, with 0 for failure and 1 for success, or a factor whose first level represents "failure" and whose second level represents "success". Alternatively, for binomial logistic regression, the response can be a matrix whose first column is the number of "successes" and whose second column is the number of "failures".


main.effect.names: Character vector of main effect names. MUST be ordered in the same way as the column names of x, e.g. if the column names of x are "x1","x2" then main.effect.names = c("x1","x2").


interaction.names: Character vector of interaction names. The terms in each name MUST be separated by a colon (e.g. x1:x2), AND the vector MUST be ordered in the same way as the column names of x.
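Because both name vectors must follow the column order of x, one option is to derive them from the column names rather than typing them by hand. This is a minimal sketch; x_colnames is a hypothetical stand-in for colnames(x).

```r
# Hypothetical column names of a design matrix x, in column order.
x_colnames <- c("x1", "x2", "E", "x1:E", "x2:E")

# Columns whose name contains a colon are interactions; the rest are
# main effects. Subsetting preserves the original column order.
is_interaction <- grepl(":", x_colnames, fixed = TRUE)
main.effect.names <- x_colnames[!is_interaction]
interaction.names <- x_colnames[is_interaction]

main.effect.names
# "x1" "x2" "E"
interaction.names
# "x1:E" "x2:E"
```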


weights: Observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation. Currently NOT IMPLEMENTED.


lambda.beta: Sequence of tuning parameters for the main effects. If NULL (default), this function automatically calculates a sequence using the shim_once function, which also covers a grid of tuning parameters for gamma. If the user specifies a sequence, this function does not perform the search over a grid automatically; you need to create the grid yourself, e.g. by repeating the lambda.gamma sequence for each value of lambda.beta.


lambda.gamma: Sequence of tuning parameters for the interaction effects. Default is NULL, which means this function automatically calculates a sequence of tuning parameters. See shim_once for details on how this sequence is calculated.


nlambda.gamma: Number of tuning parameters for gamma. This needs to be specified even for user-defined inputs.


nlambda.beta: Number of tuning parameters for beta. This needs to be specified even for user-defined inputs.


nlambda: Total number of tuning parameters. If lambda.beta = NULL and lambda.gamma = NULL, then nlambda should be equal to nlambda.beta x nlambda.gamma. Specifying it is especially important when a user-defined sequence of tuning parameters is supplied.
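A user-defined grid can be laid out with rep so that position i of both vectors defines one (lambda.beta, lambda.gamma) pair. This is a sketch under assumptions: the candidate values are invented, and the pairing layout is inferred from the argument descriptions above, not confirmed against the package source.

```r
# Illustrative candidate values; these are not package defaults.
beta_values  <- c(1, 0.5, 0.1)
gamma_values <- c(2, 1)

nlambda.beta  <- length(beta_values)
nlambda.gamma <- length(gamma_values)

# Hold each beta value fixed while cycling through the whole gamma
# sequence, so every (beta, gamma) combination appears exactly once.
lambda.beta  <- rep(beta_values, each = nlambda.gamma)
lambda.gamma <- rep(gamma_values, times = nlambda.beta)

# Total number of pairs in the grid.
nlambda <- nlambda.beta * nlambda.gamma

cbind(lambda.beta, lambda.gamma)
```

Both vectors have length nlambda (6 here), which is the value that would then be passed as the nlambda argument.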


parallel: If TRUE, use parallel foreach to fit each fold. A parallel backend must be registered beforehand using the registerDoMC function from the doMC package.
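Registering the backend might look like the sketch below. The number of cores is an arbitrary choice, and the requireNamespace guard is an addition here so that the snippet also runs on machines where doMC is not installed.

```r
# Number of worker processes; 2 is an arbitrary illustrative choice.
n_cores <- 2L

backend_ready <- FALSE
if (requireNamespace("doMC", quietly = TRUE)) {
  # Register the doMC backend so that foreach loops can run in parallel.
  doMC::registerDoMC(cores = n_cores)
  backend_ready <- TRUE
  # Report how many workers foreach will use.
  print(foreach::getDoParWorkers())
}

# With a backend registered, cv.shim would then be called with parallel = TRUE.
backend_ready
```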


type.measure: Loss to use for cross-validation. Currently there is only one option: the default, type.measure="mse", which uses squared error for gaussian models.


nfolds: Number of folds; the default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds=3.


The function runs shim nfolds+1 times: the first run obtains the tuning parameter sequences, and the remaining runs compute the fit with each of the folds omitted in turn. The error is accumulated, and the average error and standard deviation over the folds are computed. Note also that the results of cv.shim are random, since the folds are selected at random using the createfolds function. Users can reduce this randomness by running cv.shim many times and averaging the error curves.
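The fold-wise error accumulation described above can be sketched in base R. This is not the package's implementation: lm stands in for shim, the data are simulated, and the fold assignment with sample is an assumption used here in place of the fold-creation function named above.

```r
set.seed(123)
n <- 100
nfolds <- 10

# Simulated predictors and response; names and coefficients are illustrative.
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "E")))
y <- drop(x %*% c(1, -1, 0.5)) + rnorm(n)
dat <- data.frame(y = y, x)

# Random fold assignment: each observation gets a label in 1..nfolds.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Fit with fold k omitted, then record the mean squared error on fold k.
fold_mse <- vapply(seq_len(nfolds), function(k) {
  fit <- lm(y ~ ., data = dat[foldid != k, ])
  pred <- predict(fit, newdata = dat[foldid == k, ])
  mean((dat$y[foldid == k] - pred)^2)
}, numeric(1))

# Average error and standard deviation over the folds.
cv_mean <- mean(fold_mse)
cv_sd <- sd(fold_mse)
```

Changing the seed changes foldid and therefore cv_mean, which is the source of randomness mentioned above; repeating the procedure over several seeds and averaging the results reduces it.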


Sahir Bhatnagar

Maintainer: Sahir Bhatnagar

sahirbhatnagar/shim documentation built on May 25, 2017, 11:36 p.m.