wrapper_loss: Wrapper to estimate the deviance loss by cross-validation

Description

The function wrapper_loss estimates the deviance loss in a multinomial regression model by leave-one-out cross-validation using fast_multinom and deviance_loss. This wrapper was used in our analysis in Bertl et al. (2017) (see References). The function wrapper_loss_binom uses a binomial model instead.

Usage

wrapper_loss(cv, cv_index, mi_index, model_index, modelfile, datafolder,
  resultsfolder, per_obs, nested_samples = T)

wrapper_loss_binom(cv, cv_index, mi_index, model_index, modelfile, datafolder,
  resultsfolder, per_obs, nested_samples = T)

Arguments

cv

integer. Number of pieces the dataset has been divided into for cross validation.

cv_index

integer. Which cross-validation slice is currently used?

mi_index

integer. Index of the multiple imputation replicate.

model_index

integer. Number of the model in the model matrix.

modelfile

character. File that contains the models in the form of a matrix (see examples).

datafolder

character. Folder that contains the dataset at the location paste0(datafolder, cvslices[1], "/imp", mi_index, ".txt").

resultsfolder

character. Where to save the estimated loss and the estimated regression model. Note that the VC matrix is not saved.

per_obs

logical. If per_obs==T, the loss is normalized by the total number of observations (sum of all counts), so it is the mean loss.

nested_samples

logical. Default=T. Are the samples nested in the cancer types?

Details

This function estimates a multinomial regression model on the joint set of all cross-validation pieces that the dataset has been divided into, except piece cv_index. The deviance loss is then estimated on the held-out piece cv_index. In a further step, the function wrapper_average_loss should be used to average over the loss estimates.
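The hold-out scheme described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: deviance_loss_sketch is a hypothetical helper that assumes the loss is minus twice the multinomial log-likelihood, an intercept-only model (training proportions) stands in for fast_multinom, the toy data and the variable piece are invented, and the final unweighted mean may differ from what wrapper_average_loss computes.

```r
# Hypothetical helper (not part of the package): multinomial deviance loss,
# assumed here to be -2 * log-likelihood of the observed counts.
deviance_loss_sketch <- function(counts, probs, per_obs = TRUE) {
  loss <- -2 * sum(counts * log(probs))
  # per_obs = TRUE divides by the total number of observations (mean loss)
  if (per_obs) loss / sum(counts) else loss
}

set.seed(1)
cv <- 5    # number of cross-validation pieces
n <- 200   # number of toy observations

# Toy outcome with four categories (invented data)
lev <- c("A", "C", "G", "T")
y <- factor(sample(lev, n, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1)),
            levels = lev)

# Assign each observation to one of the cv pieces
piece <- sample(rep(seq_len(cv), length.out = n))

cv_loss <- vapply(seq_len(cv), function(cv_index) {
  train <- y[piece != cv_index]  # all pieces except cv_index
  test  <- y[piece == cv_index]  # held-out piece
  # Intercept-only multinomial fit: the MLE is the training proportions
  probs  <- as.numeric(table(train)) / length(train)
  counts <- as.numeric(table(test))
  deviance_loss_sketch(counts, probs, per_obs = TRUE)
}, numeric(1))

# Simple unweighted average over the pieces
mean_loss <- mean(cv_loss)
```

With per_obs = TRUE the loss of each piece is on a per-observation scale, so pieces of slightly different size remain comparable before averaging.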

The data is prepared and the regression is estimated as in wrapper_fast_multinom. As the contrasts are irrelevant for prediction, they cannot be set here. By default, nested contrasts are used for the sample to avoid overspecifying the model (because this is not handled correctly by the function glm4; see fast_multinom for details). The option nested_samples allows the nesting to be removed if cancer_type is not part of the model.

The scripts that were used to run this function and that show all settings used in Bertl et al. (2017) are available in this package in the folder inst/Bertl_et_al_2017. The pre-processed data can be downloaded from figshare.

Value

No value is returned. The regression coefficients and the loss estimate are saved in resultsfolder.

Author(s)

Johanna Bertl

References

Bertl, J.; Guo, Q.; Rasmussen, M. J.; Besenbacher, S.; Nielsen, M. M.; Hornshøj, H.; Pedersen, J. S. & Hobolth, A. A Site Specific Model And Analysis Of The Neutral Somatic Mutation Rate In Whole-Genome Cancer Data. bioRxiv, 2017. doi: https://doi.org/10.1101/122879 http://www.biorxiv.org/content/early/2017/06/21/122879

See Also

fast_multinom, deviance_loss, wrapper_fast_multinom


MultinomialMutations/MultinomialMutations documentation built on May 22, 2019, 4:39 p.m.