compare: A Comparison of Regression Modelling Strategies


Description

Compare two nested modelling strategies and return measures of their relative predictive performance.

Usage

compare(model, Ntrials, strat1, strat2, data, Nrows, Ncomp, int = TRUE,
  int.adj, trim = FALSE, output = TRUE)

Arguments

model

the type of regression model. Either "linear" or "logistic".

Ntrials

number of simulation trials.

strat1

a list containing the strategy name and strategy-specific parameter values. This modelling strategy is taken as the reference for comparison.

strat2

a list containing the strategy name and strategy-specific parameter values. This modelling strategy is compared with the reference strategy, strat1.

data

a list describing the dataset in which the selected modelling strategies will be compared. If the first object in the list is "norm" or "unif", the user may supply parameters for generating multivariable simulated datasets (see the examples below). Users may specify their own dataset using the format list("dataset", user_data), where the second object in the list is their own dataset.

Nrows

the number of rows of observations in simulated datasets of type "norm" or "unif".

Ncomp

the number of rows of observations in the comparison set. This dataset is taken to represent the overall population from which the training set is sampled. When data is of type "dataset" and Ncomp is not specified, the original data are used as the comparison dataset.

int

logical. If int == TRUE an intercept will be included in the regression model.

int.adj

logical. If int.adj == TRUE the intercept will be re-estimated after shrinkage is applied to the regression coefficients.

trim

logical. If trim == TRUE a "trimmed" comparison distribution will be returned, along with a victory rate and median precision ratio derived using the trimmed distribution. The trimmed distribution only contains precision ratios within a range of plus or minus two times the interquartile range around the median precision ratio.

output

logical. If output == TRUE the function will return two graphical representations of the comparison distribution.
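The trimming rule described for the trim argument (keep only precision ratios within two interquartile ranges of the median) can be sketched in base R. This is a hypothetical helper, trim_precision_ratios, written from the description above; it is not apricom's internal code, and the precision ratios are made-up values.

```r
# Hypothetical sketch of the documented trimming rule for trim = TRUE:
# keep precision ratios within median +/- 2 * IQR. Not apricom's own code.
trim_precision_ratios <- function(pr) {
  centre <- median(pr)
  spread <- 2 * IQR(pr)  # IQR() uses the default quantile type 7
  keep <- pr >= centre - spread & pr <= centre + spread
  list(trimmed = pr[keep], n_rejected = sum(!keep))
}

pr <- c(0.90, 0.95, 1.00, 1.05, 1.10, 3.00)  # made-up precision ratios
res <- trim_precision_ratios(pr)
res$trimmed     # the outlying ratio of 3.00 is dropped
res$n_rejected  # one ratio rejected, analogous to N.rejected
```

Here the median is 1.025 and the IQR is 0.125, so ratios outside roughly 0.775 to 1.275 are discarded.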

Details

This is the core function in the apricom package. The *compare* function can be used to compare the performance of two prediction model building approaches for either simulated or user-specified data. For further details, see the apricom user manual.

The following strategies are currently supported: heuristic shrinkage ("heuristic"), split-sample-derived shrinkage ("split"), cross-validation-derived shrinkage ("kcv"), leave-one-out cross-validation-derived shrinkage ("loocv"), bootstrap-derived shrinkage ("boot") and penalized logistic regression using Firth's penalty ("pml.firth"). Furthermore, models built using these methods may be compared with raw models fitted by ordinary least squares estimation ("lsq") or maximum likelihood estimation ("ml").

Strategies should be specified within the "strat1" and "strat2" arguments in the form of a list, starting with the strategy name (as listed above in parentheses), followed by relevant parameters for each respective method. For further details see the individual help files for each strategy, and the examples below. Note that in the *compare* function call, the dataset should not be specified within the "strat1" or "strat2" arguments; it should be supplied only through the "data" argument.
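The list convention can be seen by building several strategy specifications side by side. The parameter values below are copied from the examples on this page; the container name strategies is purely illustrative, and parameters for other strategies ("split", "boot") are left out because they are documented in their own help files. No dataset appears in any of these lists, since data are passed only through the data argument.

```r
# Strategy specifications: the strategy name first, then any
# strategy-specific parameters (values taken from this page's examples).
strategies <- list(
  lsq       = list("lsq"),
  heuristic = list("heuristic", DF = 8),
  kcv       = list("kcv", k = 10, nreps = 10),
  loocv     = list("loocv"),
  ml        = list("ml"),
  firth     = list("pml.firth")
)

# The first element of each specification is the strategy name.
nm <- vapply(strategies, function(s) s[[1]], character(1))
nm
```

Any two of these could then be passed as strat1 and strat2, for example strat1 = strategies$loocv and strat2 = strategies$kcv.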

Value

compare returns a list containing the following:

VR

the victory rate of strategy 2 over strategy 1.

MPR

the median precision ratio over Ntrials comparison trials.

PR.IQR

the precision ratio interquartile range over Ntrials comparison trials.

VR.trim

if trim == TRUE the "trimmed" victory rate of strategy 2 over strategy 1 is returned.

MPR.trim

if trim == TRUE the "trimmed" median precision ratio over Ntrials comparison trials is returned.

distribution

the comparison distribution of strategy 2 vs. strategy 1. This is the distribution of precision ratios generated from Ntrials comparison trials.

distribution.trim

if trim == TRUE the "trimmed" comparison distribution is returned.

N.rejected

the number of trials excluded from the comparison distribution by trimming.

strat1

modelling strategy 1

strat2

modelling strategy 2

shrinkage1

If strategy 1 is a shrinkage-after-estimation technique, a vector or matrix containing the shrinkage factor estimated in each trial is returned.

shrinkage2

If strategy 2 is a shrinkage-after-estimation technique, a vector or matrix containing the shrinkage factor estimated in each trial is returned.
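The summaries VR, MPR and PR.IQR can be illustrated on a vector of precision ratios. This page does not define the precision ratio itself, so the sketch below assumes that each ratio compares strategy 2's prediction error with strategy 1's, with a value below 1 counting as a victory for strategy 2; check the apricom user manual for the exact definition. The helper summarise_comparison is hypothetical and only borrows the component names from the list above.

```r
# Hypothetical sketch of summarising a comparison distribution.
# Assumption: a precision ratio below 1 means strategy 2 outperformed
# strategy 1 in that trial.
summarise_comparison <- function(pr) {
  list(
    VR     = mean(pr < 1),  # proportion of trials won by strategy 2
    MPR    = median(pr),    # median precision ratio
    PR.IQR = IQR(pr)        # interquartile range of the ratios
  )
}

pr <- c(0.8, 0.9, 0.95, 1.1, 1.2)  # made-up ratios from 5 trials
s <- summarise_comparison(pr)
str(s)
```

With these made-up ratios, strategy 2 wins 3 of 5 trials (VR = 0.6), the median ratio is 0.95 and the IQR is 0.2.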

Note

When using compare, it is strongly recommended that 10000 comparison trials be used to give stable estimates. Comparisons of model adjustment strategies for logistic regression are considerably slower than for linear regression, so 1000-5000 trials may be preferred. The examples in this documentation use far fewer comparison trials and yield highly unstable estimates.
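A rough way to see why so many trials are needed is the Monte Carlo standard error of a victory rate. Treating the Ntrials comparison trials as independent win/lose outcomes (a simplifying assumption, not something stated by apricom), the standard error of a victory rate p is sqrt(p(1 - p) / Ntrials). The helper vr_standard_error is hypothetical.

```r
# Approximate standard error of a victory rate, assuming the comparison
# trials are independent Bernoulli outcomes (binomial approximation).
vr_standard_error <- function(vr, ntrials) sqrt(vr * (1 - vr) / ntrials)

ntrials <- c(10, 1000, 5000, 10000)
se <- vr_standard_error(0.5, ntrials)
round(se, 4)
```

For a victory rate near 0.5, this gives a standard error of about 0.16 with 10 trials, as in the examples below, compared with 0.005 with 10000 trials.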

References

Pestman W., Groenwold R. H. H., Teerenstra S. "Comparison of strategies when building linear prediction models." Numerical Linear Algebra with Applications (2013).

Examples

## Example 1: Comparison of heuristic formula-derived shrinkage against
## a raw least squares model. Data are simulated as multivariable random
## normally distributed. The comparison set will have 2000 rows. Here only
## 10 trial replicates are used, but far more should be used in practice
## (see Note).

 mu <- c(rep(0, 21))
 rho <- 0.5
 comp1 <- compare(model = "linear", Ntrials = 10, strat1 = list("lsq"),
          strat2 = list("heuristic", DF = 8),
          data = list("norm", mu, rho), Nrows = 200, Ncomp = 2000,
          int = TRUE, int.adj = FALSE, trim = FALSE, output = TRUE)


## Example 2: A short comparison of 10-rep, 10-fold
## cross-validation-derived shrinkage against leave-one-out
## cross-validation-derived shrinkage.
## Data are simulated as multivariable random uniformly distributed
## (50 rows; 5 predictors with mean 0; rho = 0.7).
## The comparison set will contain 1000 observations.

mu <- c(rep(0, 6))
rho <- 0.7
comp2 <- compare(model = "linear", Ntrials = 10, strat1 = list("loocv"),
          strat2 = list("kcv", k = 10, nreps = 10), data = list("unif", mu, rho),
          Nrows = 50, Ncomp = 1000, trim = TRUE)


## Example 3: Comparison of penalized logistic regression with
## Firth's penalty against a raw logistic regression model fitted by
## maximum likelihood estimation.
## Note that the logistf package is required for pml.firth.

library(shrink)  # provides the deepvein dataset
data(deepvein)
# response in column 3, predictors in columns 4 to 11
dv.data <- datashape(deepvein, y = 3, x = 4:11)
set.seed(123)
comp3 <- compare(model = "logistic", Ntrials = 10,
         strat1 = list("ml"), strat2 = list("pml.firth"),
         data = list("dataset", dv.data), int = TRUE,
         int.adj = TRUE, trim = FALSE, output = TRUE)

apricom documentation built on May 2, 2019, 6:21 a.m.
