Split-sample-derived Shrinkage After Estimation


Description

Shrink regression coefficients using a split-sample-derived shrinkage factor.

Usage

splitval(dataset, model, nrounds, fract, sdm, int = TRUE, int.adj)

Arguments

dataset

a dataset for regression analysis, supplied as a matrix with the outcome variable in the final column. Applying the datashape function beforehand is recommended, especially if categorical predictors are present. For regression with an intercept, a column of 1s should be added as the first column of the matrix (see examples).
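The layout described above can also be assembled without datashape(). The base-R sketch below builds a matrix for the mtcars logistic example by hand: a column of 1s first, then the predictors, then the outcome. It assumes the predictors are already numeric and need no recoding; datashape() remains the recommended route when categorical predictors are present. The object names are illustrative.

```r
# Sketch: building a splitval-style input matrix by hand in base R.
# Layout (from the argument description): column of 1s first, predictors
# next, outcome last. No categorical recoding is done here.
data(mtcars)
predictors <- as.matrix(mtcars[, c("mpg", "wt", "am")])  # columns 1, 6 and 9
outcome <- mtcars$vs                                     # binary outcome (column 8)
mtc.manual <- cbind(1, predictors, outcome)              # intercept, predictors, outcome
dim(mtc.manual)        # 32 rows, 5 columns
head(mtc.manual, 3)
```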

model

type of regression model. Either "linear" or "logistic".

nrounds

the number of times to replicate the sample splitting process.

fract

the fraction of observations assigned to the training set; the remaining observations form the test set.

sdm

a shrinkage design matrix, which determines how the shrinkage factor(s) are applied to the regression coefficients. For examples, see ols.shrink.
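How a single-row shrinkage design matrix might act on the coefficients can be sketched as follows. This is an assumption based on the iris example, where the intercept entry is 0 and the three predictor entries are 1: coefficients flagged 1 are multiplied by the shrinkage factor and those flagged 0 are left unchanged. The helper apply_single_sdm() is hypothetical, the numbers are made up, and matrices with more than one row (which the "factor(s)" wording in the Value section suggests are possible) are not covered; see ols.shrink for the package's own examples.

```r
# Sketch (assumption): a one-row sdm flags which coefficients receive the
# common shrinkage factor (1) and which are left unshrunk (0).
sdm1 <- matrix(c(0, 1, 1, 1), nrow = 1)   # intercept, then three predictors
raw.coeff <- c(1.0, 0.5, -2.0, 0.25)      # made-up coefficients for illustration
lambda <- 0.8                             # made-up shrinkage factor

# Hypothetical helper, not part of the package
apply_single_sdm <- function(coeff, sdm, lambda) {
  stopifnot(nrow(sdm) == 1, ncol(sdm) == length(coeff))
  shrink.me <- sdm[1, ] == 1              # coefficients flagged for shrinkage
  coeff[shrink.me] <- lambda * coeff[shrink.me]
  coeff
}

apply_single_sdm(raw.coeff, sdm1, lambda)  # returns c(1.0, 0.4, -1.6, 0.2)
```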

int

logical. If TRUE the model will include a regression intercept.

int.adj

logical. If TRUE the regression intercept will be re-estimated after shrinkage of the regression coefficients.
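One reading of int.adj = TRUE, assumed here rather than confirmed by the package, is that the intercept is refitted with the shrunken slopes held fixed. For a linear model that refit has a closed form, the mean of the outcome minus the shrunken linear predictor, and the base-R sketch below checks the two against each other on the iris data. The shrinkage factor of 0.9 is illustrative only; for a logistic model the refit would use glm() with family = binomial and the same offset.

```r
# Sketch (assumption): re-estimating a linear-model intercept after
# shrinking the slopes, by refitting only the intercept with an offset.
data(iris)
X <- as.matrix(iris[, 1:3])   # predictors
y <- iris[, 4]                # outcome: Petal.Width

fit <- lm(y ~ X)
lambda <- 0.9                       # illustrative shrinkage factor, not a splitval result
b.shrunk <- lambda * coef(fit)[-1]  # shrink the slopes only
lp <- drop(X %*% b.shrunk)          # shrunken linear predictor, intercept excluded

# Refit only the intercept, holding the shrunken slopes fixed as an offset
int.refit <- coef(lm(y ~ 1, offset = lp))[[1]]
# For least squares this equals the mean of the outcome minus the offset
int.closed <- mean(y - lp)
c(original = unname(coef(fit)[1]), refit = int.refit, closed.form = int.closed)
```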

Details

This function applies sample splitting to a dataset in order to derive a shrinkage factor and apply it to the regression coefficients. The data are randomly split into two sets: a training set and a test set. Regression coefficients are estimated using the training set, and a shrinkage factor is then estimated using the test set. This is repeated nrounds times; the mean of the nrounds shrinkage factors is applied to the original regression coefficients, and the regression intercept may then be re-estimated (see int.adj).
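The procedure described above can be sketched for the linear case in base R. This is not splitval()'s implementation: split_shrink_sketch() is a hypothetical function, and it assumes that the per-split shrinkage factor is the calibration slope (the slope from regressing the test-set outcome on the linear predictor built from the training-set coefficients), that one common factor is applied to all slopes, and that the intercept itself is never shrunk. The returned element names mirror the Value section for readability only.

```r
# Sketch of split-sample shrinkage for a linear model (not the package's code).
# Assumes: column 1 of dataset is the intercept column of 1s, the outcome is
# the final column, and one shrinkage factor is shared by all slopes.
split_shrink_sketch <- function(dataset, nrounds, fract, int.adj = TRUE) {
  n <- nrow(dataset)
  p <- ncol(dataset)
  X <- dataset[, 2:(p - 1), drop = FALSE]  # predictors
  y <- dataset[, p]                        # outcome

  lambdas <- numeric(nrounds)
  for (r in seq_len(nrounds)) {
    train <- sample.int(n, size = floor(fract * n))  # random training rows
    b.train <- lm.fit(cbind(1, X[train, , drop = FALSE]), y[train])$coefficients
    # Test-set linear predictor using the training-set slopes
    lp.test <- drop(X[-train, , drop = FALSE] %*% b.train[-1])
    # Shrinkage factor for this split: calibration slope on the test set
    y.test <- y[-train]
    lambdas[r] <- coef(lm(y.test ~ lp.test))[[2]]
  }
  lambda <- mean(lambdas)  # mean over the nrounds splits

  raw <- lm.fit(cbind(1, X), y)$coefficients  # full-data coefficients
  shrunk <- c(raw[1], lambda * raw[-1])       # shrink slopes, not the intercept
  if (int.adj) {
    # Re-estimate the intercept with the shrunken slopes held fixed
    shrunk[1] <- mean(y - X %*% shrunk[-1])
  }
  list(raw.coeff = raw, shrunk.coeff = shrunk, lambda = lambda)
}

# Usage on the iris layout from Example 1
data(iris)
iris.data <- cbind(1, as.matrix(iris[, 1:4]))
set.seed(321)
res <- split_shrink_sketch(iris.data, nrounds = 100, fract = 0.75)
res$lambda
res$shrunk.coeff
```

Because the splitting is random, set.seed() makes the sketch reproducible, but its lambda is not expected to match splitval()'s output for the same seed.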

This process can currently be applied to linear or logistic regression models.
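The same idea carries over to logistic regression if the calibration slope is estimated on the logit scale with glm(). The base-R sketch below is again an assumption rather than the package's code: one_round_lambda() is a hypothetical helper, the data are simulated, and the intercept is refitted with the shrunken linear predictor as an offset, which is one way to mimic int.adj = TRUE.

```r
# Sketch (assumption): split-sample shrinkage for logistic regression.
# Not the package's implementation; the data are simulated for illustration.
set.seed(1)
n <- 500
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * X[, "x1"] - 0.7 * X[, "x2"]))

# Hypothetical helper: shrinkage factor from one random split
one_round_lambda <- function(X, y, fract) {
  n <- nrow(X)
  train <- sample.int(n, size = floor(fract * n))
  fit <- glm.fit(cbind(1, X[train, , drop = FALSE]), y[train], family = binomial())
  # Test-set linear predictor from the training-set coefficients
  lp.test <- drop(cbind(1, X[-train, , drop = FALSE]) %*% fit$coefficients)
  # Calibration slope on the logit scale = shrinkage factor for this split
  coef(glm(y[-train] ~ lp.test, family = binomial))[[2]]
}

lambda <- mean(replicate(50, one_round_lambda(X, y, fract = 0.5)))

# Apply the mean factor to the full-data slopes, then refit only the
# intercept with the shrunken slopes held fixed as an offset
full <- glm.fit(cbind(1, X), y, family = binomial())$coefficients
slopes <- lambda * full[-1]
lp <- drop(X %*% slopes)
intercept <- coef(glm(y ~ 1, offset = lp, family = binomial))[[1]]
c(intercept = intercept, slopes)
```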

Value

splitval returns a list containing the following:

raw.coeff

the raw regression model coefficients, pre-shrinkage.

shrunk.coeff

the shrunken regression model coefficients

lambda

the mean shrinkage factor over the nrounds split-sample replicates

Nrounds

the number of rounds of sample splitting

sdm

the shrinkage design matrix used to apply the shrinkage factor(s) to the regression coefficients

Examples

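library(apricom)  # provides splitval() and datashape()

## Example 1: Linear regression using the iris dataset
## Split-sample-derived shrinkage with 100 rounds of sample splitting
data(iris)
# Outcome: Petal.Width (column 4); predictors: the other three numeric columns
iris.data <- as.matrix(iris[, 1:4])
iris.data <- cbind(1, iris.data)  # prepend a column of 1s for the intercept
# Shrinkage design matrix: 0 for the intercept, 1 for each of the three predictors
sdm1 <- matrix(c(0, 1, 1, 1), nrow = 1)
set.seed(321)
splitval(dataset = iris.data, model = "linear", nrounds = 100,
         fract = 0.75, sdm = sdm1, int = TRUE, int.adj = TRUE)

## Example 2: Logistic regression using a subset of the mtcars data
## Split-sample-derived shrinkage with 100 rounds and a 50/50 split
data(mtcars)
# Outcome: vs (column 8); predictors: mpg, wt and am (columns 1, 6 and 9)
mtc.data <- cbind(1, datashape(mtcars, y = 8, x = c(1, 6, 9)))
head(mtc.data)
set.seed(123)
splitval(dataset = mtc.data, model = "logistic",
         nrounds = 100, fract = 0.5)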