Simulated Regression Data


Description

Data from a simulation study reported by Shao (1993, Table 1). Shao (1993, Table 2) reported 4 simulation experiments using 4 different sets of values for the regression coefficients in the linear regression model:

y = 2 + b[2] x2 + b[3] x3 + b[4] x4 + b[5] x5 + e,

where e is an independent normal error with unit variance.

The four regression coefficients for the four experiments are shown in the table below.

Experiment  b[2]  b[3]  b[4]  b[5]
1           0     0     4     0
2           0     0     4     8
3           9     0     4     8
4           9     6     4     8
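
As a rough illustration of how one response vector per experiment could be generated from the model above, here is a minimal sketch. It assumes a randomly generated stand-in design matrix rather than the actual Shao inputs; the names B, simulate_y and Y are hypothetical and are not part of the package.

```r
# Coefficients (b[2], b[3], b[4], b[5]) for the four experiments, from the table above
B <- rbind(
  c(0, 0, 4, 0),
  c(0, 0, 4, 8),
  c(9, 0, 4, 8),
  c(9, 6, 4, 8)
)
dimnames(B) <- list(paste0("Experiment", 1:4), c("x2", "x3", "x4", "x5"))

# ASSUMPTION: a random stand-in design, not the Shao data.
# With the package loaded, X <- as.matrix(Shao) could be used instead.
set.seed(1)
n <- 40
X <- matrix(rnorm(n * 4), nrow = n, dimnames = list(NULL, colnames(B)))

# y = 2 + X b + e, where e is independent N(0, 1) noise
simulate_y <- function(X, b) {
  drop(2 + X %*% b + rnorm(nrow(X)))
}

# One column of simulated responses per experiment
Y <- sapply(seq_len(nrow(B)), function(k) simulate_y(X, B[k, ]))
colnames(Y) <- rownames(B)
```

Each column of Y is then one simulated data set for the corresponding row of the coefficient table.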

The table below summarizes the probability of correct model selection in the experiments reported by Shao (1993, Table 2). Three model selection methods are compared: LOOCV (leave-one-out CV); CV(d=25), the delete-d method with d=25; and APCV, a computationally very efficient CV method that is specialized to the case of linear regression.

Experiment  LOOCV  CV(d=25)  APCV
1           0.484  0.934     0.501
2           0.641  0.947     0.651
3           0.801  0.965     0.818
4           0.985  0.948     0.999
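
For readers who want to work with these figures directly, the table can be held as a small matrix and the best method per experiment read off. This is only a convenience sketch; the object name probs and the column label CV_d25 are made up here.

```r
# Probabilities of correct model selection, copied from the table above
probs <- rbind(
  c(0.484, 0.934, 0.501),
  c(0.641, 0.947, 0.651),
  c(0.801, 0.965, 0.818),
  c(0.985, 0.948, 0.999)
)
dimnames(probs) <- list(paste0("Experiment", 1:4), c("LOOCV", "CV_d25", "APCV"))

# Method with the highest probability in each experiment
best <- colnames(probs)[apply(probs, 1, which.max)]
best
# "CV_d25" "CV_d25" "CV_d25" "APCV"
```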

CV(d=25) outperforms both LOOCV and APCV by a large margin in Experiments 1, 2 and 3, but in Experiment 4 both LOOCV (0.985) and APCV (0.999) are slightly better than CV(d=25) (0.948).
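
To make the difference between LOOCV and the delete-d method concrete, the following base-R sketch scores every subset of the four inputs under both schemes. It is not Shao's exact procedure or the bestglm implementation: the design matrix is a random stand-in, the delete-d splits are drawn at random, the number of splits (200) is arbitrary, and the helper cv_score is hypothetical.

```r
# Mean squared prediction error of one candidate model over a list of hold-out sets.
# Each element of `holdouts` is a vector of row indices left out of the fit.
cv_score <- function(X, y, cols, holdouts) {
  Z <- cbind(rep(1, nrow(X)), X[, cols, drop = FALSE])  # intercept always included
  errs <- vapply(holdouts, function(hold) {
    fit <- lm.fit(Z[-hold, , drop = FALSE], y[-hold])
    pred <- drop(Z[hold, , drop = FALSE] %*% fit$coefficients)
    mean((y[hold] - pred)^2)
  }, numeric(1))
  mean(errs)
}

# ASSUMPTION: random stand-in design; Experiment 1 coefficients (0, 0, 4, 0)
set.seed(2)
n <- 40
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, c("x2", "x3", "x4", "x5")))
y <- drop(2 + X %*% c(0, 0, 4, 0) + rnorm(n))

# All 16 subsets of the four inputs (the empty subset is the intercept-only model)
grid <- expand.grid(rep(list(c(FALSE, TRUE)), ncol(X)))
subsets <- lapply(seq_len(nrow(grid)),
                  function(i) which(unlist(grid[i, ], use.names = FALSE)))

loo_sets <- as.list(seq_len(n))                                  # leave one out
d25_sets <- replicate(200, sample.int(n, 25), simplify = FALSE)  # leave 25 out

loo <- sapply(subsets, function(s) cv_score(X, y, s, loo_sets))
d25 <- sapply(subsets, function(s) cv_score(X, y, s, d25_sets))

colnames(X)[subsets[[which.min(loo)]]]  # inputs chosen by LOOCV
colnames(X)[subsets[[which.min(d25)]]]  # inputs chosen by delete-25 CV
```

Holding out 25 of the 40 rows leaves only 15 for fitting, so extra inputs are penalized more heavily than under LOOCV, which is one way to read the pattern in the table above.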

Usage

data(Shao)

Format

A data frame with 40 observations on the following 4 inputs.

x2

a numeric vector

x3

a numeric vector

x4

a numeric vector

x5

a numeric vector

Source

Shao, Jun (1993). Linear Model Selection by Cross-Validation. Journal of the American Statistical Association 88, 486-494.

Examples

# In this example BICq (q = 0.25) selects the correct model but BIC does not
library(bestglm)
data(Shao)
X <- as.matrix(Shao)
b <- c(0, 0, 4, 0)  # Experiment 1 coefficients for x2, x3, x4, x5
set.seed(123321123)
# Note: matrix multiplication must be escaped in Rd file
y <- X %*% b + rnorm(40)
Xy <- data.frame(Shao, y = y)
bestglm(Xy)               # default criterion (BIC)
bestglm(Xy, IC = "BICq")  # BICq criterion