Data from a simulation study reported by Shao (1993, Table 1). For the linear regression model below, Shao (1993, Table 2) reported four simulation experiments using four different sets of values for the regression coefficients:

*y = 2 + b[2] x2 + b[3] x3 + b[4] x4 + b[5] x5 + e,*

where *e* is an independent normal error with unit variance.

The regression coefficients for the four experiments are shown in the table below.

Experiment | b[2] | b[3] | b[4] | b[5] |

1 | 0 | 0 | 4 | 0 |

2 | 0 | 0 | 4 | 8 |

3 | 9 | 0 | 4 | 8 |

4 | 9 | 6 | 4 | 8 |
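As a rough illustration of how one response vector could be generated for a given experiment, the sketch below encodes the coefficient table above and adds a unit-variance normal error to the mean response. The function names and the two input rows are hypothetical and chosen only for illustration; the actual input values are those in the data frame described on this page, and this is not the code used by Shao (1993).

```python
import random

# Coefficients (b[2], b[3], b[4], b[5]) for each experiment, copied from the table above.
COEFS = {
    1: (0, 0, 4, 0),
    2: (0, 0, 4, 8),
    3: (9, 0, 4, 8),
    4: (9, 6, 4, 8),
}
INTERCEPT = 2


def mean_response(x, experiment):
    """Noise-free part of the model: 2 + b[2] x2 + b[3] x3 + b[4] x4 + b[5] x5."""
    return INTERCEPT + sum(b * xi for b, xi in zip(COEFS[experiment], x))


def simulate_response(rows, experiment, rng):
    """Add an independent N(0, 1) error to the mean response of every row."""
    return [mean_response(x, experiment) + rng.gauss(0.0, 1.0) for x in rows]


def true_model(experiment):
    """Names of the inputs whose coefficient is nonzero in the given experiment."""
    names = ("x2", "x3", "x4", "x5")
    return [name for name, b in zip(names, COEFS[experiment]) if b != 0]


# Hypothetical input rows (x2, x3, x4, x5), used here only so the sketch runs.
rows = [(0.5, 1.0, 0.2, 0.3), (1.5, 0.1, 0.9, 0.7)]
y = simulate_response(rows, experiment=3, rng=random.Random(1))
```

A model selection method is counted as correct in an experiment when it picks exactly the inputs returned by `true_model`, for example only `x4` in Experiment 1.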

The table below summarizes the probability of correct model selection in the experiments reported by Shao (1993, Table 2). Three model selection methods are compared: LOOCV (leave-one-out CV); CV(d=25), the delete-d method with d=25; and APCV, a computationally very efficient CV method that is specialized to linear regression.

Experiment | LOOCV | CV(d=25) | APCV |

1 | 0.484 | 0.934 | 0.501 |

2 | 0.641 | 0.947 | 0.651 |

3 | 0.801 | 0.965 | 0.818 |

4 | 0.985 | 0.948 | 0.999 |

CV(d=25) outperforms both LOOCV and APCV by a large margin in Experiments 1, 2 and 3. In Experiment 4, however, both LOOCV (0.985) and APCV (0.999) do better than CV(d=25) (0.948).
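To make the difference between LOOCV and the delete-d method concrete, the following sketch generates the train/validation index splits for a data set of 40 observations. It assumes that delete-d splits are drawn as random subsets and that 100 splits are used; both are illustrative assumptions, not the design used by Shao (1993), and the function names are hypothetical.

```python
import random


def loocv_splits(n):
    """Leave-one-out: n splits, each validating on a single observation."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, [i]


def delete_d_splits(n, d, n_splits, rng):
    """Delete-d: each split holds out d randomly chosen observations."""
    for _ in range(n_splits):
        valid = sorted(rng.sample(range(n), d))
        held_out = set(valid)
        train = [j for j in range(n) if j not in held_out]
        yield train, valid


n, d = 40, 25
loo = list(loocv_splits(n))
# n_splits=100 is an arbitrary choice for this sketch.
dd = list(delete_d_splits(n, d, n_splits=100, rng=random.Random(0)))

print(len(loo), len(loo[0][0]), len(loo[0][1]))  # 40 39 1
print(len(dd), len(dd[0][0]), len(dd[0][1]))     # 100 15 25
```

With d=25, each candidate model is fitted on only 15 observations and validated on the other 25, whereas LOOCV fits on 39 and validates on 1.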


A data frame with 40 observations on the following 4 inputs.

`x2`

a numeric vector

`x3`

a numeric vector

`x4`

a numeric vector

`x5`

a numeric vector

Shao, Jun (1993). Linear Model Selection by Cross-Validation. Journal of the American Statistical Association 88, 486-494.
