DoubleD: Double Descent Phenomenon

doubleD R Documentation

Double Descent Phenomenon

Description

Belkin and others have shown that some machine learning algorithms exhibit surprising behavior in overfitting settings. The classic U-shape of mean loss plotted against model complexity may be followed by a surprising second "mini-U."

Alternatively, one might keep the model complexity fixed while varying the number of data points n, including over a region in which n is smaller than the complexity of the model. The surprise here is that mean loss may actually increase with n in the overfitting region.

The function doubleD facilitates easy exploration of this phenomenon.
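The kind of experiment doubleD automates can be sketched in base R alone: loop over a range of polynomial degrees, fit on a training set, and record training and test error for each degree. The code below is an illustrative sketch, not qeML internals; the simulated data and variable names are assumptions.

```r
## Illustrative sketch (base R only) of the experiment doubleD automates:
## vary model complexity, record training and test error at each level.
set.seed(1)
n <- 60
x <- runif(n, -1, 1)
y <- sin(2*pi*x) + rnorm(n, sd=0.3)
idxs <- sample(n, 15)                      # random holdout set
trn <- data.frame(x=x[-idxs], y=y[-idxs])
tst <- data.frame(x=x[idxs],  y=y[idxs])
degs <- 1:8                                # the role of xPts
errs <- t(sapply(degs, function(d) {
   fit <- lm(y ~ poly(x, d, raw=TRUE), data=trn)
   c(trainErr = mean(fit$residuals^2),
     testErr  = mean((tst$y - predict(fit, tst))^2))
}))
errs  # one row per complexity level, analogous to doubleD's return value
```

In a real run, each degree would be repeated nReps times with a fresh random holdout, and the errors averaged.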

Usage

doubleD(qeFtnCall,xPts,nReps,makeDummies=NULL,classif=FALSE)

Arguments

qeFtnCall

Quoted string containing the function call to be run; it must reference 'xPts[i]' somewhere.

xPts

Range of values to be used in the experiments, e.g. a vector of degrees for polynomial models.

nReps

Number of repetitions for each experiment, typically equal to the size of the holdout set.

makeDummies

If non-NULL, call regtools::factorsToDummies on the dataset of this name. This avoids the problem of some levels of a factor appearing in the holdout set but not in the training set.

classif

Set TRUE if this is a classification problem.
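Since qeFtnCall is a quoted string referencing 'xPts[i]', it is presumably evaluated once per element of xPts with the loop index i substituted. The mechanism sketched below, using eval(parse()), is an assumption about that behavior, and the sqrt call is purely illustrative:

```r
## Illustrative sketch of evaluating a quoted call string per xPts value.
xPts <- c(2, 4, 8)
qeFtnCall <- 'sqrt(xPts[i])'   # toy quoted call referencing xPts[i]
results <- sapply(seq_along(xPts), function(i)
   eval(parse(text = qeFtnCall)))   # i is bound in the loop's frame
results
```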

Details

The function will run the code in qeFtnCall nReps times for each level specified in xPts, recording the test and training error in each case. Thus, for each level, we will have a mean test error and a mean training error.

Value

Each value in xPts results in one row in the return value of doubleD. The returned matrix can then be plotted using plot.doubleD, the method for the generic plot function. Mean test (red) and training (blue) loss will be plotted against xPts.
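The kind of plot plot.doubleD produces can be sketched with base graphics: mean test loss in red and mean training loss in blue against xPts. The error values below are fabricated for illustration only, not output of doubleD.

```r
## Illustrative sketch of plotting mean losses against xPts (base graphics).
xPts <- 1:10
meanTestErr  <- c(5, 3, 2, 1.5, 1.8, 2.5, 3.2, 2.8, 2.2, 1.9)  # fabricated
meanTrainErr <- c(4, 2.5, 1.5, 1, 0.7, 0.5, 0.35, 0.25, 0.15, 0.1)
matplot(xPts, cbind(meanTestErr, meanTrainErr), type='l', lty=1,
        col=c('red','blue'), xlab='model complexity', ylab='mean loss')
legend('topright', legend=c('test','train'), col=c('red','blue'), lty=1)
```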

Author(s)

Norm Matloff

Examples

   ## Not run: 
      data(mlb1)
      hw <- mlb1[,2:3]
      doubleD('qePolyLin(hw,"Weight",deg=xPts[i])',1:20,250)
   ## End(Not run)

matloff/qeML documentation built on Dec. 15, 2024, 10:15 a.m.