descriptors: Data Set Characteristics Available when Fitting Models


Description

When using the fit() functions, some variables (descriptors) are available for use in model arguments. For example, to choose an argument value based on the current number of rows in a data set, use the .obs() function. See Details below.

Usage

.cols()

.preds()

.obs()

.lvls()

.facts()

.x()

.y()

.dat()

Details

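Existing functions:

.obs(): the current number of rows in the data set.

.preds(): the number of predictor columns before dummy variables are created.

.cols(): the number of predictor columns after dummy variables are created (if any).

.facts(): the number of factor predictors in the data set.

.lvls(): if the outcome is a factor, the counts for each level (NA otherwise).

.x(): the predictors.

.y(): the outcome.

.dat(): the full data set, containing both the predictors and the outcome.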

For example, if you use the model formula circumference ~ . with the built-in Orange data, the values would be:

 .preds() =   2          (the 2 remaining columns in `Orange`)
 .cols()  =   5          (1 numeric column + 4 from Tree dummy variables)
 .obs()   = 35
 .lvls()  =  NA          (no factor outcome)
 .facts() =   1          (the Tree predictor)
 .y()     = <vector>     (circumference as a vector)
 .x()     = <data.frame> (The other 2 columns as a data frame)
 .dat()   = <data.frame> (The full data set)
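
As a rough cross-check, the counts above can be reproduced with base R alone. This is only a sketch: it does not call parsnip, the variable names (orange, predictors, n_preds and so on) are made up for illustration, and it assumes that .cols() corresponds to the number of non-intercept columns produced by model.matrix().

```r
# Base R sketch (not parsnip code): recompute the descriptor values
# for the formula circumference ~ . on the built-in Orange data.
orange <- as.data.frame(Orange)  # drop the groupedData classes

# All columns except the outcome
predictors <- orange[, setdiff(names(orange), "circumference")]

n_preds <- ncol(predictors)                                # analogue of .preds()
n_obs   <- nrow(orange)                                    # analogue of .obs()
n_facts <- sum(vapply(predictors, is.factor, logical(1)))  # analogue of .facts()

# Assumption: the column count after dummy-variable creation equals the
# number of model.matrix() columns, excluding the intercept.
n_cols <- ncol(model.matrix(circumference ~ ., data = orange)) - 1

c(preds = n_preds, cols = n_cols, obs = n_obs, facts = n_facts)
```

The Tree factor has five levels, so it expands to four columns in the model matrix, which together with age gives the five columns reported by .cols().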

If the formula Tree ~ . were used:

 .preds() =   2          (the 2 numeric columns in `Orange`)
 .cols()  =   2          (same)
 .obs()   = 35
 .lvls()  =  c("1" = 7, "2" = 7, "3" = 7, "4" = 7, "5" = 7)
 .facts() =   0
 .y()     = <vector>     (Tree as a vector)
 .x()     = <data.frame> (The other 2 columns as a data frame)
 .dat()   = <data.frame> (The full data set)
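
The same kind of base R cross-check applies when Tree is the outcome. Again, this is a sketch under the assumption that table() and model.matrix() mirror what .lvls() and .cols() report; the variable names are hypothetical and nothing here calls parsnip.

```r
# Base R sketch (not parsnip code): descriptor analogues for Tree ~ .
orange <- as.data.frame(Orange)  # drop the groupedData classes

# Analogue of .lvls(): counts per outcome level.
# table() lists levels in the factor's stored order.
lvls <- table(orange$Tree)

# The remaining predictors are both numeric
predictors <- orange[, setdiff(names(orange), "Tree")]
n_preds <- ncol(predictors)                                # analogue of .preds()
n_facts <- sum(vapply(predictors, is.factor, logical(1)))  # analogue of .facts()

# No factor predictors, so no dummy columns are added (intercept excluded)
n_cols <- ncol(model.matrix(Tree ~ ., data = orange)) - 1

lvls
```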

To use these in a model fit, pass them to a model specification. Evaluation is delayed until the model is run via fit(), at which point the functions listed above are available. For example:

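library(parsnip)    # provides rand_forest() and the descriptors
library(modeldata)  # provides the lending_club data
data("lending_club")

# mtry is not computed here; .cols() is evaluated when fit() is called
rand_forest(mode = "classification", mtry = .cols() - 2)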

When no descriptors are found in the model arguments, the descriptor values are not computed.
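
The idea of delayed evaluation can be illustrated in base R. The sketch below is a hypothetical illustration, not parsnip's actual implementation: make_descriptors() is a made-up helper, and only two descriptor analogues are defined. It shows an argument expression being captured unevaluated and then evaluated later, once the data are known.

```r
# Hypothetical illustration of delayed evaluation; NOT parsnip's internals.

# Capture the argument expression without evaluating it.
mtry_expr <- quote(.cols() - 2)

# make_descriptors() is a made-up helper: it builds an environment holding
# descriptor-like functions for a given formula and data set.
make_descriptors <- function(formula, data) {
  env <- new.env(parent = baseenv())
  # Assumption: column count after dummy variables, excluding the intercept
  n_cols <- ncol(model.matrix(formula, data = data)) - 1
  env$.cols <- function() n_cols
  env$.obs  <- function() nrow(data)
  env
}

# At "fit time" the data are known, so the expression can be evaluated.
orange <- as.data.frame(Orange)
descr <- make_descriptors(circumference ~ ., orange)
mtry_value <- eval(mtry_expr, envir = descr)
mtry_value
```

With the Orange data and circumference as the outcome, the captured expression evaluates to 5 - 2 = 3 only when eval() is called, which mirrors how the specification above stays unevaluated until fit() supplies the data.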


parsnip documentation built on July 8, 2020, 7:22 p.m.