logistic_reg: General Interface for Logistic Regression Models

View source: R/logistic_reg.R

Description

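logistic_reg() generates a specification of a model before fitting and allows the model to be created using different packages in R, Stan, keras, or via Spark. The main arguments for the model are:

penalty: The total amount of regularization in the model (glmnet, keras, and spark engines only).

mixture: The mix of L1 and L2 regularization (glmnet and spark engines only).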

These arguments are converted to their engine-specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
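As a minimal sketch of the workflow described above (specify, choose an engine, then fit), the following uses the glm engine on the built-in mtcars data. The data set, the recoding of am as a factor, and the choice of predictors are illustrative assumptions and not part of the original documentation.

```r
library(parsnip)

# Illustrative two-class outcome: recode mtcars$am (0/1) as a factor.
dat <- mtcars
dat$am <- factor(dat$am, labels = c("automatic", "manual"))

# The specification holds no data and does no fitting.
spec <- logistic_reg() %>%
  set_engine("glm")

# Fitting translates the specification into a stats::glm() call.
glm_fit <- fit(spec, am ~ wt + hp, data = dat)

# Class probabilities for the first three rows, one column per class.
glm_probs <- predict(glm_fit, new_data = dat[1:3, ], type = "prob")
glm_probs
```

The same specification could be handed to another engine by changing only the set_engine() call.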

Usage

logistic_reg(mode = "classification", penalty = NULL, mixture = NULL)

## S3 method for class 'logistic_reg'
update(
  object,
  parameters = NULL,
  penalty = NULL,
  mixture = NULL,
  fresh = FALSE,
  ...
)

Arguments

mode

A single character string for the type of model. The only possible value for this model is "classification".

penalty

A non-negative number representing the total amount of regularization (glmnet, keras, and spark only). For keras models, this corresponds to purely L2 regularization (aka weight decay), while the other engines can use a combination of L1 and L2 (depending on the value of mixture).

mixture

A number between zero and one (inclusive) that represents the proportion of regularization that is used for the L1 penalty (the lasso) versus L2 (i.e. weight decay, or ridge regression) (glmnet and spark only). A value of mixture = 1 is a pure lasso model, while mixture = 0 corresponds to ridge regression.

object

A logistic regression model specification.

parameters

A 1-row tibble or named list with main parameters to update. If the individual arguments are used, these will supersede the values in parameters. Also, using engine arguments in this object will result in an error.

fresh

A logical for whether the arguments should be modified in place or replaced wholesale.

...

Not used for update().
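The parameters and fresh arguments can be sketched as follows. The values used are arbitrary, and the example sticks to a named list rather than a 1-row tibble to avoid an extra dependency.

```r
library(parsnip)

spec <- logistic_reg(penalty = 10, mixture = 0.1)

# Pass the new main-parameter values as a named list.
# With fresh = FALSE (the default), `mixture` keeps its earlier value.
updated <- update(spec, parameters = list(penalty = 1))
updated

# With fresh = TRUE, the arguments are replaced wholesale, so
# `mixture` is no longer carried over from `spec`.
reset <- update(spec, parameters = list(penalty = 1), fresh = TRUE)
reset
```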

Details

For logistic_reg(), the mode will always be "classification".

The model can be created using the fit() function with the following engines:

R: "glm" or "glmnet"

Stan: "stan"

Spark: "spark"

keras: "keras"

Engine Details

Engines may have pre-set default arguments when executing the model fit call. For this type of model, the templates of the fit calls are shown below.

glm

logistic_reg() %>% 
  set_engine("glm") %>% 
  set_mode("classification") %>% 
  translate()
## Logistic Regression Model Specification (classification)
## 
## Computational engine: glm 
## 
## Model fit template:
## stats::glm(formula = missing_arg(), data = missing_arg(), weights = missing_arg(), 
##     family = stats::binomial)

glmnet

logistic_reg() %>% 
  set_engine("glmnet") %>% 
  set_mode("classification") %>% 
  translate()
## Logistic Regression Model Specification (classification)
## 
## Computational engine: glmnet 
## 
## Model fit template:
## glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
##     family = "binomial")

For glmnet models, the full regularization path is always fit regardless of the value given to penalty. Multiple values (or no values) can also be passed to the penalty argument. predict() can only use a single value of the penalty, so its return value depends on which value of penalty is supplied. To predict at multiple penalties, use multi_predict(). It returns a tibble with a list column called .pred; each element of that column is a tibble containing the results for all of the penalty values.
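The difference between predict() and multi_predict() can be sketched as below, assuming the glmnet package is installed. The mtcars data, the predictors, and the penalty values are illustrative choices only.

```r
library(parsnip)

# Illustrative two-class outcome: recode mtcars$am (0/1) as a factor.
dat <- mtcars
dat$am <- factor(dat$am, labels = c("automatic", "manual"))

glmnet_fit <- logistic_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet") %>%
  fit(am ~ wt + hp + qsec, data = dat)

# predict() uses a single penalty: the one stored in the specification.
predict(glmnet_fit, new_data = dat[1:3, ])

# multi_predict() evaluates several penalties on the same fitted path.
multi <- multi_predict(
  glmnet_fit,
  new_data = dat[1:3, ],
  penalty = c(0.001, 0.01, 0.1)
)

# One row per new observation; each .pred element is itself a tibble
# with one row per requested penalty.
multi$.pred[[1]]
```

Because the whole path is fit once, asking for more penalty values does not require refitting the model.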

stan

logistic_reg() %>% 
  set_engine("stan") %>% 
  set_mode("classification") %>% 
  translate()
## Logistic Regression Model Specification (classification)
## 
## Computational engine: stan 
## 
## Model fit template:
## rstanarm::stan_glm(formula = missing_arg(), data = missing_arg(), 
##     weights = missing_arg(), family = stats::binomial, refresh = 0)

Note that the refresh default prevents logging of the estimation process. Changing this value in set_engine() will show the logs.
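A short sketch of overriding that default is shown here. The value 100 is an arbitrary choice, and no model is fit, so the example only shows how the option is carried into the fit template.

```r
library(parsnip)

# Engine-specific options go through set_engine(); here `refresh`
# replaces the default of 0 so that sampler progress would be
# reported every 100 iterations when the model is eventually fit.
stan_spec <- logistic_reg() %>%
  set_engine("stan", refresh = 100)

# Inspect the fit template to see the value that will be passed on.
translate(stan_spec)
```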

For prediction, the stan engine can compute posterior intervals analogous to confidence and prediction intervals. In these instances, the units are the original outcome and when std_error = TRUE, the standard deviation of the posterior distribution (or posterior predictive distribution as appropriate) is returned.

spark

logistic_reg() %>% 
  set_engine("spark") %>% 
  set_mode("classification") %>% 
  translate()
## Logistic Regression Model Specification (classification)
## 
## Computational engine: spark 
## 
## Model fit template:
## sparklyr::ml_logistic_regression(x = missing_arg(), formula = missing_arg(), 
##     weight_col = missing_arg(), family = "binomial")

keras

logistic_reg() %>% 
  set_engine("keras") %>% 
  set_mode("classification") %>% 
  translate()
## Logistic Regression Model Specification (classification)
## 
## Computational engine: keras 
## 
## Model fit template:
## parsnip::keras_mlp(x = missing_arg(), y = missing_arg(), hidden_units = 1, 
##     act = "linear")

Parameter translations

The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters:

parsnip   glmnet   spark               keras
penalty   lambda   reg_param           penalty
mixture   alpha    elastic_net_param   NA
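For the glmnet engine, the mapping in the table can be read against the elastic net objective that glmnet uses for the binomial family. The notation below (N observations, log-likelihood contribution \ell, intercept \beta_0, coefficients \beta) is introduced here for illustration, and other engines may scale the terms differently.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N} \ell\!\left(y_i,\ \beta_0 + x_i^{\top}\beta\right)
  \;+\; \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  \;+\; \alpha\,\lVert\beta\rVert_1\right]
```

With penalty = \lambda and mixture = \alpha, setting mixture = 1 leaves only the L1 (lasso) term and mixture = 0 leaves only the L2 (ridge) term.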

Note

For models created using the spark engine, there are several differences to consider. First, only the formula interface via fit() is available; using fit_xy() will generate an error. Second, the predictions will always be in a spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in spark tables, so class predictions are returned as character columns. Fourth, to retain the model object for a new R session (via save()), the fit element of the parsnip object should be serialized via ml_save(object$fit) and separately saved to disk. In a new session, the object can be reloaded and reattached to the parsnip object.

See Also

fit()

Examples

logistic_reg()
# Parameters can be represented by a placeholder:
logistic_reg(penalty = varying())
model <- logistic_reg(penalty = 10, mixture = 0.1)
model
update(model, penalty = 1)
update(model, penalty = 1, fresh = TRUE)

parsnip documentation built on July 1, 2020, 10:33 p.m.