knitr::opts_chunk$set(
  eval = identical(Sys.getenv("NOT_CRAN"), "true"),
  fig.width = 7,
  fig.height = 5,
  warning = FALSE,
  message = FALSE
)
# Sys.setenv("_R_USE_PIPEBIND_" = TRUE)
This package aims to be usable for any machine learning task; even time series and image classification can be supported. You can also fit both linear regression and logistic regression, with a few extra steps: a customized optimizer and loss function. The train_nn() function (available on >v0.3.x) supports this through its optimizer (together with optimizer_args) and loss arguments. In both cases, the key is to remove all hidden layers and rely entirely on the output layer and the appropriate loss function to recover the classical model's behavior.
box::use(
  kindling[train_nn, act_funs, args],
  recipes[
    recipe, step_dummy, step_normalize,
    all_nominal_predictors, all_numeric_predictors
  ],
  rsample[initial_split, training, testing],
  yardstick[metric_set, rmse, rsq, accuracy, mn_log_loss],
  dplyr[mutate, select],
  tibble[tibble]
)
A standard linear regression model predicts a continuous outcome as a weighted sum of inputs, with no nonlinearity and no hidden layers. A neural network recovers this exactly when:

- there are no hidden layers (hidden_neurons = integer(0), or simply omit it), and
- the loss is mean squared error (loss = "mse").

Under these conditions, gradient descent minimizes the same objective as ordinary least squares, and the learned weights converge to the OLS solution given sufficient epochs and a small learning rate.
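To see why the two objectives coincide, the following writes them out. The notation ($x_i$ for the predictor vector of observation $i$, $w$ for the weights, $b$ for the bias, $n$ for the number of observations) is introduced here for illustration and does not come from the package.

$$
\hat{y}_i = w^\top x_i + b,
\qquad
\mathcal{L}_{\mathrm{MSE}}(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - w^\top x_i - b \right)^2 = \frac{1}{n}\,\mathrm{RSS}(w, b)
$$

Because $1/n$ is a positive constant, the $(w, b)$ that minimizes the MSE loss is the same one that minimizes the residual sum of squares, which is the OLS estimate.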
We use mtcars to predict fuel efficiency (mpg) from the other variables.
set.seed(42)

split = initial_split(mtcars, prop = 0.8)
train = training(split)
test = testing(split)

rec = recipe(mpg ~ ., data = train) |>
  step_normalize(all_numeric_predictors())
To build a network with no hidden units, set the hidden_neurons argument of train_nn() to either of the following:

- NULL
- c()

In this example, the empty vector c() is used, which collapses the network to a single linear layer from inputs to output. Setting optimizer = "rmsprop" with a small learn_rate plays the role of gradient descent on the OLS objective.
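As a quick base R check of why these spellings are interchangeable, the sketch below shows that c() is literally NULL and that NULL, c(), and integer(0) all have length zero. It says nothing about kindling internals; how train_nn() handles each value is as described in the text above.

```r
# Three ways of writing "no hidden layers" in base R
empty_specs <- list(null = NULL, c_call = c(), int0 = integer(0))

# Each one describes zero layers
n_layers <- vapply(empty_specs, length, integer(1))

# c() with no arguments returns NULL itself
c_is_null <- identical(c(), NULL)

# integer(0) is a typed empty vector, so it is not identical to NULL
int0_is_null <- identical(integer(0), NULL)
```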
lm_nn = train_nn(
  mpg ~ .,
  data = train,
  hidden_neurons = c(),   # no hidden layers: a single linear layer
  loss = "mse",           # mean squared error, the objective OLS minimizes
  optimizer = "rmsprop",
  learn_rate = 0.01,
  epochs = 200,
  verbose = FALSE
)

lm_nn
preds = predict(lm_nn, newdata = test)

tibble(
  truth = test$mpg,
  estimate = preds
) |>
  metric_set(rmse, rsq)(truth = truth, estimate = estimate)
Comparison with lm()

lm_fit = lm(mpg ~ ., data = train)

tibble(
  truth = test$mpg,
  estimate = predict(lm_fit, newdata = test)
) |>
  metric_set(rmse, rsq)(truth = truth, estimate = estimate)
The two models should produce very similar RMSE and $R^2$ values. Any small gap reflects that gradient descent is an iterative approximation, while lm() solves for the exact OLS coefficients directly. Increasing epochs or switching to optimizer = "lbfgs" (if supported) will close the gap further.
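The convergence claim can be checked without any neural network library. The sketch below is a minimal base R illustration, not the package's training loop: it runs plain full-batch gradient descent on the MSE loss for a two-predictor model (mpg on standardized wt and hp, chosen here only to keep the problem well conditioned) and compares the result with lm(). The learning rate and epoch count are illustrative choices.

```r
# Design matrix: intercept column plus standardized predictors
X <- cbind(1, scale(as.matrix(mtcars[, c("wt", "hp")])))
y <- mtcars$mpg

# Full-batch gradient descent on mean((X w - y)^2), starting from zero weights
w <- rep(0, ncol(X))
learn_rate <- 0.1
for (epoch in seq_len(2000)) {
  residual <- drop(X %*% w) - y
  grad <- 2 * drop(crossprod(X, residual)) / nrow(X)
  w <- w - learn_rate * grad
}

# Exact OLS coefficients on the same design matrix
ols <- unname(coef(lm(y ~ 0 + X)))

# Largest absolute difference between the two solutions
max_abs_diff <- max(abs(w - ols))
```

With standardized predictors the loss surface is well conditioned, so the iterates settle on the lm() coefficients. On raw, unscaled predictors the same loop would need a much smaller learning rate and many more epochs.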
Logistic regression models a binary or multiclass outcome by passing a linear combination of inputs through a sigmoid or softmax activation. A neural network with:

- no hidden layers, and
- cross-entropy (loss = "cross_entropy") for the loss function

is mathematically equivalent to logistic regression.
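For the binary case, the equivalence can be written out as follows. The notation ($x_i$, $w$, $b$, $n$, with $y_i \in \{0, 1\}$ and $\sigma$ the sigmoid function) is introduced here for illustration and does not come from the package.

$$
p_i = \sigma\!\left( w^\top x_i + b \right) = \frac{1}{1 + e^{-(w^\top x_i + b)}},
\qquad
\mathcal{L}_{\mathrm{CE}}(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big]
$$

The sum inside is the Bernoulli log-likelihood of the data, so minimizing the cross-entropy loss is the same problem as the maximum likelihood fit that logistic regression performs.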
We use the Sonar dataset from {mlbench} to distinguish rocks from mines (binary outcome).
data("Sonar", package = "mlbench")
sonar = Sonar

set.seed(42)

split_s = initial_split(sonar, prop = 0.8, strata = Class)
train_s = training(split_s)
test_s = testing(split_s)

rec_s = recipe(Class ~ ., data = train_s) |>
  step_normalize(all_numeric_predictors())
logit_nn = train_nn(
  Class ~ .,
  data = train_s,
  hidden_neurons = c(),
  loss = "cross_entropy",
  optimizer = "adam",
  learn_rate = 0.01,
  epochs = 200,
  verbose = FALSE
)

logit_nn
preds_s = predict(logit_nn, newdata = test_s, type = "response")

tibble(
  truth = test_s$Class,
  estimate = preds_s
) |>
  accuracy(truth = truth, estimate = estimate)
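Comparison with glm() / nnet::multinom()

box::use(nnet[multinom])

glm_fit = glm(Class ~ ., data = train_s, family = binomial())

# glm() models the probability of the second factor level ("R")
probs = predict(glm_fit, newdata = test_s, type = "response")

tibble(
  truth = test_s$Class,
  # Reuse the levels of the truth column so accuracy() sees matching factors
  estimate = factor(ifelse(probs < 0.5, "M", "R"), levels = levels(test_s$Class))
) |>
  accuracy(truth = truth, estimate = estimate)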
Again, accuracy should be comparable between the two approaches. The neural network version converges iteratively, so the match is not guaranteed to be exact, but both are optimizing the same cross-entropy objective over a linear model.
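The shared objective can also be illustrated in base R alone. The sketch below is a minimal illustration on simulated data (the sample size, coefficients, learning rate, and epoch count are all made up for the example, and it is not the package's training loop): full-batch gradient descent on the mean binary cross-entropy lands on the same coefficients that glm() finds.

```r
set.seed(42)

# Simulated binary outcome from a known linear model (illustrative values)
n <- 500
X <- cbind(1, matrix(rnorm(n * 2), ncol = 2))   # intercept + two predictors
true_w <- c(-0.5, 1, -1)
y <- rbinom(n, size = 1, prob = plogis(drop(X %*% true_w)))

# Full-batch gradient descent on the mean binary cross-entropy
w <- rep(0, ncol(X))
learn_rate <- 0.5
for (epoch in seq_len(5000)) {
  p <- plogis(drop(X %*% w))               # sigmoid of the linear predictor
  grad <- drop(crossprod(X, p - y)) / n    # gradient of the cross-entropy loss
  w <- w - learn_rate * grad
}

# Maximum likelihood fit on the same design matrix
mle <- unname(coef(glm(y ~ 0 + X, family = binomial())))

# Largest absolute difference between the two solutions
max_abs_diff <- max(abs(w - mle))
```

The two coefficient vectors agree up to numerical tolerance. A train_nn() fit with 200 epochs of Adam may stop short of this point, which is one reason the accuracies can differ slightly.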