Getting Started with tidylearn

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)

Introduction

tidylearn provides a unified, tidyverse-compatible interface to R's machine learning ecosystem. It wraps proven packages such as glmnet, randomForest, xgboost, e1071, cluster, and dbscan, so you get the reliability of established implementations with the convenience of a consistent, tidy API.

What tidylearn does:

  * Provides a single entry point, tl_model(), for both supervised and unsupervised methods
  * Wraps established packages (glmnet, randomForest, xgboost, and others) behind a consistent, tidy API
  * Adds helpers for preprocessing (tl_prepare_data()) and train-test splitting (tl_split())

What tidylearn is NOT:

  * A reimplementation of machine learning algorithms; the underlying code is unchanged
  * A replacement for the wrapped packages, whose raw fitted objects remain accessible via $fit

Installation

# Install from CRAN
install.packages("tidylearn")

# Or install development version from GitHub
# devtools::install_github("ces0491/tidylearn")
library(tidylearn)
library(dplyr)

The Unified Interface

The core of tidylearn is the tl_model() function, which dispatches to the appropriate underlying package based on the method you specify. The wrapped packages include stats, glmnet, randomForest, xgboost, gbm, e1071, nnet, rpart, cluster, and dbscan.
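tidylearn's real dispatch code is internal to the package, but conceptually it is a lookup from the method string to a wrapped function. The sketch below is illustrative only, using a handful of the mappings listed in the tables later in this vignette:

```r
# Illustrative sketch only; not tidylearn's actual implementation.
# tl_model() conceptually maps its `method` argument to a wrapped function:
wrapped_function <- function(method) {
  switch(method,
    linear   = "stats::lm",
    logistic = "stats::glm",
    forest   = "randomForest::randomForest",
    pca      = "stats::prcomp",
    kmeans   = "stats::kmeans",
    stop("Unknown method: ", method)
  )
}

wrapped_function("forest")  # "randomForest::randomForest"
```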

Supervised Learning

Classification

# Classification with logistic regression.
# glm(..., family = binomial) expects a binary outcome, so restrict
# iris to two species first
iris_binary <- droplevels(subset(iris, Species != "setosa"))
model_logistic <- tl_model(iris_binary, Species ~ ., method = "logistic")
print(model_logistic)

# Make predictions on the training data
predictions <- predict(model_logistic)
head(predictions)

Regression

# Regression with linear model
model_linear <- tl_model(mtcars, mpg ~ wt + hp, method = "linear")
print(model_linear)
# Predictions
predictions_reg <- predict(model_linear)
head(predictions_reg)

Unsupervised Learning

Dimensionality Reduction

# Principal Component Analysis
model_pca <- tl_model(iris[, 1:4], method = "pca")
print(model_pca)
# Transform data
transformed <- predict(model_pca)
head(transformed)

Clustering

# K-means clustering
model_kmeans <- tl_model(iris[, 1:4], method = "kmeans", k = 3)
print(model_kmeans)
# Get cluster assignments (the raw kmeans object stores them in $cluster)
clusters <- model_kmeans$fit$cluster
head(clusters)

# Compare with actual species
table(clusters, iris$Species)

Data Preprocessing

tidylearn provides comprehensive preprocessing functions:

# Prepare data with multiple preprocessing steps
processed <- tl_prepare_data(
  iris,
  Species ~ .,
  impute_method = "mean",
  scale_method = "standardize",
  encode_categorical = FALSE
)
# Check preprocessing steps applied
names(processed$preprocessing_steps)
# Use processed data for modeling
model_processed <- tl_model(processed$data, Species ~ ., method = "forest")

Train-Test Splitting

# Simple random split
split <- tl_split(iris, prop = 0.7, seed = 123)

# Train model (a random forest handles the three-class outcome directly;
# logistic regression would require a binary response)
model_train <- tl_model(split$train, Species ~ ., method = "forest")

# Test predictions
predictions_test <- predict(model_train, new_data = split$test)
head(predictions_test)
# Stratified split (maintains class proportions)
split_strat <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 123)

# Check proportions are maintained
prop.table(table(split_strat$train$Species))
prop.table(table(split_strat$test$Species))
prop.table(table(iris$Species))

Wrapped Packages

tidylearn provides a unified interface to these established R packages:

Supervised Methods

| Method | Underlying Package | Function Called |
|--------|-------------------|-----------------|
| "linear" | stats | lm() |
| "polynomial" | stats | lm() with poly() |
| "logistic" | stats | glm(..., family = binomial) |
| "ridge", "lasso", "elastic_net" | glmnet | glmnet() |
| "tree" | rpart | rpart() |
| "forest" | randomForest | randomForest() |
| "boost" | gbm | gbm() |
| "xgboost" | xgboost | xgb.train() |
| "svm" | e1071 | svm() |
| "nn" | nnet | nnet() |
| "deep" | keras | keras_model_sequential() |
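Because "linear" maps to stats::lm(), the earlier mtcars example can be reproduced with the underlying function directly. Assuming tl_model() forwards the formula and data unmodified, these coefficients should match what tl_model(mtcars, mpg ~ wt + hp, method = "linear") stores in $fit:

```r
# Fit the model that method = "linear" delegates to, using base R only
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficients for the intercept, wt, and hp
coef(fit_raw)

# Predictions on the training data
head(predict(fit_raw))
```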

Unsupervised Methods

| Method | Underlying Package | Function Called |
|--------|-------------------|-----------------|
| "pca" | stats | prcomp() |
| "mds" | stats, MASS, smacof | cmdscale(), isoMDS(), etc. |
| "kmeans" | stats | kmeans() |
| "pam" | cluster | pam() |
| "clara" | cluster | clara() |
| "hclust" | stats | hclust() |
| "dbscan" | dbscan | dbscan() |
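Likewise, "pca" and "kmeans" delegate to base stats functions, so their raw results can be reproduced without tidylearn. Assuming tl_model() forwards the data unchanged, comparing these against the $fit slot of the corresponding tl_model() call is a quick sanity check:

```r
# The base functions behind method = "pca" and method = "kmeans"
pca_raw <- prcomp(iris[, 1:4], scale. = TRUE)
set.seed(123)  # kmeans uses random starts, so fix the seed
km_raw <- kmeans(iris[, 1:4], centers = 3)

# Four principal components from four input columns
ncol(pca_raw$x)

# One cluster assignment per row of iris
length(km_raw$cluster)
```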

Accessing the Underlying Model

You always have access to the raw model from the underlying package via $fit:

# Example: Access the raw randomForest object
model_forest <- tl_model(iris, Species ~ ., method = "forest")
class(model_forest$fit)  # This is the randomForest object

# Use package-specific functions if needed
# randomForest::varImpPlot(model_forest$fit)

Next Steps

Now that you understand the basics, explore:

  1. Supervised Learning - Dive deeper into classification and regression
  2. Unsupervised Learning - Explore clustering and dimensionality reduction
  3. Integration Workflows - Combine supervised and unsupervised learning
  4. AutoML - Automated machine learning with tl_auto_ml()

Summary

tidylearn is a wrapper package that provides:

  * A single tl_model() interface to supervised and unsupervised methods
  * Preprocessing (tl_prepare_data()) and train-test splitting (tl_split()) helpers
  * Direct access to each raw fitted object via $fit

The underlying algorithms are unchanged; tidylearn simply makes them easier to use together.

# Quick example combining everything
data_split <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 42)
data_prep <- tl_prepare_data(data_split$train, Species ~ ., scale_method = "standardize")
model_final <- tl_model(data_prep$data, Species ~ ., method = "forest")
test_preds <- predict(model_final, new_data = data_split$test)

print(model_final)



tidylearn documentation built on Feb. 6, 2026, 5:07 p.m.