Getting Started with tidylearn

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)

Introduction

tidylearn provides a unified, tidyverse-compatible interface to R's machine learning ecosystem. It wraps proven packages such as glmnet, randomForest, xgboost, e1071, cluster, and dbscan, so you get the reliability of established implementations with the convenience of a consistent, tidy API.

What tidylearn does:

  * Provides a single entry point, tl_model(), for both supervised and unsupervised methods
  * Wraps established packages (glmnet, randomForest, xgboost, and others) behind a consistent, tidy API
  * Adds helpers for preprocessing (tl_prepare_data()) and train-test splitting (tl_split())

What tidylearn is NOT:

  * A reimplementation of machine learning algorithms; the underlying code is unchanged
  * A replacement for the wrapped packages, whose raw fitted objects remain accessible via $fit

Installation

# Install from CRAN
install.packages("tidylearn")

# Or install development version from GitHub
# devtools::install_github("ces0491/tidylearn")
library(tidylearn)
library(dplyr)

The Unified Interface

The core of tidylearn is the tl_model() function, which dispatches to the appropriate underlying package based on the method you specify. The wrapped packages include stats, glmnet, randomForest, xgboost, gbm, e1071, nnet, rpart, cluster, and dbscan.
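tidylearn's real dispatch code is internal to the package, but conceptually it is a lookup from the method string to a wrapped function. The sketch below is illustrative only, using a handful of the mappings listed in the tables later in this vignette:

```r
# Illustrative sketch only; not tidylearn's actual implementation.
# tl_model() conceptually maps its `method` argument to a wrapped function:
wrapped_function <- function(method) {
  switch(method,
    linear   = "stats::lm",
    logistic = "stats::glm",
    forest   = "randomForest::randomForest",
    pca      = "stats::prcomp",
    kmeans   = "stats::kmeans",
    stop("Unknown method: ", method)
  )
}

wrapped_function("forest")  # "randomForest::randomForest"
```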

Supervised Learning

Classification

# Classification with logistic regression.
# glm(..., family = binomial) expects a binary outcome, so restrict
# iris to two species first
iris_binary <- droplevels(subset(iris, Species != "setosa"))
model_logistic <- tl_model(iris_binary, Species ~ ., method = "logistic")
print(model_logistic)

# Make predictions on the training data
predictions <- predict(model_logistic)
head(predictions)

Regression

# Regression with linear model
model_linear <- tl_model(mtcars, mpg ~ wt + hp, method = "linear")
print(model_linear)
# Predictions
predictions_reg <- predict(model_linear)
head(predictions_reg)

Unsupervised Learning

Dimensionality Reduction

# Principal Component Analysis
model_pca <- tl_model(iris[, 1:4], method = "pca")
print(model_pca)
# Transform data
transformed <- predict(model_pca)
head(transformed)

Clustering

# K-means clustering
model_kmeans <- tl_model(iris[, 1:4], method = "kmeans", k = 3)
print(model_kmeans)
# Get cluster assignments (the raw kmeans object stores them in $cluster)
clusters <- model_kmeans$fit$cluster
head(clusters)

# Compare with actual species
table(clusters, iris$Species)

Data Preprocessing

tidylearn provides comprehensive preprocessing functions:

# Prepare data with multiple preprocessing steps
processed <- tl_prepare_data(
  iris,
  Species ~ .,
  impute_method = "mean",
  scale_method = "standardize",
  encode_categorical = FALSE
)
# Check preprocessing steps applied
names(processed$preprocessing_steps)
# Use processed data for modeling
model_processed <- tl_model(processed$data, Species ~ ., method = "forest")

Train-Test Splitting

# Simple random split
split <- tl_split(iris, prop = 0.7, seed = 123)

# Train model (a random forest handles the three-class outcome directly;
# logistic regression would require a binary response)
model_train <- tl_model(split$train, Species ~ ., method = "forest")

# Test predictions
predictions_test <- predict(model_train, new_data = split$test)
head(predictions_test)
# Stratified split (maintains class proportions)
split_strat <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 123)

# Check proportions are maintained
prop.table(table(split_strat$train$Species))
prop.table(table(split_strat$test$Species))
prop.table(table(iris$Species))

Wrapped Packages

tidylearn provides a unified interface to these established R packages:

Supervised Methods

| Method | Underlying Package | Function Called |
|--------|-------------------|-----------------|
| "linear" | stats | lm() |
| "polynomial" | stats | lm() with poly() |
| "logistic" | stats | glm(..., family = binomial) |
| "ridge", "lasso", "elastic_net" | glmnet | glmnet() |
| "tree" | rpart | rpart() |
| "forest" | randomForest | randomForest() |
| "boost" | gbm | gbm() |
| "xgboost" | xgboost | xgb.train() |
| "svm" | e1071 | svm() |
| "nn" | nnet | nnet() |
| "deep" | keras | keras_model_sequential() |
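Because "linear" maps to stats::lm(), the earlier mtcars example can be reproduced with the underlying function directly. Assuming tl_model() forwards the formula and data unmodified, these coefficients should match what tl_model(mtcars, mpg ~ wt + hp, method = "linear") stores in $fit:

```r
# Fit the model that method = "linear" delegates to, using base R only
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficients for the intercept, wt, and hp
coef(fit_raw)

# Predictions on the training data
head(predict(fit_raw))
```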

Unsupervised Methods

| Method | Underlying Package | Function Called |
|--------|-------------------|-----------------|
| "pca" | stats | prcomp() |
| "mds" | stats, MASS, smacof | cmdscale(), isoMDS(), etc. |
| "kmeans" | stats | kmeans() |
| "pam" | cluster | pam() |
| "clara" | cluster | clara() |
| "hclust" | stats | hclust() |
| "dbscan" | dbscan | dbscan() |
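Likewise, "pca" and "kmeans" delegate to base stats functions, so their raw results can be reproduced without tidylearn. Assuming tl_model() forwards the data unchanged, comparing these against the $fit slot of the corresponding tl_model() call is a quick sanity check:

```r
# The base functions behind method = "pca" and method = "kmeans"
pca_raw <- prcomp(iris[, 1:4], scale. = TRUE)
set.seed(123)  # kmeans uses random starts, so fix the seed
km_raw <- kmeans(iris[, 1:4], centers = 3)

# Four principal components from four input columns
ncol(pca_raw$x)

# One cluster assignment per row of iris
length(km_raw$cluster)
```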

Accessing the Underlying Model

You always have access to the raw model from the underlying package via $fit:

# Example: Access the raw randomForest object
model_forest <- tl_model(iris, Species ~ ., method = "forest")
class(model_forest$fit)  # This is the randomForest object

# Use package-specific functions if needed
# randomForest::varImpPlot(model_forest$fit)

Next Steps

Now that you understand the basics, explore:

  1. Supervised Learning - Dive deeper into classification and regression
  2. Unsupervised Learning - Explore clustering and dimensionality reduction
  3. Integration Workflows - Combine supervised and unsupervised learning
  4. AutoML - Automated machine learning with tl_auto_ml()

Summary

tidylearn is a wrapper package that provides:

  * A single tl_model() interface to supervised and unsupervised methods
  * Preprocessing (tl_prepare_data()) and train-test splitting (tl_split()) helpers
  * Direct access to each raw fitted object via $fit

The underlying algorithms are unchanged; tidylearn simply makes them easier to use together.

# Quick example combining everything
data_split <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 42)
data_prep <- tl_prepare_data(data_split$train, Species ~ ., scale_method = "standardize")
model_final <- tl_model(data_prep$data, Species ~ ., method = "forest")
test_preds <- predict(model_final, new_data = data_split$test)

print(model_final)



tidylearn documentation built on Feb. 6, 2026, 5:07 p.m.