
Machine Learning for Tidynauts
tidylearn provides a unified tidyverse-compatible interface to R's machine
learning ecosystem. It wraps proven packages like glmnet, randomForest,
xgboost, e1071, cluster, and dbscan - you get the reliability of established
implementations with the convenience of a consistent, tidy API.
What tidylearn does:

- Provides a single function, tl_model(), as the entry point to 20+ ML algorithms
- Returns tidy, pipe-friendly (%>%) results

What tidylearn is NOT:

- A reimplementation of the algorithms; the raw fitted object is always available via model$fit

Each ML package in R has its own API, output format, and conventions. tidylearn provides a translation layer so you can:
| Without tidylearn | With tidylearn |
| ------------------------------------- | ----------------------- |
| Learn different APIs for each package | One API for everything |
| Write custom code to extract results | Consistent tibble output |
| Create different plots for each model | Unified visualization |
| Manage package-specific quirks | Focus on your analysis |
The underlying algorithms are unchanged - tidylearn simply makes them easier to use together.
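To see the inconsistency the table describes, compare what two base-R models return without any wrapper (base R only; no tidylearn functions are used here):

```r
# Two models, two unrelated return structures:
lin <- lm(mpg ~ wt + hp, data = mtcars)  # class "lm": coefficients, residuals, ...
km  <- kmeans(iris[, 1:4], centers = 3)  # class "kmeans": cluster, centers, ...

coef(lin)   # named numeric vector of coefficients
km$cluster  # integer vector of cluster assignments, one per row
```

Extracting comparable outputs (predictions, metrics, diagnostics) from each requires model-specific code; tidylearn's tibble convention removes that step.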
# Install from CRAN
install.packages("tidylearn")
# Or install development version from GitHub
# devtools::install_github("ces0491/tidylearn")
A single tl_model() function dispatches to the appropriate underlying package:
library(tidylearn)
# Classification -> uses randomForest::randomForest()
model <- tl_model(iris, Species ~ ., method = "forest")
# Regression -> uses stats::lm()
model <- tl_model(mtcars, mpg ~ wt + hp, method = "linear")
# Regularization -> uses glmnet::glmnet()
model <- tl_model(mtcars, mpg ~ ., method = "lasso")
# Clustering -> uses stats::kmeans()
model <- tl_model(iris[,1:4], method = "kmeans", k = 3)
# PCA -> uses stats::prcomp()
model <- tl_model(iris[,1:4], method = "pca")
All results come back as tibbles, ready for dplyr and ggplot2:
# Predictions as tibbles
predictions <- predict(model, new_data = test_data)
# Metrics as tibbles
metrics <- tl_evaluate(model, test_data)
# Easy to pipe
model %>%
  predict(test_data) %>%
  bind_cols(test_data) %>%
  ggplot(aes(x = actual, y = prediction)) +
  geom_point()
You always have access to the raw model from the underlying package:
model <- tl_model(iris, Species ~ ., method = "forest")
# Access the randomForest object directly
model$fit # This is the randomForest::randomForest() result
# Use package-specific functions if needed
randomForest::varImpPlot(model$fit)
tidylearn provides a unified interface to these established R packages:
Supervised methods:

| Method | Underlying Package | Function Called |
| -------- | ------------------- | ----------------- |
| "linear" | stats | lm() |
| "polynomial" | stats | lm() with poly() |
| "logistic" | stats | glm(..., family = binomial) |
| "ridge", "lasso", "elastic_net" | glmnet | glmnet() |
| "tree" | rpart | rpart() |
| "forest" | randomForest | randomForest() |
| "boost" | gbm | gbm() |
| "xgboost" | xgboost | xgb.train() |
| "svm" | e1071 | svm() |
| "nn" | nnet | nnet() |
| "deep" | keras | keras_model_sequential() |
Unsupervised methods:

| Method | Underlying Package | Function Called |
| -------- | ------------------- | ----------------- |
| "pca" | stats | prcomp() |
| "mds" | stats, MASS, smacof | cmdscale(), isoMDS(), etc. |
| "kmeans" | stats | kmeans() |
| "pam" | cluster | pam() |
| "clara" | cluster | clara() |
| "hclust" | stats | hclust() |
| "dbscan" | dbscan | dbscan() |
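As a sanity check on the mapping above, the base-R rows can be called directly; each comment names the tl_model() method it corresponds to (base R only, so no extra packages are needed):

```r
# method = "linear"     -> stats::lm()
lin <- lm(mpg ~ wt + hp, data = mtcars)

# method = "polynomial" -> lm() with poly()
quad <- lm(mpg ~ poly(wt, 2), data = mtcars)

# method = "logistic"   -> glm(..., family = binomial)
logit <- glm(am ~ wt, data = mtcars, family = binomial)

# method = "pca"        -> stats::prcomp()
pc <- prcomp(iris[, 1:4], scale. = TRUE)

# method = "kmeans"     -> stats::kmeans()
km <- kmeans(iris[, 1:4], centers = 3)

# method = "hclust"     -> stats::hclust() on a distance matrix
hc <- hclust(dist(iris[, 1:4]))
```

Each call returns a differently shaped object, which is exactly the heterogeneity tidylearn normalizes into tibbles.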
Beyond wrapping individual packages, tidylearn provides orchestration functions that combine multiple techniques:
# Reduce dimensions before classification
reduced <- tl_reduce_dimensions(iris, response = "Species",
                                method = "pca", n_components = 3)
model <- tl_model(reduced$data, Species ~ ., method = "logistic")

# Add cluster membership as a feature
enriched <- tl_add_cluster_features(data, response = "target",
                                    method = "kmeans", k = 3)
model <- tl_model(enriched, target ~ ., method = "forest")

# Use clustering to propagate labels to unlabeled data
model <- tl_semisupervised(data, target ~ .,
                           labeled_indices = labeled_idx,
                           cluster_method = "kmeans")

# Automatically try multiple approaches
result <- tl_auto_ml(data, target ~ .,
                     time_budget = 300)
result$leaderboard
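For intuition, here are the "reduce then model" and "cluster as feature" patterns written out in plain base R; this is roughly what the orchestration functions automate (a sketch only, and the actual tidylearn internals may differ):

```r
# Reduce dimensions, then classify on the leading components
pc      <- prcomp(iris[, 1:4], scale. = TRUE)
reduced <- data.frame(pc$x[, 1:3], Species = iris$Species)
fit     <- glm(I(Species == "versicolor") ~ ., data = reduced,
               family = binomial)

# Add cluster membership as an extra feature before modeling
km       <- kmeans(iris[, 1:4], centers = 3)
enriched <- cbind(iris, cluster = factor(km$cluster))
```

The value of the tl_* wrappers is that they keep track of which columns are components, features, or responses for you, so the hand-off between steps stays tidy.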
Consistent ggplot2-based plotting regardless of model type:
# Generic plot method works for all model types
plot(forest_model) # Automatic visualization based on model type
plot(linear_model) # Diagnostic plots for regression
plot(pca_result) # Variance explained for PCA
# Specialized plotting functions for unsupervised learning
plot_clusters(clustering_result, cluster_col = "cluster")
plot_variance_explained(pca_result$fit$variance_explained)
# Interactive dashboard for detailed exploration
tl_dashboard(model, test_data)
tidylearn is built on these principles:
Transparency: The underlying packages do the real work. tidylearn makes them easier to use together without hiding what's happening.
Consistency: One interface, tidy output, unified visualization - across all methods.
Accessibility: Focus on your analysis, not on learning different package APIs.
Interoperability: Results work seamlessly with dplyr, ggplot2, and the broader tidyverse.
# View package help
?tidylearn
# Explore main functions
?tl_model
?tl_evaluate
?tl_auto_ml
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE for details.
Cesaire Tobias (cesaire@sheetsolved.com)
tidylearn is a wrapper that builds upon the excellent work of many R package authors. The actual algorithms are implemented in the packages listed in the tables above.
Thank you to all the package maintainers whose work makes tidylearn possible.