```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```
tidylearn provides a unified, tidyverse-compatible interface to R's machine
learning ecosystem. It wraps proven packages such as glmnet, randomForest,
xgboost, e1071, cluster, and dbscan, so you get the reliability of established
implementations with the convenience of a consistent, tidy API.
What tidylearn does:

- Provides a single modeling function (tl_model()) as the entry point to 20+ ML algorithms

What tidylearn is NOT:

- A reimplementation of those algorithms; the raw fitted object from the underlying package stays accessible (via model$fit)

```r
# Install from CRAN
install.packages("tidylearn")

# Or install the development version from GitHub
# devtools::install_github("ces0491/tidylearn")
```
```r
library(tidylearn)
library(dplyr)
```
The core of tidylearn is the tl_model() function, which dispatches to the
appropriate underlying package based on the method you specify. The wrapped
packages include stats, glmnet, randomForest, xgboost, gbm, e1071, nnet, rpart,
cluster, and dbscan.
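The dispatch idea can be sketched in a few lines of base R. This is an illustrative toy under our own assumptions, not tidylearn's actual internals, and fit_by_method is a hypothetical name:

```r
# Toy illustration of method-based dispatch (not tidylearn's real code)
fit_by_method <- function(data, formula, method) {
  switch(method,
    linear   = lm(formula, data = data),
    logistic = glm(formula, data = data, family = binomial),
    stop("Unknown method: ", method)
  )
}

fit <- fit_by_method(mtcars, am ~ wt + hp, method = "logistic")
class(fit)  # the raw stats object, analogous to model$fit
```

tl_model() works the same way in spirit: the method string selects the backend, and the backend's fitted object is kept intact.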
```r
# Classification with logistic regression
model_logistic <- tl_model(iris, Species ~ ., method = "logistic")
print(model_logistic)
```

```r
# Make predictions
predictions <- predict(model_logistic)
head(predictions)
```
```r
# Regression with linear model
model_linear <- tl_model(mtcars, mpg ~ wt + hp, method = "linear")
print(model_linear)
```

```r
# Predictions
predictions_reg <- predict(model_linear)
head(predictions_reg)
```
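Per the method table below, "linear" wraps stats::lm(), so the equivalent direct call looks like this. The in-sample RMSE computation is our own addition for illustration, not a tidylearn function:

```r
# Direct stats::lm() equivalent of the "linear" method
fit <- lm(mpg ~ wt + hp, data = mtcars)
preds <- predict(fit)

# In-sample root mean squared error
sqrt(mean((mtcars$mpg - preds)^2))
```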
```r
# Principal Component Analysis
model_pca <- tl_model(iris[, 1:4], method = "pca")
print(model_pca)
```

```r
# Transform data
transformed <- predict(model_pca)
head(transformed)
```
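"pca" wraps stats::prcomp() (see the method table below). Whether tidylearn centers and scales by default is not stated here, so this direct sketch sets those options explicitly:

```r
# Direct stats::prcomp() equivalent; centering/scaling chosen explicitly here
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

summary(pca)        # proportion of variance per component
head(predict(pca))  # transformed scores, like predict(model_pca)
```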
```r
# K-means clustering
model_kmeans <- tl_model(iris[, 1:4], method = "kmeans", k = 3)
print(model_kmeans)
```

```r
# Get cluster assignments
clusters <- model_kmeans$fit$clusters
head(clusters)
```

```r
# Compare with actual species
table(clusters$cluster, iris$Species)
```
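The same comparison can be run against stats::kmeans() directly, the function the "kmeans" method wraps per the table below. The seed and nstart here are our choices for reproducibility, not tidylearn defaults:

```r
# Direct stats::kmeans() equivalent
set.seed(123)  # k-means initialization is random
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# Cross-tabulate cluster assignments against the true species
table(km$cluster, iris$Species)
```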
tidylearn provides comprehensive preprocessing functions:
```r
# Prepare data with multiple preprocessing steps
processed <- tl_prepare_data(
  iris,
  Species ~ .,
  impute_method = "mean",
  scale_method = "standardize",
  encode_categorical = FALSE
)
```

```r
# Check preprocessing steps applied
names(processed$preprocessing_steps)
```

```r
# Use processed data for modeling
model_processed <- tl_model(processed$data, Species ~ ., method = "forest")
```
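The effect of "standardize" can be reproduced with base R's scale(): center each numeric column to mean 0 and rescale it to standard deviation 1. This is a sketch of the transformation, not tidylearn's implementation:

```r
# Standardization with base R, applied to the numeric columns of iris
num_cols <- sapply(iris, is.numeric)
iris_std <- iris
iris_std[num_cols] <- scale(iris[num_cols])

round(colMeans(iris_std[num_cols]), 10)  # all ~0 after centering
apply(iris_std[num_cols], 2, sd)         # all 1 after scaling
```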
```r
# Simple random split
split <- tl_split(iris, prop = 0.7, seed = 123)

# Train model
model_train <- tl_model(split$train, Species ~ ., method = "logistic")

# Test predictions
predictions_test <- predict(model_train, new_data = split$test)
head(predictions_test)
```
```r
# Stratified split (maintains class proportions)
split_strat <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 123)

# Check proportions are maintained
prop.table(table(split_strat$train$Species))
prop.table(table(split_strat$test$Species))
prop.table(table(iris$Species))
```
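Stratified splitting amounts to sampling rows within each class separately. A base-R sketch of the idea (not tl_split()'s actual implementation):

```r
# Stratified 70/30 split in base R: sample within each species
set.seed(123)
train_idx <- unlist(lapply(
  split(seq_len(nrow(iris)), iris$Species),
  function(rows) sample(rows, size = floor(0.7 * length(rows)))
))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

prop.table(table(train$Species))  # class proportions preserved
```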
tidylearn provides a unified interface to these established R packages:
Supervised methods:

| Method | Underlying Package | Function Called |
|--------|-------------------|-----------------|
| "linear" | stats | lm() |
| "polynomial" | stats | lm() with poly() |
| "logistic" | stats | glm(..., family = binomial) |
| "ridge", "lasso", "elastic_net" | glmnet | glmnet() |
| "tree" | rpart | rpart() |
| "forest" | randomForest | randomForest() |
| "boost" | gbm | gbm() |
| "xgboost" | xgboost | xgb.train() |
| "svm" | e1071 | svm() |
| "nn" | nnet | nnet() |
| "deep" | keras | keras_model_sequential() |
Unsupervised methods:

| Method | Underlying Package | Function Called |
|--------|-------------------|-----------------|
| "pca" | stats | prcomp() |
| "mds" | stats, MASS, smacof | cmdscale(), isoMDS(), etc. |
| "kmeans" | stats | kmeans() |
| "pam" | cluster | pam() |
| "clara" | cluster | clara() |
| "hclust" | stats | hclust() |
| "dbscan" | dbscan | dbscan() |
You always have access to the raw model from the underlying package via $fit:
```r
# Example: Access the raw randomForest object
model_forest <- tl_model(iris, Species ~ ., method = "forest")
class(model_forest$fit)  # This is the randomForest object

# Use package-specific functions if needed
# randomForest::varImpPlot(model_forest$fit)
```
Now that you understand the basics, explore:

- tl_auto_ml()

In summary, tidylearn is a wrapper package that provides:

- A single interface (tl_model()) that dispatches to proven
packages like glmnet, randomForest, xgboost, e1071, and others
- Direct access to the raw model object via model$fit for package-specific
functionality

The underlying algorithms are unchanged - tidylearn simply makes them easier to use together.
```r
# Quick example combining everything
data_split <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 42)
data_prep <- tl_prepare_data(data_split$train, Species ~ ., scale_method = "standardize")
model_final <- tl_model(data_prep$data, Species ~ ., method = "forest")
test_preds <- predict(model_final, new_data = data_split$test)
print(model_final)
```