library(knitr)
knitr::opts_chunk$set(echo = TRUE)
library(sl3)
library(delayed)
library(SuperLearner)
library(future)
library(ggplot2)
library(data.table)
library(stringr)
library(scales)

Introduction

This document consists of some simple benchmarks for various choices of Super Learner implementation, wrapper functions, and parallelization schemes. The purpose of this document is two-fold:

  1. Compare the computational performance of these methods
  2. Illustrate the use of these different methods

Test Setup

Test System

uname <- system("uname -a", intern = TRUE)
os <- sub(" .*", "", uname)
if (os == "Darwin") {
  cpu_model <- system("sysctl -n machdep.cpu.brand_string", intern = TRUE)
  cpus_physical <- as.numeric(system("sysctl -n hw.physicalcpu", intern = TRUE))
  cpus_logical <- as.numeric(system("sysctl -n hw.logicalcpu", intern = TRUE))
  cpu_clock <- system("sysctl -n hw.cpufrequency_max", intern = TRUE)
  memory <- system("sysctl -n hw.memsize", intern = TRUE)
} else if (os == "Linux") {
  cpu_model <- system("lscpu | grep 'Model name'", intern = TRUE)
  cpu_model <- gsub("Model name:[[:blank:]]*", "", cpu_model)
  cpus_logical <- system("lscpu | grep '^CPU(s)'", intern = TRUE)
  cpus_logical <- as.numeric(gsub("^.*:[[:blank:]]*","", cpus_logical))
  tpc <- system("lscpu | grep '^Thread(s) per core'", intern = TRUE)
  tpc <- as.numeric(gsub("^.*:[[:blank:]]*", "", tpc))
  cpus_physical <- cpus_logical / tpc
  cpu_clock <- as.numeric(gsub("GHz", "", gsub("^.*@", "", cpu_model)))*10^9
  memory <- system("cat /proc/meminfo | grep '^MemTotal'", intern = TRUE)
  memory <- as.numeric(gsub("kB", "", gsub("^.*:", "", memory)))*2^10
} else {
  stop("unsupported OS")
}

Test Data

n <- 1e4
data(cpp_imputed)
cpp_big <- cpp_imputed[sample(nrow(cpp_imputed), n, replace = TRUE), ]
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
            "sexn")
outcome <- "haz"

task <- sl3_Task$new(cpp_big, covariates = covars, outcome = outcome,
                     outcome_type = "continuous")

Test Descriptions

Legacy SuperLearner

The legacy SuperLearner package serves as a suitable baseline. We can fit it sequentially (no parallelization):

time_SuperLearner_sequential <- system.time({
  SuperLearner(task$Y, as.data.frame(task$X), newX = NULL, family = gaussian(), 
               SL.library = c("SL.glmnet", "SL.randomForest", "SL.speedglm"),
               method = "method.NNLS", id = NULL, verbose = FALSE,
               control = list(), cvControl = list(), obsWeights = NULL,
               env = parent.frame())
})

We can also fit it using multicore parallelization, using the mcSuperLearner function.

options(mc.cores = cpus_physical)
time_SuperLearner_multicore <- system.time({
  mcSuperLearner(task$Y, as.data.frame(task$X), newX = NULL,
                 family = gaussian(),
                 SL.library = c("SL.glmnet", "SL.randomForest", "SL.speedglm"),
                 method = "method.NNLS", id = NULL, verbose = FALSE,
                 control = list(), cvControl = list(), obsWeights = NULL,
                 env = parent.frame())
})

The SuperLearner package supports a number of other parallelization schemes, although these weren't tested here.
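One such scheme is snow-style cluster parallelization via the snowSuperLearner function. The sketch below shows only the cluster setup (the fit itself is left commented out, since running it would duplicate the multicore benchmark above); it assumes the snowSuperLearner interface from the SuperLearner package, which takes the cluster as its first argument:

```r
library(parallel)

# Create a snow-style cluster; snowSuperLearner() distributes the
# cross-validation folds across these workers.
cl <- makeCluster(2)

# fit <- snowSuperLearner(cluster = cl, Y = task$Y,
#                         X = as.data.frame(task$X), family = gaussian(),
#                         SL.library = c("SL.glmnet", "SL.randomForest",
#                                        "SL.speedglm"))

stopCluster(cl)
```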

sl3 with Legacy SuperLearner Wrappers

To maximize comparability with the legacy implementation, we can use sl3 with the SuperLearner wrappers, so that the actual computation used to train the learners is identical:

sl_glmnet <- Lrnr_pkg_SuperLearner$new("SL.glmnet")
sl_random_forest <- Lrnr_pkg_SuperLearner$new("SL.randomForest")
sl_speedglm <- Lrnr_pkg_SuperLearner$new("SL.speedglm")
nnls_lrnr <- Lrnr_nnls$new()

sl3_legacy <- Lrnr_sl$new(list(sl_random_forest, sl_glmnet, sl_speedglm),
                          nnls_lrnr)

sl3 with Native Learners

We can also use native sl3 learners, which have been rewritten to be performant on large sample sizes:

lrnr_glmnet <- Lrnr_glmnet$new()
random_forest <- Lrnr_randomForest$new()
glm_fast <- Lrnr_glm_fast$new()
nnls_lrnr <- Lrnr_nnls$new()

sl3_native <- Lrnr_sl$new(list(random_forest, lrnr_glmnet, glm_fast), nnls_lrnr)

sl3 Parallelization Options

sl3 uses the delayed package to parallelize training tasks. Delayed, in turn, uses the future package to support a range of parallel back-ends. We test several of these, for both the legacy wrappers and native learners.

First, sequential evaluation (no parallelization):

plan(sequential)
test <- delayed_learner_train(sl3_legacy, task)
time_sl3_legacy_sequential <- system.time({
  sched <- Scheduler$new(test, SequentialJob)
  cv_fit <- sched$compute()
})

test <- delayed_learner_train(sl3_native, task)
time_sl3_native_sequential <- system.time({
  sched <- Scheduler$new(test, SequentialJob)
  cv_fit <- sched$compute()
})

Next, multicore parallelization:

plan(multicore, workers = cpus_physical)
test <- delayed_learner_train(sl3_legacy, task)
time_sl3_legacy_multicore <- system.time({
  sched <- Scheduler$new(test, FutureJob, nworkers = cpus_physical,
                         verbose = FALSE)
  cv_fit <- sched$compute()
})

test <- delayed_learner_train(sl3_native, task)
time_sl3_native_multicore <- system.time({
  sched <- Scheduler$new(test, FutureJob, nworkers = cpus_physical,
                         verbose = FALSE)
  cv_fit <- sched$compute()
})

We also test multicore parallelization with hyper-threading -- we use a number of workers equal to the number of logical, not physical, cores:

plan(multicore, workers = cpus_logical)
test <- delayed_learner_train(sl3_legacy, task)
time_sl3_legacy_multicore_ht <- system.time({
  sched <- Scheduler$new(test, FutureJob, nworkers = cpus_logical,
                         verbose = FALSE)
  cv_fit <- sched$compute()
})

test <- delayed_learner_train(sl3_native, task)
time_sl3_native_multicore_ht <- system.time({
  sched <- Scheduler$new(test, FutureJob, nworkers = cpus_logical,
                         verbose = FALSE)
  cv_fit <- sched$compute()
})

Finally, we test parallelization using multisession:

plan(multisession, workers = cpus_physical)
test <- delayed_learner_train(sl3_legacy, task)
time_sl3_legacy_multisession <- system.time({
  sched <- Scheduler$new(test, FutureJob, nworkers = cpus_physical,
                         verbose = FALSE)
  cv_fit <- sched$compute()
})

test <- delayed_learner_train(sl3_native, task)
time_sl3_native_multisession <- system.time({
  sched <- Scheduler$new(test, FutureJob, nworkers = cpus_physical,
                         verbose = FALSE)
  cv_fit <- sched$compute()
})

Results

results <- rbind(time_sl3_legacy_sequential, time_sl3_legacy_multicore,
                 time_sl3_legacy_multicore_ht, time_sl3_legacy_multisession,
                 time_sl3_native_sequential, time_sl3_native_multicore,
                 time_sl3_native_multicore_ht, time_sl3_native_multisession,
                 time_SuperLearner_sequential, time_SuperLearner_multicore
                )
test <- rownames(results)

results <- as.data.table(results)

invisible(results[, test := gsub("time_", "", test)])
invisible(results[, native := str_detect(test, "native")])
invisible(results[, parallel := !str_detect(test, "sequential")])
results <- results[order(results$elapsed)]
invisible(results[, test := factor(test, levels = test)])
breaks <- 2^(-20:20)
ggplot(results, aes(y = test, x = elapsed, color = factor(native),
                    shape = factor(parallel))) + geom_point(size = 3) +
  xlab("Time (seconds) -- log scale") + ylab("Test") + theme_bw() +
  scale_shape_discrete("Parallel Computation") +
  scale_color_discrete("Native Learners") +
  scale_x_continuous(trans = log2_trans(), breaks = breaks) +
  annotation_logticks(base = 2, sides = "b")
save(results, file = "benchmark_results.rdata")

We can see that using the native learners results in about a 4x speedup relative to the legacy wrappers. This can be at least partially explained by the fact that the legacy SL.randomForest wrapper uses randomForest.formula for continuous outcomes, which in turn relies on the model.matrix function, known to be slow on large datasets. Improvements to the legacy wrappers would probably reduce or eliminate this difference.
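The shape of that overhead can be illustrated on simulated data of the same dimensions as the benchmark task (a rough sketch, not part of the benchmark itself): the formula interface builds a design matrix via model.matrix, while the native learners can coerce the data directly.

```r
# Simulated data matching the benchmark dimensions (n = 1e4, 7 covariates)
df <- as.data.frame(matrix(rnorm(1e4 * 7), ncol = 7))

# randomForest.formula goes through model.matrix ...
time_formula <- system.time(mm <- model.matrix(~ . - 1, data = df))

# ... while a native learner can pass the design matrix through directly
time_direct <- system.time(dm <- as.matrix(df))
```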

We can also see that multicore parallelization for the legacy SuperLearner function results in another 4x speedup on this system. Relative to that, the sl3_legacy_multicore test results in almost an additional 2x speedup. This can be explained by the use of delayed parallelization. While mcSuperLearner parallelizes only across the $V$ cross-validation folds, delayed allows sl3 to parallelize across all training tasks that comprise the Super Learner, for a total of $(V+1)*n_{learners}$ training tasks, where $n_{learners}$ is the number of learners in the library (here 4), and $(V+1)$ is one more than the number of cross-validation folds, accounting for the re-fit to the full data typically implemented in the Super Learner algorithm. We don't see a substantial difference between the three parallelization schemes for sl3.
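The task counts behind this argument can be made concrete using the default of $V = 10$ cross-validation folds:

```r
V <- 10           # default number of cross-validation folds
n_learners <- 4   # learners in the library, per the text above

# mcSuperLearner parallelizes only across folds ...
mc_tasks <- V

# ... while delayed exposes every per-fold, per-learner fit,
# plus the final re-fit of each learner to the full data
delayed_tasks <- (V + 1) * n_learners

c(mc_tasks = mc_tasks, delayed_tasks = delayed_tasks)
```

With 44 independent training tasks instead of 10, the delayed scheduler has far more opportunity to keep all workers busy.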

These effects appear multiplicative, resulting in the fastest implementation, sl3_native_multicore_ht (sl3 with native learners and hyper-threaded multicore parallelization), being about 32x faster than the slowest, SuperLearner_sequential (legacy SuperLearner without parallelization). This is a dramatic improvement in the time required to run this Super Learner.

Session Information

sessionInfo()
