View source: R/estimate_runtime.R
estimate_runtime | R Documentation |
Estimate the runtime of fitting a computationally intensive model to a big dataset before starting the run itself, which in some cases may take hours or days. The estimate is obtained by timing the model on a small range of subset sizes, fitting power, exponential and linear models to those timings, and extrapolating from the best-fitting one to the full dataset size.
estimate_runtime(code, subset_sizes, full_size)
code |
String: a single line of code that runs the model. It should run the model on a subset of the full dataset whose size changes from one run to the next. Usually, this is done by setting the data argument inside the model's function to, for example, |
subset_sizes |
Numeric vector: a range of subset sizes of the full dataset that have manageable running times (e.g. from several seconds to several minutes) and that extend as far as practical towards the full dataset size. Some trial and error may be needed to find a good trade-off between the time it takes to produce an estimate and the accuracy of that estimate. Because the usual goal is to estimate a long runtime fairly quickly, the accuracy will be limited, but the estimate is still useful as a ballpark indicator. |
full_size |
Numeric value: full size of the dataset, i.e. |
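The way a one-line code string and a vector of subset sizes can work together may be easier to see in a small sketch. The helper below, `time_subsets`, is hypothetical and is not part of the package. It assumes the subset size is exposed to the code string as a variable named `i`, and it uses a cheap `lm` fit on toy data in place of an expensive model.

```r
# Hypothetical helper (not the package's implementation): time a one-line
# code string once per subset size, exposing the size to the code as `i`.
time_subsets <- function(code, subset_sizes, envir = parent.frame()) {
  force(envir)                      # fix the calling environment up front
  expr <- parse(text = code)        # parse the code string once
  elapsed <- vapply(subset_sizes, function(size) {
    run_env <- new.env(parent = envir)
    assign("i", size, envir = run_env)
    unname(system.time(eval(expr, envir = run_env))["elapsed"])
  }, numeric(1))
  data.frame(size = subset_sizes, seconds = elapsed)
}

# Toy data and a cheap model stand in for a real dataset and an expensive fit.
set.seed(1)
toy <- data.frame(x = rnorm(2000), y = rnorm(2000))
timings <- time_subsets(
  "lm(y ~ x, data = toy[sample(nrow(toy), i), ])",
  subset_sizes = c(500, 1000, 2000)
)
print(timings)
```

Timing a few candidate sizes this way is one option for choosing `subset_sizes` by trial and error before calling `estimate_runtime`.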
Annotated ggplot2 graph showing estimated runtime over the full dataset's size.
## Not run:
library(data.table)
library(randomForest)
# Simulated dataset: a binary outcome, four categorical and four numeric features
n <- 1e6
DT <- data.table(OUTCOME = sample(c(0L, 1L), n, replace = TRUE) |> as.factor(),
                 FEATURE1 = sample(LETTERS[1:4], n, replace = TRUE),
                 FEATURE2 = sample(LETTERS[5:8], n, replace = TRUE),
                 FEATURE3 = sample(LETTERS[9:15], n, replace = TRUE),
                 FEATURE4 = sample(LETTERS[1:10], n, replace = TRUE),
                 FEATURE5 = runif(n, 1, 10) |> round(2),
                 FEATURE6 = runif(n, 20, 40) |> round(2),
                 FEATURE7 = rnorm(n, 50, 20) |> round(2),
                 FEATURE8 = rnorm(n, 100, 40) |> round(2))
# `i` in the code string stands for the current subset size
estimate_runtime(
  code = "randomForest(OUTCOME ~ ., data = DT[sample(.N, i)])",
  subset_sizes = c(2500, 5000, 10000, 15000, 25000, 50000),
  full_size = n
)
## End(Not run)
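The extrapolation step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `extrapolate_runtime` is a hypothetical name, the power and exponential models are fitted by ordinary least squares on log-transformed runtimes, and the "best" model is taken to be the one with the smallest residual sum of squares on the original scale. The timings are synthetic and follow a power law exactly.

```r
# Illustrative only: fit linear, power and exponential models to
# (size, seconds) pairs, pick one, and extrapolate to the full size.
# The selection rule (smallest RSS on the original scale) is an assumption.
extrapolate_runtime <- function(size, seconds, full_size) {
  fits <- list(
    linear      = lm(seconds ~ size),
    power       = lm(log(seconds) ~ log(size)),  # seconds = a * size^b
    exponential = lm(log(seconds) ~ size)        # seconds = a * exp(b * size)
  )
  # Predict at the observed sizes and at the full size in one pass
  new_data <- data.frame(size = c(size, full_size))
  preds <- lapply(names(fits), function(name) {
    p <- predict(fits[[name]], newdata = new_data)
    if (name == "linear") p else exp(p)          # back-transform log models
  })
  names(preds) <- names(fits)
  n <- length(size)
  rss <- vapply(preds, function(p) sum((seconds - p[seq_len(n)])^2), numeric(1))
  best <- names(which.min(rss))
  list(model = best, estimate_seconds = unname(preds[[best]][n + 1]))
}

# Synthetic timings: seconds = 1e-6 * size^1.5
size <- c(2500, 5000, 10000, 15000, 25000, 50000)
seconds <- 1e-6 * size^1.5
result <- extrapolate_runtime(size, seconds, full_size = 1e6)
print(result)
```

With these synthetic inputs the power model fits exactly, so it is the one selected. Real timings are noisy, and an extrapolation this far beyond the sampled range should be read as a ballpark figure only.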