Consider the following benchmark to evaluate different functions for calculating the cumulative sum of the whole numbers from 1 to 100:
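```r
library("microbenchmark")
x = 1:100 # initiate vector to cumulatively sum

# Method 1: with a for loop (10 lines)
# (assumes x is 1:n, because the values of x are used as indices)
cs_for = function(x){
  for(i in x){
    if(i == 1){
      xc = x[i]
    } else {
      xc = c(xc, sum(x[1:i]))
    }
  }
  xc
}

# Method 2: with sapply (3 lines)
cs_apply = function(x){
  sapply(x, function(x) sum(1:x))
}

# Method 3: the built-in cumsum(x), called directly in the benchmark
microbenchmark(cs_for(x), cs_apply(x), cumsum(x))
```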
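A benchmark is only meaningful if the methods being compared compute the same thing. The minimal sketch below checks all three approaches against the closed form for triangular numbers, $k(k + 1)/2$, which gives the running total of $1, \dots, k$. The function definitions are repeated so that the block runs on its own; the name `expected` is introduced here purely for illustration.

```r
x = 1:100

# Same approaches as above, repeated so this block is self-contained
cs_for = function(x) {
  for (i in x) {
    if (i == 1) xc = x[i] else xc = c(xc, sum(x[1:i]))
  }
  xc
}
cs_apply = function(x) sapply(x, function(x) sum(1:x))

# Closed form for the running total of 1..k: k(k + 1)/2 (triangular numbers)
expected = x * (x + 1) / 2

# Stop with an error if any method disagrees with the closed form
stopifnot(
  all(cs_for(x) == expected),
  all(cs_apply(x) == expected),
  all(cumsum(x) == expected)
)
```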
Which method is fastest and how many times faster is it?
The `cumsum()` function is the fastest, by a factor of around 300.

Run the same benchmark, but with the results reported in seconds, on a vector of all the whole numbers from 1 to 50,000. Hint: also use the argument `times = 1` so that each command is run only once and the benchmark finishes in a reasonable time (even with a single evaluation, it may take a minute or more, depending on your system). Does the relative time difference increase or decrease? By how much?
```r
library("microbenchmark")
x = 1:5e4 # initiate vector to cumulatively sum
microbenchmark(cs_for(x), cs_apply(x), cumsum(x), times = 1, unit = "s")
```
The relative time difference increases, by close to three orders of magnitude between the fastest and slowest methods. When `x = 1:100`, `cumsum(x)` is around 250 times faster than `cs_for(x)`. In the results for `x = 1:5e4`, the relative difference is around $17 / 0.0001 = 170{,}000$, roughly 680 times the factor of 250 seen with `x = 1:100`.
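One way to see why the gap widens with the length of `x` is a simplified cost model that counts only scalar additions (an assumption: it ignores memory allocation, the copying done by `c(xc, ...)`, and interpreter overhead). `cs_for()` recomputes `sum(x[1:i])` from scratch at every step, and summing $i$ numbers takes $i - 1$ additions, so it performs $n(n - 1)/2$ additions in total; `cumsum()` keeps a running total and needs only $n - 1$. The ratio simplifies to $n/2$, so it grows linearly with $n$. The function names below are hypothetical.

```r
# Simplified cost model (an assumption): count scalar additions only
# cs_for(): sum(x[1:i]) costs i - 1 additions; summed over i = 1..n gives n(n - 1)/2
adds_cs_for = function(n) n * (n - 1) / 2
# cumsum(): one addition per element after the first
adds_cumsum = function(n) n - 1

n = c(100, 5e4)
ratio = adds_cs_for(n) / adds_cumsum(n) # simplifies to n / 2
data.frame(n = n, additions_ratio = ratio)
```

Under this model the ratio rises from 50 to 25,000, a 500-fold increase. That is the same order as the observed jump from around 250 to around $17 / 0.0001$; the remaining difference is plausibly due to the costs the model leaves out.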
Test how long the different methods for subsetting the data frame `df`, presented in Chapter 1, take on your computer. Is your computer faster or slower at subsetting than the computer on which this book was compiled?
```r
library("microbenchmark")
df = data.frame(v = 1:4, name = letters[1:4])
microbenchmark(df[3, 2], df[3, "name"], df$name[3])
```
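Before timing the three subsetting methods, it is worth confirming that they return the same value, so that the benchmark compares like with like. The sketch below checks this with `identical()`. Since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so the result is the character `"c"`; in older versions all three return the same factor value. The variable names `a`, `b` and `d` are for illustration only.

```r
df = data.frame(v = 1:4, name = letters[1:4])

# Each expression selects the value in row 3 of the "name" column
a = df[3, 2]       # row and column both by position
b = df[3, "name"]  # row by position, column by name
d = df$name[3]     # extract the column with $, then index the vector

# All three should be exactly the same object
stopifnot(identical(a, b), identical(b, d))
as.character(d)
```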
Use `system.time()` and a `for()` loop to test how long it takes to perform the subsetting operation 50,000 times. Before testing this, do you think it will take more or less than 1 second for each subsetting method? Hint: the first `system.time()` call below tests the first method; the other two calls follow the same pattern:
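```r
df = data.frame(v = 1:4, name = letters[1:4])

# Method 1: row and column by position
system.time(
  for(i in 1:50000){
    df[3, 2]
  }
)

# Method 2: column by name
system.time(
  for(i in 1:50000){
    df[3, "name"]
  }
)

# Method 3: extract the column with $, then index it
system.time(
  for(i in 1:50000){
    df$name[3]
  }
)
```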
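The answer to the "more or less than 1 second" question follows from simple arithmetic: the total time is roughly the time per call multiplied by 50,000. For example, if `microbenchmark()` reported a median of about 10 microseconds for a call (an illustrative figure, not a measured result), 50,000 calls would take about $50{,}000 \times 10\,\mu\text{s} = 0.5$ s. The sketch below works in the other direction, dividing each loop's elapsed time by the number of repetitions. It relies on `system.time()` returning a `proc_time` vector with an `"elapsed"` element; the names `n_reps` and `per_call_us` are hypothetical, and the per-call figures include the small overhead of the loop itself.

```r
df = data.frame(v = 1:4, name = letters[1:4])
n_reps = 50000

# system.time() returns a "proc_time" vector; its "elapsed" element is the
# wall-clock time in seconds for the whole loop
t_pos  = system.time(for (i in seq_len(n_reps)) df[3, 2])[["elapsed"]]
t_name = system.time(for (i in seq_len(n_reps)) df[3, "name"])[["elapsed"]]
t_dol  = system.time(for (i in seq_len(n_reps)) df$name[3])[["elapsed"]]

# Average time per subsetting call (loop overhead included), in microseconds
per_call_us = c(position = t_pos, name = t_name, dollar = t_dol) / n_reps * 1e6
round(per_call_us, 2)
```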
Note that, unlike in the `cumsum(x)` example, the relative times of the different methods do not change here: the same small operation is simply repeated many times, so the total time for every method grows by the same factor.
Bonus exercise: try profiling a section of code you have written using profvis. Where are the bottlenecks? Were they where you expected?
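As a starting point, here is a minimal sketch of a **profvis** session. The profiled expression is a hypothetical stand-in for your own code: it computes a running total the slow way, recomputing `sum(x[1:i])` at every step, and then with `cumsum()`. `profvis()` samples the call stack while the expression runs and returns an interactive widget; in an interactive session, printing `p` opens the flame graph and data views. You would expect most samples to land on the `sapply()` line, but check the profile rather than assume it.

```r
# Requires the profvis package (install.packages("profvis") if needed)
library("profvis")

# Hypothetical code to profile: a slow and a fast running total of random numbers
p = profvis({
  x = runif(3e4)                                        # test data
  slow = sapply(seq_along(x), function(i) sum(x[1:i]))  # re-sums a growing prefix: O(n^2)
  fast = cumsum(x)                                      # single pass: O(n)
})
```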