In the notes we generated a numeric data set. Here we'll repeat the timings, but with characters. First generate a simple data set.
x = as.character(runif(1e6))
x[sample(1e6, 1e5)] = NA  # 10% NAs
dd = as.data.frame(replicate(10, x))
Hint: It's worth reading the RStudio blog from time to time.
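The timings from the notes are not reproduced in this practical, so the following is only a sketch of what repeating them might look like. It assumes the comparison is between text files (write.csv/read.csv) and R's binary format (saveRDS/readRDS); the smaller data set, the temporary file names and the variable names are all illustrative choices, not taken from the notes.

```r
# Smaller version of the character data set so the sketch runs quickly
# (assumption: the practical itself uses 1e6 rows)
n = 1e4
x = as.character(runif(n))
x[sample(n, n / 10)] = NA  # 10% NAs
dd = as.data.frame(replicate(10, x), stringsAsFactors = FALSE)

# Temporary files for the two formats (hypothetical names)
csv_file = tempfile(fileext = ".csv")
rds_file = tempfile(fileext = ".rds")

# Time writing the data frame in each format
t_csv_write = system.time({ write.csv(dd, csv_file, row.names = FALSE) })
t_rds_write = system.time({ saveRDS(dd, rds_file) })

# Time reading it back; colClasses keeps the columns as characters
t_csv_read = system.time({ dd_csv = read.csv(csv_file, colClasses = "character") })
t_rds_read = system.time({ dd_rds = readRDS(rds_file) })

# Elapsed times side by side
rbind(csv_write = t_csv_write, rds_write = t_rds_write,
      csv_read = t_csv_read, rds_read = t_rds_read)[, "elapsed"]
```

Increasing n back to 1e6 should give timings that are comparable with the numeric case, although the actual numbers depend on your machine.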
You can also use environments to speed up passing data (Bioconductor does this). Unfortunately, we don't have time to cover environments in this course - see our Advanced R programming course.
To create an environment we use the new.env() function
e = new.env()
and then place a matrix into that environment
x = matrix(runif(10000), ncol = 10)
e$x = x
Next we'll create two functions to benchmark
f = function(x) {
  x[1, 1] = 1
  x
}
g = function(e) e$x[1, 1] = 1
Then time as before
system.time({
  for (i in 1:100000) {
    x = f(x)
  }
})
system.time({
  for (i in 1:100000) {
    g(e)
  }
})
Repeat the above task but change the matrix to a data frame.
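One possible way to set up the data frame version is sketched below. It reuses the same f and g pattern as the matrix case; the iteration count N is reduced here purely so the sketch runs quickly, and the names t_copy and t_env are illustrative.

```r
# Same set-up as before, but with a data frame instead of a matrix
x = as.data.frame(matrix(runif(10000), ncol = 10))
e = new.env()
e$x = x

# f works on its argument and returns the result
f = function(x) {
  x[1, 1] = 1
  x
}
# g updates the data frame stored in the environment
g = function(e) {
  e$x[1, 1] = 1
}

N = 1000  # assumption: fewer iterations than the 100000 used for the matrix
t_copy = system.time({
  for (i in 1:N) {
    x = f(x)
  }
})
t_env = system.time({
  for (i in 1:N) {
    g(e)
  }
})
t_copy["elapsed"]
t_env["elapsed"]
```

Compare the two elapsed times with those from the matrix version to see how the choice of data structure affects the result.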
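Using an environment is more efficient because environments are passed by reference: g() updates the matrix stored in e directly, whereas f() modifies a copy of x that then has to be returned and reassigned (again, we don't cover that today).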
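The difference in behaviour can be seen in a small example. This is only an illustrative sketch; the function names set_first and set_first_env are made up for this demonstration.

```r
# A matrix argument: the function changes its own copy
m = matrix(0, nrow = 2, ncol = 2)
set_first = function(m) {
  m[1, 1] = 1
  m
}
invisible(set_first(m))  # result is discarded, so the caller's m is untouched
m_after = m[1, 1]

# An environment argument: the change is visible to the caller
e = new.env()
e$m = matrix(0, nrow = 2, ncol = 2)
set_first_env = function(e) {
  e$m[1, 1] = 1
  invisible(NULL)
}
set_first_env(e)
e_after = e$m[1, 1]

c(matrix = m_after, environment = e_after)
```

The matrix in the calling code only changes if we assign the result of set_first() back to m, which is why the loop with f() has to use x = f(x).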