Exercise 1

In the notes we generated a numeric data set. Here we'll repeat the timings, but with characters. First generate a simple data set.

x = as.character(runif(1e6))
x[sample(1e6, 1e5)] = NA # 10% NAs
dd = as.data.frame(replicate(10, x))
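
Before timing anything, it's worth checking that the columns really are characters. In R versions before 4.0.0, as.data.frame() converts character columns to factors by default, which would change what is being timed. The sketch below is a scaled-down check rather than part of the original exercise: the smaller n, the set.seed() call and the explicit stringsAsFactors = FALSE argument are our own additions.

```r
set.seed(1)                    # only so the sketch is reproducible
n = 1e4                        # smaller than the exercise's 1e6, to keep it quick
x = as.character(runif(n))
x[sample(n, n / 10)] = NA      # 10% NAs, as in the exercise

# stringsAsFactors = FALSE keeps the columns as characters on older R versions
dd = as.data.frame(replicate(10, x), stringsAsFactors = FALSE)

dim(dd)                        # n rows, 10 columns
mean(is.na(dd[[1]]))           # proportion of missing values in the first column
sapply(dd, class)              # each column should be "character"
```

If sapply(dd, class) reports "factor" instead, the timings will be for factors, not characters.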

Hint: It's worth reading the RStudio blog from time to time.

Exercise 2 (bonus)

You can also use environments to speed up passing data (Bioconductor does this). Unfortunately, we don't have time to cover environments today; see our Advanced R programming course.

To create an environment we use the new.env() function

e = new.env()

and then place a matrix into that environment

x = matrix(runif(10000), ncol = 10)
e$x = x

Next we'll create two functions to benchmark

f = function(x) {
  x[1, 1] = 1
  x
}
g = function(e) e$x[1, 1] = 1

Then time as before

system.time({
  for (i in 1:100000) {
    x = f(x)
  }
})

system.time({
  for (i in 1:100000) {
    g(e)
  }
})

Repeat the above task but change the matrix to a data frame.
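
One possible starting point for the data frame version is sketched below. It is not a model answer: the names x_df, e_df and n_iter are our own, and we use far fewer iterations than above so that it finishes quickly.

```r
# Same data as before, but stored as a data frame
x_df = as.data.frame(matrix(runif(10000), ncol = 10))
e_df = new.env()
e_df$x = x_df

# The same two functions as in the matrix example
f = function(x) {
  x[1, 1] = 1
  x
}
g = function(e) e$x[1, 1] = 1

n_iter = 1000   # assumed value; increase it for more stable timings

# Pass the data frame itself and return the modified copy
t_copy = system.time({
  for (i in seq_len(n_iter)) {
    x_df = f(x_df)
  }
})

# Pass the environment and modify the data frame stored inside it
t_env = system.time({
  for (i in seq_len(n_iter)) {
    g(e_df)
  }
})

t_copy
t_env
```

Compare the two elapsed times with those from the matrix version.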

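Using an environment is more efficient because environments are passed by reference: the function works on the object stored in the environment, rather than on a copy that has to be returned and reassigned (again, we don't cover that today).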
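
The difference between the two behaviours can be seen in a small sketch. The names m, env, set_first and set_first_env are made up for this illustration.

```r
# A matrix passed to a function is copied when it is modified,
# so the caller's object is left unchanged
m = matrix(0, nrow = 2, ncol = 2)
set_first = function(m) {
  m[1, 1] = 1
  m
}
invisible(set_first(m))   # result deliberately thrown away
m[1, 1]                   # still 0

# An environment is passed by reference, so a change made inside
# the function is visible to the caller
env = new.env()
env$m = matrix(0, nrow = 2, ncol = 2)
set_first_env = function(env) env$m[1, 1] = 1
set_first_env(env)
env$m[1, 1]               # now 1
```

This is why f() has to return x and be reassigned in the loop, whereas g() does not.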


