Exercise 1

In the notes we generated a numeric data set. Here we'll repeat the timings, but with characters. First generate a simple data sets.

x = as.character(runif(1e6))
x[sample(1e6, 1e5)] = NA # 10% NAs
dd =  as.data.frame(replicate(10, x))

then time writing/reading the data frame to a file.

Exercise 2

Instead of passing a data frame to a function. Try just passing a very large vector. Is there still a speed-up?

f = function(x) {
  x[1] = 0; x
}
g = function(e) e$x[1] = 0 
x = rnorm(1e5)
e = new.env(); e$x = x
system.time(replicate(1000, f(x)))
system.time(replicate(1000, g(e)))

Hint: It's worth reading the RStudio blog from time to time.



jr-packages/jrBig documentation built on Jan. 1, 2020, 2:02 p.m.