```r
completed = read.csv("extdata/typeform.csv")
setnicepar = function(mar = c(3, 3, 2, 1), mgp = c(2, 0.4, 0), tck = -0.01,
                      cex.axis = 0.9, las = 1, mfrow = c(1, 1), ...) {
  par(mar = mar, mgp = mgp, tck = tck, cex.axis = cex.axis,
      las = las, mfrow = mfrow, ...)
}
```
Contains copies of slides and exercises
```r
install.packages("drat")
drat::addRepo("jr-packages")
install.packages("efficientTutorial")
```
Repo at https://github.com/jr-packages/efficientTutorial
Slides at https://github.com/jumpingrivers/t/2018-efficient-erum
Dr Colin Gillespie
Senior Statistics Lecturer, Newcastle University
Consultant at Jumping Rivers
```r
setnicepar()
tab = table(completed$list_iVD4_choice)
barplot(tab, col = "steelblue")
```
We also have a Transitioning Physicist in the room
```r
setnicepar(mfrow = c(1, 2))
r_fun = factor(as.numeric(completed$opinionscale_nW40), levels = 1:10)
barplot(table(r_fun), col = "steelblue", main = "Functions", ylim = c(0, 40))
r_for = factor(as.numeric(completed$opinionscale_LMTT), levels = 1:10)
barplot(table(r_for), col = "steelblue", main = "Loops", ylim = c(0, 40))
```
Around 40% of people have built a package
Most people haven't used C/C++
Small subset of efficient R programming course
Based on https://github.com/csgillespie/efficientR
## Slides

```r
browseVignettes("efficientTutorial")
```
The goal is to give a flavour of the topics
> The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.
>
> Donald Knuth
- `system.time()`
- `microbenchmark()`
## system.time()
Easy to use
```r
system.time(x <- rnorm(1000000)^2.3)
```
Hard to compare multiple benchmarks
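A minimal sketch of the problem (the expression timed here is just an illustrative choice): `system.time()` reports a single run, so timings of the same code fluctuate from call to call, which makes fair comparisons between alternatives awkward.

```r
# Each call to system.time() measures one run only, so repeated
# timings of identical code rarely agree exactly
t1 = system.time(sum(rnorm(1e6)))["elapsed"]
t2 = system.time(sum(rnorm(1e6)))["elapsed"]
c(t1, t2)
```

`microbenchmark()` addresses this by running each expression many times and summarising the distribution.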
## microbenchmark()

`microbenchmark()` makes it easy to compare multiple functions

Matrix (`d_m`) vs data frame (`d_df`):

```r
library("microbenchmark")
(res = microbenchmark(times = 1000, unit = "ms", # milliseconds
                      d_m[1, ], d_df[1, ],
                      d_m[, 1], d_df[, 1]))
# Unit: milliseconds
#       expr   min    lq  mean median     uq     max neval cld
#   d_m[1, ] 0.004 0.008 0.014  0.014 0.0204   0.049  1000  a
#  d_df[1, ] 4.722 5.067 5.681  5.333 5.6767 109.383  1000   b
#   d_m[, 1] 0.006 0.006 0.007  0.007 0.0081   0.024  1000  a
#  d_df[, 1] 0.006 0.008 0.012  0.012 0.0153  0.0558  1000  a
```
```r
d_m = matrix(1:10000, ncol = 100)
d_df = as.data.frame(d_m)
colnames(d_df) = paste0("c", seq_along(d_df))
res = microbenchmark::microbenchmark(times = 1000, unit = "ms", # milliseconds
                                     d_m[1, ], d_df[1, ],
                                     d_m[, 1], d_df[, 1])
saveRDS(res, "extdata/data_matrix.rds")
```
```r
plot(res, log = "y")
```
```r
res = readRDS("extdata/data_matrix.rds")
setnicepar()
plot(res, log = "y", colour = "steelblue")
grid()
```
All data in a matrix must be the same type
Less overhead than a data frame, so faster
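A quick sketch of the single-type rule (using made-up toy data, and assuming R >= 4.0 so character columns stay characters): `cbind()` on mixed inputs coerces everything to one common type, while a data frame keeps a separate type per column, which is part of its extra overhead.

```r
# cbind() coerces mixed inputs to a single common type - here, character
m = cbind(1:3, c("a", "b", "c"))
typeof(m)          # "character": the numbers were coerced

# A data frame stores each column with its own type
df = data.frame(x = 1:3, y = c("a", "b", "c"))
sapply(df, typeof) # "integer" "character"
```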
Never ask on Stack Overflow which method is faster!