completed = read.csv("extdata/typeform.csv")
setnicepar = function(mar = c(3, 3, 2, 1), mgp = c(2, 0.4, 0), tck = -0.01,
                      cex.axis = 0.9, las = 1, mfrow = c(1, 1), ...) {
  par(mar = mar, mgp = mgp, tck = tck, cex.axis = cex.axis, las = las,
      mfrow = mfrow, ...)
}
Contains copies of slides and exercises
install.packages("drat")
drat::addRepo("jr-packages")
install.packages("efficientTutorial")
Repo at https://github.com/jr-packages/efficientTutorial
Slides at https://jumpingrivers.com/t/2018-efficient-erum
Dr Colin Gillespie
Senior Statistics Lecturer, Newcastle University
Consultant at Jumping Rivers
setnicepar()
tab = table(completed$list_iVD4_choice)
barplot(tab, col = "steelblue")
We also have a Transitioning Physicist in the room
setnicepar(mfrow = c(1, 2))
r_fun = factor(as.numeric(completed$opinionscale_nW40), levels = 1:10)
barplot(table(r_fun), col = "steelblue", main = "Functions", ylim = c(0, 40))
r_for = factor(as.numeric(completed$opinionscale_LMTT), levels = 1:10)
barplot(table(r_for), col = "steelblue", main = "Loops", ylim = c(0, 40))
Around 40% of people have built a package
Most people haven't used C/C++
Small subset of efficient R programming course
Based on https://github.com/csgillespie/efficientR
## Slides
browseVignettes("efficientTutorial")
The goal is to give a flavour of the topics
The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.
Donald Knuth
system.time()
microbenchmark()
system.time()
Easy to use
```r
system.time(x <- rnorm(1000000)^2.3)
```
Hard to compare multiple benchmarks
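To illustrate why comparing benchmarks with system.time() gets awkward, here is a minimal sketch that times two ways of squaring a vector. The vector size and both expressions are illustrative assumptions, not taken from the slides. Each call times one expression once, so the results have to be collected and compared by hand, and a single run can be noisy.

```r
# Illustrative only: the vector size and both expressions are assumptions
x = rnorm(1e6)

# Each system.time() call times a single expression, once
t_pow  = system.time(y1 <- x^2)
t_mult = system.time(y2 <- x * x)

# Collect the elapsed times (in seconds) by hand to compare them
elapsed = c(pow = t_pow[["elapsed"]], mult = t_mult[["elapsed"]])
elapsed
```

With more than two or three candidates, this bookkeeping grows quickly, and repeated runs would need an explicit loop.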
microbenchmark()
microbenchmark() makes it easy to compare multiple functions
Matrix (d_m) vs data frame (d_df)
library("microbenchmark")
(res = microbenchmark(times = 1000, unit = "ms", # milliseconds
                      d_m[1, ], d_df[1, ], d_m[, 1], d_df[, 1]))
# Unit: milliseconds
#       expr   min    lq  mean median     uq     max neval cld
#   d_m[1, ] 0.004 0.008 0.014  0.014 0.0204   0.049  1000   a
#  d_df[1, ] 4.722 5.067 5.681  5.333 5.6767 109.383  1000   b
#   d_m[, 1] 0.006 0.006 0.007  0.007 0.0081   0.024  1000   a
#  d_df[, 1] 0.006 0.008 0.012  0.012 0.0153  0.0558  1000   a
d_m = matrix(1:10000, ncol = 100)
d_df = as.data.frame(d_m)
colnames(d_df) = paste0("c", 1:ncol(d_df))
res = microbenchmark::microbenchmark(times = 1000, unit = "ms", # milliseconds
                                     d_m[1, ], d_df[1, ], d_m[, 1], d_df[, 1])
saveRDS(res, "extdata/data_matrix.rds")
plot(res, log="y")
res = readRDS("extdata/data_matrix.rds")
setnicepar()
plot(res, log = "y", colour = "steelblue")
grid()
All data in a matrix must be the same type
Less overhead than a data frame, so faster
Never ask on Stack Overflow which method is faster!
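The point above that all data in a matrix must be the same type can be seen in a small sketch. The column names and values here are illustrative assumptions, not from the slides: adding a character column to a numeric matrix coerces every cell to character, whereas a data frame stores each column separately with its own type.

```r
# Illustrative values only, not from the slides
m = cbind(id = 1:3, score = c(2.5, 3.1, 4.8))
typeof(m)      # a single storage type shared by every cell

# Binding a character column coerces the whole matrix to character
m_chr = cbind(m, label = c("a", "b", "c"))
typeof(m_chr)

# A data frame is a list of columns, so each column keeps its own type
df = data.frame(id = 1:3, score = c(2.5, 3.1, 4.8),
                label = c("a", "b", "c"), stringsAsFactors = FALSE)
sapply(df, class)
```

The single shared type is part of why a matrix carries less overhead than a data frame.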