Home

/

GitHub

/

jr-packages/efficientTutorial

/

In jr-packages/efficientTutorial: Efficient R Programming

Common pitfalls

The benefit of using R is that coding time is greatly reduced.
But it's very easy to write programs that are incredibly slow.

Object growth

Object growth: method 1

Method 1 creates an empty vector, and grows the object:

n = 100000
myvec = NULL; myvec = c()
for (i in 1:n)
    myvec = c(myvec, i)

Object growth: method 2

Create an object of the final length and then changes the values in the object by subscripting:

myvec = numeric(n)
for (i in 1:n)
    myvec[i] = i

Object growth: method 3

Directly creates the final object:

myvec = 1:n

Object growth: timings

$n$ | 1 | 2 | 3 ----|---|---|---- $10^5$ | 0.208 | 0.024 | 0.000 $10^6$ | 25.50 | 0.220 | 0.000 $10^7$ | 3827.0 | 2.212 | 0.000

Object growth: timings

local(source("code/01-vector_growth.R", local = TRUE))

Object growth

Object growth can be quite insidious since it is easy to hide growing objects in your code. For example:

n = 2
hit = NULL
for (i in 1:n) {
    if (runif(1) < 0.3) 
        hit[i]  = TRUE
    else
        hit[i]  = FALSE
}

Exercise Rewrite the above code to avoid object growth

Avoid rbind too!

A more common - and possibly more dangerous - problem is with rbind:

df1 = data.frame(a = character(0), b = numeric(0))
for (i in 1:n)
    df1 = rbind(df1, 
            data.frame(a = sample(letters, 1), b = runif(1)))

Probably the main reason this is more common is because it is more likely that each iteration will have a different number of observations.
However, a reasonable upper bound on the size of the final object is often known.
- So pre-allocate!

Rule 1

Vectorise

When writing code in R, you need to remember that you are using R and not C (or even F77!). For example,

x = runif(1000) + 1
logsum = 0
for (i in seq_along(x))
    logsum = logsum + log(x[i])

This is a piece R code that has a strong, unhealthy influence from C.

Vectorise

Instead, we should write

logsum = sum(log(x))

x = runif(2)

It's faster. When $n = 10^7$ the ``R way'' is about forty times faster.
It's neater.
It doesn't contain a bug when \cc{x} is of length $0$.

Vectorise

Another common example is subsetting a vector. When writing in C, we would have something like:

x = rnorm(10)
ans = NULL
for (i in seq_along(x)) {
    if (x[i] < 0) 
        ans = c(ans, x[i])
}

Exercise: Rewrite the above code in a vectorised format

Example: Monte-Carlo integration

It's also important to make full use of R functions that use vectors. For example, suppose we wish to estimate [ \int_0^1 x^2 dx ] using a basic Monte-Carlo method.

Example: Monte-Carlo integration

local(source("code/01-monte_carlo.R", local = TRUE))

Monte Carlo Integration

Initialise: hits = 0
for i in 1:N
- Generate two random numbers, $U_1, U_2$, between 0 and 1
- If $U_2 < U_1^2$, then hits = hits + 1
end for
Area estimate = hits/N.

Example: Monte-Carlo integration

N = 500000
f = function(N) {
    hits = 0
    for (i in 1:N)  {
        u1 = runif(1); u2 = runif(1)
        if (u1^2 > u2)
            hits = hits + 1
    }
    return(hits / N)
}

Which in R takes a few seconds:

system.time(f(N))

Example: Monte-Carlo integration

N = 500000
f = function(N) {
    hits = 0
    for (i in 1:N)  {
        u1 = runif(1); u2 = runif(1)
        if (u1^2 > u2)
            hits = hits + 1
    }
    return(hits / N)
}

Exercise: Can you vectorise the above code?

If you can't vectorise

Put any object creation outside the loop. For example

jitter = function(x, k) rnorm(1, x, k)
parts = rnorm(10)
post = numeric(length(parts))

for (i in seq_along(parts)) {
    k = 1.06 * sd(parts) / length(parts)
    post[i] = jitter(parts[i], k)
}

If you can't vectorise

Can be rewritten as

k = 1.06 * sd(parts) / length(parts)
for (i in seq_along(parts))
    post[i] = jitter(parts[i], k)

Exercises 1 & 2

vignette("common", package = "efficientTutorial")

jr-packages/efficientTutorial documentation built on Feb. 16, 2020, 7:05 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jr-packages/efficientTutorial
Efficient R Programming

In jr-packages/efficientTutorial: Efficient R Programming

Common pitfalls

Object growth

Object growth: method 1

Object growth: method 2

Object growth: method 3

Object growth: timings

Object growth: timings

Object growth

Avoid rbind too!

Rule 1

Vectorise

Vectorise

Vectorise

Vectorise

Example: Monte-Carlo integration

Example: Monte-Carlo integration

Monte Carlo Integration

Example: Monte-Carlo integration

Example: Monte-Carlo integration

If you can't vectorise

If you can't vectorise

Exercises 1 & 2

R Package Documentation

Browse R Packages

We want your feedback!

jr-packages/efficientTutorial Efficient R Programming

In jr-packages/efficientTutorial: Efficient R Programming

Common pitfalls

Object growth

Object growth: method 1

Object growth: method 2

Object growth: method 3

Object growth: timings

Object growth: timings

Object growth

Avoid rbind too!

Rule 1

Vectorise

Vectorise

Vectorise

Vectorise

Example: Monte-Carlo integration

Example: Monte-Carlo integration

Monte Carlo Integration

Example: Monte-Carlo integration

Example: Monte-Carlo integration

If you can't vectorise

If you can't vectorise

Exercises 1 & 2

R Package Documentation

Browse R Packages

We want your feedback!

jr-packages/efficientTutorial
Efficient R Programming