In akrmenec/efficient-R: Becoming an Efficient R Programmer

knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  cache = TRUE, 
  fig.align="center",
  fig.pos="t"
)
library(microbenchmark)

Chapter 7 Solutions

Base R timings

Exercises {-}

Create a vector x. Benchmark any(is.na(x)) against anyNA(). Do the results vary with the size of the vector.

```r library(microbenchmark) N = 10000 x = rnorm(N)

No NAs - results are similar

microbenchmark(any(is.na(x)), anyNA(x), times = 1000)

Some NAs - big difference

x[sample(1:N, N/10)] = NA microbenchmark(any(is.na(x)), anyNA(x), times = 1000)

1. Examine the following function definitions to give you an idea of how integers are used.
  * `tail.matrix()`
    ```r
    ## Default value of n is an integer
    args(tail.matrix)
    ```
    * `lm()`. 
    ```r
    # getFunction("lm")
    # Function uses integer in the match calls
    # mf <- mf[c(1L, m)]
    ```

1. Construct a matrix of integers and a matrix of numerics. Using `pryr::object_size()`, compare the 
  objects.

    ```r
    m_int = matrix(sample(1:10000), ncol=10)
    m_num = as.numeric(m_int)

    ## Approximately 2
    pryr::object_size(m_num)/pryr::object_size(m_int)
    ```


1. How does the function `seq.int()`, which was used in the `tail.matrix()` function, 
  differ to the standard `seq()` function?

    ```r
    # seq is is an S3 generic, i.e. it looks at the class
    #of the first argument, then calls another function
    getFunction("seq")
    ```

### Profvis Monopoly

#### Exercise {-}

The `move_square()` function above uses a vectorised solution. Whenever we move, we
always roll six dice, then examine the outcome and determine the number of doubles.
However, this is potentially wasteful, since the probability of getting one double is
$1/6$ and two doubles is $1/36$. Another method is too only roll additional dice if
and when they are needed. Implement and time this solution.

```r
## Original function
profvis({
  for(i in 1:10000) {
    current = 0
    df = data.frame(d1 = sample(1:6, 3, replace=TRUE), 
                    d2 = sample(1:6, 3, replace=TRUE))

    df$Total = apply(df, 1, sum)
    df$IsDouble = df$d1 == df$d2

    if(df$IsDouble[1] & df$IsDouble[2] & df$IsDouble[3]) {
      current = 11#Go To Jail
    } else if(df$IsDouble[1] & df$IsDouble[2]) {
      current = current + sum(df$Total[1:3])
    } else if(df$IsDouble[1]) {
      current = current + sum(df$Total[1:2])
    } else {
      current = current + df$Total[1]
    }
  }
}, interval = 0.005)

## With improvements
profvis({
  for(i in 1:10000) {
    current  =0 
    dd = matrix(sample(1:6, 6, replace=TRUE), ncol=2)
    Total = rowSums(dd)         
    IsDouble = dd[,1] == dd[,2]
    if(IsDouble[1] && IsDouble[2] && IsDouble[3]) {
      current = 11#Go To Jail
    } else if(IsDouble[1] && IsDouble[2]) {
      current = current + sum(Total[1:3])
    } else if(IsDouble[1]) {
      current = current + Total[1:2]
    } else {
      current = Total[1]
    }
    current
  }

}, interval = 0.005)

## Abandoning the vectorised approach
profvis({
  for(i in 1:10000) {
    die1 = sample(1:6, 2, replace=TRUE)
    current = sum(die1)
    if(die1[1] == die1[2]) {
      die2 = sample(1:6, 2, replace=TRUE)
      current = current + sum(die2)
      if(die2[1] == die2[2]){
        die3 = sample(1:6, 2, replace=TRUE)
        if(die3[1] == die3[2]){
          current = 11
        } else {
          current = current  + sum(die3)
        } 
      } 
    }
  }
}, interval = 0.005)

library(Rcpp)
cppFunction('
double test1() {
  double a = 1.0 / 81;
  double b = 0;
  for (int i = 0; i < 729; i++)
    b = b + a;
  return b;
}'
)

cppFunction('
float test2() {
  float a = 1.0 / 81;
  float b = 0;
  for (int i = 0; i < 729; i++)
    b = b + a;
  return b;
}'
)

Exercises {-}

Consider the following piece of code

```{Rcpp eval=FALSE, tidy=FALSE} double test1() { double a = 1.0 / 81; double b = 0; for (int i = 0; i < 729; i++) b = b + a; return b; }

1. Save the function `test1()` in a separate file. Make sure it works.
    ```r
    test1()
    ```
2. Write a similar function in R and compare the speed of the C++ and R versions.

    ```r
    test1_r = function() {
      a = 1/81
      b = 0
      for(i in 0:728) 
        b = b + a
      return(b)
    }
    test1_r() - test1()
    ```
  3. Create a function called `test2()` where the `double` variables have been replaced by `float`. Do you still get the correct answer?
    ```r
    ## No!  
    test1_r() - test2()
    ```

4. Change `b = b + a` to `b += a` to make your code more C++ like. 
    ```{Rcpp eval=FALSE, tidy=FALSE}
    for (int i = 0; i < 729; i++)
      b = b + a;
    ```
5. (Hard) What's the difference between `i++` and `++i`?

    * See this [Stackoverflow](https://stackoverflow.com/questions/24853/what-is-the-difference-between-i-and-i)
  answer for a good overview

#### Exercises {-}

1. Construct an R version (using a `for` loop rather than the vectorised solution),
`res_r()` and compare the three function variants. 
```r
sq_diff_r = function(x, y) (x - y)^2
cppFunction('
NumericVector res_c(NumericVector x, NumericVector y) {
  int i;
  int n = x.size();
  NumericVector residuals(n);
  for(i=0; i < n; i++) {
    residuals[i] = pow(x[i] - y[i], 2);
  }
  return residuals;
}
')
cppFunction('
NumericVector res_sugar(NumericVector x, NumericVector y) {
  return pow(x-y, 2);
}
')

```r
x = rnorm(1000000)
y = rnorm(1000000)
microbenchmark(sq_diff_r(x, y), res_c(x, y), res_sugar(x, y), times=1000)
```

In the above example, res_sugar() is faster than res_c(). Do you know why?
- The res_sugar() is clever particularly clever and avoids creating a vector.

akrmenec/efficient-R documentation built on May 28, 2019, 4:53 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com