knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
IS_EVAL = FALSE

This vignette explains the steps taken to benchmark a geospatial workflow across different combinations of R implementations and operating systems (platforms) using the altRnative R package.

To load the package, run this command:

library(altRnative)

Since this package uses Docker images, it provides a function to pull the images it needs:

pull_docker_image(c('gnu-r', 'mro'), c('debian', 'ubuntu', 'fedora'))
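
Pulling images presumably requires the Docker command-line client to be installed and reachable from R. As a minimal sketch (the helper name `docker_available` is hypothetical and not part of altRnative), you could check for it with base R before pulling:

```r
# Hypothetical helper, not part of altRnative: returns TRUE when a
# `docker` executable can be found on the PATH.
docker_available <- function() {
  # Sys.which() returns an empty string when the command is not found
  nzchar(Sys.which("docker")[[1]])
}

if (!docker_available()) {
  message("Docker was not found on the PATH; install it before pulling images.")
}
```

This only checks that the executable exists; it does not verify that the Docker daemon is running.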

You can also pull the images from within the benchmarks_code function by setting its pull_image parameter to TRUE. The default value is FALSE, since you usually need to pull the images only once.

First, we check that the package works properly by running a very simple piece of code (e.g. 1 + 1).

simple_benchmark_result = benchmarks_code(
  code = "1 + 1", 
  r_implementations = c('gnu-r', 'mro'), 
  platforms = c('debian', 'ubuntu', 'fedora', 'archlinux'),
  times = 10
  )

The result is a microbenchmark object that can be printed or plotted (using ggplot2):

print(simple_benchmark_result)
library('ggplot2')
autoplot(simple_benchmark_result)
boxplot(simple_benchmark_result, log = FALSE)
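
Because a microbenchmark object is a data frame with an `expr` column (the benchmarked expression) and a `time` column (in nanoseconds), it can also be summarised with base R. The sketch below uses made-up data in that layout instead of a real benchmark result; the labels and timings are assumptions for illustration only.

```r
# Mock data in the layout of a microbenchmark object: one row per run,
# `expr` labels the benchmarked expression, `time` is in nanoseconds.
# The labels and timings below are made up for illustration.
mock_result <- data.frame(
  expr = factor(rep(c("gnu-r on debian", "mro on ubuntu"), each = 3)),
  time = c(2.0e9, 2.2e9, 2.1e9, 3.0e9, 3.3e9, 3.6e9)
)

# Median duration per expression, converted from nanoseconds to seconds
median_seconds <- tapply(mock_result$time, mock_result$expr, median) / 1e9
print(median_seconds)
```

The same `tapply` call should work on a real result, assuming it keeps the standard `expr` and `time` columns.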

Or it can be saved to disk:

  filename = paste0('./plots/baseline.png')
  print(filename)
  png(filename=filename, width = 800, height = 400)
  benchmark_boxplot(simple_benchmark_result, main = 'Baseline Benchmark')
  dev.off()
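
Note that `png()` does not create missing directories, so the output directory has to exist before the plot is saved. A minimal sketch of creating it first is shown below; `tempdir()` is used only to keep the example self-contained, and in the setting of this vignette the directory would be `./plots`.

```r
# Create the output directory if it is missing. tempdir() keeps this
# example self-contained; use "./plots" in the vignette's setting.
plot_dir <- file.path(tempdir(), "plots")
if (!dir.exists(plot_dir)) {
  dir.create(plot_dir, recursive = TRUE)
}

# Build the output file name inside that directory
filename <- file.path(plot_dir, "baseline.png")
print(filename)
```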

Next, we use the Spatial Data Science with R book by Prof. Edzer Pebesma as the code for the geospatial benchmark. I have modified the book to remove the parts that use stars data and Air Quality data, since both are very large. The modified book can be obtained from my fork on GitHub. You can either download it or clone it with git to your local machine.

```{bash clone_book, eval=FALSE}
cd /home/ismailsunni/dev/r
git clone git@github.com:ismailsunni/sdsr.git
```

After that, you can open an R session and run the benchmark. First, define the R code that will be benchmarked. In this case, set the working directory to the mapped directory inside the container (e.g. `/home/docker/sdsr`). You also need to clean the cache and remove old results, since they would otherwise disturb the benchmarking process. Lastly, `render_book` runs all the code in the book and renders it.

```{R sdsr_code, eval=IS_EVAL}
code = expression(
  setwd('/home/docker/sdsr'),
  bookdown::clean_book(TRUE),
  unlink('_book/', recursive=TRUE),
  unlink('_bookdown_files', recursive=TRUE),
  bookdown::render_book('index.Rmd', 'bookdown::gitbook')
)
```

In this step, you run the benchmark using the previous code as input. Please note that the volumes argument of the benchmarks_code function maps your local sdsr directory into the container. In my case, the local directory is /home/ismailsunni/dev/r/sdsr. The platforms and r_implementations arguments set the platforms and R implementations of the Docker containers. Lastly, the times argument specifies how many times the code is run in the benchmark process.

```{R sdsr_benchmark, eval=IS_EVAL}
sdsr_result = benchmarks_code(
  code = code,
  platforms = c('debian', 'ubuntu', 'fedora', 'archlinux'),
  r_implementations = c('gnu-r', 'mro'),
  volumes = '/home/ismailsunni/dev/r/sdsr:/home/docker/sdsr',
  times = 3
)
```
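
The volumes value above hard-codes one machine's path. As a sketch, the same Docker-style `host:container` string could be built from any local directory with base R; the variable names here are illustrative, and it is an assumption based on the example above that the volumes argument accepts exactly this format.

```r
# Host directory holding the cloned book; tempdir() stands in here so the
# example runs anywhere. Replace it with the path to your own sdsr clone.
host_dir <- tempdir()

# Directory inside the container, as used in this vignette
container_dir <- "/home/docker/sdsr"

# Docker-style 'host:container' mapping built from the absolute host path
volumes <- paste0(normalizePath(host_dir), ":", container_dir)
print(volumes)
```

`normalizePath()` turns a relative path into an absolute one, which avoids depending on the working directory of the R session.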

From here, you can print the result or plot it, as with the previous result.

```r
print(sdsr_result)
autoplot(sdsr_result)
```

Running Benchmark Per Chapter

We have added a script to this fork that extracts the R code of every chapter in the SDSR book. If you update the book, you can run Rscript extract_sdsr_code.R again. Here we show how to run the benchmark for each chapter of SDSR.

# Commented entries do not have any code.
chapters = c(
    '01-hello.R',
    '02-Spaces.R',
    '03-Geometries.R',
    '04-Raster-Cube.R',
    '05-GeomManipulations.R',
    '06-Attributes.R',
    '07-ReferenceSystems.R',
    '08-Plotting.R',
    '09-BasePlot.R',
    '10-Ggplot2.R',
    # Chapters below do not have any code
    # '11-Interactive.R',
    # '12-SummarizingGeoms.R',
    # '13-PointPattern.R',
    # '14-Manipulating.R',
    # '15-UpDownscaling.R',
    # '16-Geostatistics.R',
    # '17-Areal.R',
    # '18-SpatialRegression.R',
    # '19-Movement.R',
    # '20-STModelling.R',
    # '30-sp-raster.R',
    '98-rbascis.R'
)

benchmark_result <- list()

for (chapter in chapters){
    code = substitute(
      expression(
        setwd('/home/docker/sdsr/code'), 
        source(chapter)), 
      list(chapter = chapter))
    print(code)
    chapter_result = benchmarks_code(
      code = code,
      platforms = c('debian', 'ubuntu', 'fedora', 'archlinux'),
      r_implementations = c('gnu-r', 'mro'),
      volumes = '/home/ismailsunni/dev/r/sdsr:/home/docker/sdsr',
      times = 10
    )
    # Store the result under the chapter's file name
    benchmark_result[[chapter]] = chapter_result
}
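
The loop above relies on `substitute()` to insert the current file name into the code. A plain `expression(source(chapter))` would keep the symbol `chapter`, which would presumably be undefined in the R session inside the container. The following base R sketch shows the difference:

```r
chapter <- "01-hello.R"

# Without substitute(), the expression keeps the bare symbol `chapter`
plain <- expression(source(chapter))
is.symbol(plain[[1]][[2]])

# With substitute(), the symbol is replaced by the string it holds
code <- substitute(
  expression(setwd("/home/docker/sdsr/code"), source(chapter)),
  list(chapter = chapter)
)
print(code)
```

Note that `substitute()` returns an unevaluated call to `expression()`, not an expression object; this is the form passed to benchmarks_code in the loop.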

Then save the box plots to disk.

# Save each chapter's box plot as a PNG image
for (chapter in chapters){
  chapter_result = benchmark_result[[chapter]]
  # Swap the .R extension for .png, e.g. '01-hello.R' becomes './plots/01-hello.png'
  filename = paste0('./plots/', tools::file_path_sans_ext(chapter), '.png')
  print(filename)
  png(filename=filename, width = 800, height = 400)
  benchmark_boxplot(chapter_result, main = chapter)
  dev.off()
}

Normalize the result to get the duration ratio.

for (chapter in chapters){
  # Swap the .R extension for .png and prefix the file name with 'ratio_'
  filename = paste0('./plots/ratio_', tools::file_path_sans_ext(chapter), '.png')
  print(filename)
  png(filename=filename, width = 800, height = 400)
  normalize_benchmark_boxplot(
    benchmark_result[[chapter]],
    baseline_image = 'gnu-r on debian',
    main = paste0('Duration Ratio for ', chapter)
  )
  dev.off()
}
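
To illustrate what a duration ratio means, the sketch below divides every run time by the median time of the baseline image. This is one plausible way to compute such a ratio and not necessarily what normalize_benchmark_boxplot does internally; the data are made up.

```r
# Made-up timings in nanoseconds, one row per run
mock_result <- data.frame(
  expr = factor(rep(c("gnu-r on debian", "gnu-r on fedora"), each = 3)),
  time = c(2.0e9, 2.2e9, 2.1e9, 4.0e9, 4.2e9, 4.4e9)
)

# Median time of the baseline image
baseline <- median(mock_result$time[mock_result$expr == "gnu-r on debian"])

# Ratio of every run to the baseline median; the baseline itself centres on 1
mock_result$ratio <- mock_result$time / baseline
ratio_medians <- tapply(mock_result$ratio, mock_result$expr, median)
print(ratio_medians)
```

A ratio above 1 means the image was slower than the baseline, and a ratio below 1 means it was faster.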

To run all the code in a single run, you can source the sdsr.R file alone.

    sdsr_code = expression(setwd('/home/docker/sdsr/code'), source('sdsr.R'))
    print(sdsr_code)
    sdsr_code_benchmark_result = benchmarks_code(
      code = sdsr_code,
      platforms = c('debian', 'ubuntu', 'fedora', 'archlinux'),
      r_implementations = c('gnu-r', 'mro'),
      volumes = '/home/ismailsunni/dev/r/sdsr:/home/docker/sdsr',
      times = 10
    )

Then save the box plot to disk.

  filename = paste0('./plots/sdsr.png')
  print(filename)
  png(filename=filename, width = 800, height = 400)
  benchmark_boxplot(sdsr_code_benchmark_result, main = 'All SDSR Code ')
  dev.off()

Normalize the result to get the duration ratio.

  filename = paste0('./plots/ratio_sdsr.png')
  png(filename=filename, width = 800, height = 400)
  normalize_benchmark_boxplot(
    sdsr_code_benchmark_result, 
    baseline_image = 'gnu-r on debian', 
    main = 'Duration Ratio for All SDSR Code'
    )
  dev.off()


ismailsunni/altRnative documentation built on April 1, 2020, 2:22 a.m.