In pokyah/agrometeoR: Spatial interpolation for Agromet project @ CRA-W

library(devtools)
library(dplyr)
library(learnr)
library(rgdal)
library(raster)
library(sp)
library(sf)
library(rgeos)
library(fontawesome)
library(leaflet)
library(leaflet.extras)
library(ggplot2)
library(FNN)
library(agrometAPI)
library(elevatr)
library(rnaturalearth)
library(rnaturalearthhires)
library(kknn)
library(bst)
library(DiceKriging)
library(mlr)
library(geojsonsf)
library(ggplot2)
load("FOSS4GBXL2018.RData")

HELLO !

Welcome to this Session!

Why and How to use R as an opensource GIS : The AGROMET project usecase

presented by Thomas Goossens

r fontawesome::fa("linkedin", fill = "#75aadb") @thomasgoossens1985

r fontawesome::fa("github", fill = "#75aadb") @pokyah

r fontawesome::fa("envelope", fill = "#75aadb") hello.pokyah@gmail.com

knitr::include_graphics("images/foss4gbe.svg")

knitr::include_graphics("images/craw_fr.png")

PROJECT PRESENTATION

A very short introduction to the project

What?

r fontawesome::fa("bullseye", height = "50px", fill = "#75aadb") Providing hourly gridded weather data @ 1km² resolution for Wallonia

Why?

r fontawesome::fa("leaf", height = "50px", fill = "#75aadb")

Feeding decision tools for agricultural warning systems based on crop monitoring models (EU directive for Sustainable use of pesticides)

How?

r fontawesome::fa("th", height = "50px", fill = "#75aadb") spatializing data from PAMESEB Automatic Weather Station Network

GIS BUILDING BLOCKS

DATA

r fa("cubes" , height = "50px", fill = "#75aadb")

Weather data from the stations
explanatory variables (DEM, land cover, etc)
interpolation grid
map backgrounds

TOOLS

r fa("wrench" , height = "50px", fill = "#75aadb")

API query
Data manipulation
interpolation algorithms (linear regression, ANN, kriging)
algorithms output benchmarking
dataviz/mapping tools

WHY R ?

r fa("r-project" , height = "50px", fill = "#75aadb") It's FOSS ! Already used by our weather specialist partners : RMI + KNMI Increased popularity among tech companies Impressive growth and active community There is a package for that ™ collaborative work & science reproducibility

CODE LIBRARIES

Libraries are packaged thematic R-softwares containing predefined sets of functions to help you save code

R libraries are available on CRAN and r fa("github").
one-line code to install these (install.packages(<PACKAGE_NAME>)).
We need various libraries to acquire the required datasets (DEM, boundaries, weather stations data, etc...) and manipulate these.
pre-installed for this demo

library(devtools)
library(learnr)
library(rgdal)
library(sp)
library(raster)
library(sf)
library(rgeos)
library(fontawesome)
library(leaflet)
library(mlr)
library(dplyr)
library(ggplot2)
library(FNN)
library(agrometAPI)
library(elevatr)
library(rnaturalearth)
library(rnaturalearthhires)

DATA ACQUISITION

AGROMET WEATHER STATIONS DATA

We can get the temperature data from our stations using the agrometAPI library

# getting data from API (requires a token)
rawdata = agrometAPI::get_data(dfrom = "2018-10-01", dto = "2018-10-01")
rawdata = agrometAPI::type_data(rawdata)

# Let's see how it looks
rawdata

FILTERING DATA

Filtering is easy with dplyr

# Let's Filter it using dplyr
ourdataset = rawdata %>%
  dplyr::filter(!is.na(mtime)) %>%
  dplyr::filter(sid != 38 & sid != 41) %>%
  dplyr::filter(!is.na(from)) %>%
  dplyr::filter(!is.na(to)) %>%
  dplyr::filter(!is.na(tsa)) %>%
  dplyr::filter(poste != "China") %>%
  dplyr::filter(!type_name %in% c("PS2000","PESSL","BODATA","Sencrop","netdl1000","SYNOP")) %>%
  dplyr::select(c(sid, poste, longitude, latitude, altitude, mtime, tsa))

# Let's see how it looks
ourdataset

MAKING DATA SPATIAL

The new sf library allows to easily manipulate data with spatial attributes.

! Coordinate Reference System (CRS) See this cheatsheet.

# making it spatial
ourdataset = sf::st_as_sf(ourdataset, 
  coords = c("longitude", "latitude"),
  crs = 4326)

# outputing the CRS to the console
sf::st_crs(ourdataset)

DOWNLOADING ADMIN BOUNDARIES

Many R libraries provide the ability to get admin boundaries. We use rnaturalearth

# downloading admin boundaries of Belgium
belgium = sf::st_as_sf((rnaturalearth::ne_states(country = 'belgium')))
# inspecting the data
belgium
# visualizin
plot(belgium)

FILTERING TO ONLY KEEP OUR INTEREST ZONE

Let's do it again with dplyr

# Filtering Wallonia data
wallonia = belgium %>% dplyr::filter(region == "Walloon")
# visualizing
plot(wallonia)
# Checking the Coordinate Reference system
sf::st_crs(wallonia)

DOWNLOADING ELEVATION DATA - DEM RASTER

Again, many libraries provide ways to download such data. We use elevatr

Raster resolution is controlled by z param. See package documentation

# downloading DEM data and storing it in an object. Z = 5 ==> about 3km² resolution
elevation = elevatr::get_elev_raster(as(wallonia, "Spatial"), z = 5, src = "aws")
class(elevation)
plot(elevation)
# checking CRS
raster::crs(elevation, asText = TRUE)

CROPPING WITH OUR INTEREST ZONE BORDERS

package raster has a function for this !

# masking
elevation = raster::mask(elevation, as(wallonia, "Spatial"))
# ploting the elevation raster
plot(elevation)

INTERPOLATION GRID

Grid is quickly built with new sf library

# building the grid at 5 km² resolution
grid = sf::st_sf(sf::st_make_grid(x = sf::st_transform(wallonia, crs = 3812),  cellsize = 5000, what = "centers"))
# limit it to Wallonia and not full extent
grid = sf::st_intersection(grid, sf::st_transform(wallonia, crs = 3812))
# reproject it 
grid = sf::st_transform(grid, crs = 4326)
# visualizing 
plot(grid)

Extracting raster data at the locations of grid points

Extracting explanatory variables (limited to elevation raster for this example).

# extracting
extracted <- raster::extract(
  elevation,
  as(grid,"Spatial"),
  fun = mean,
  na.rm = FALSE,
  df = TRUE
)
# renaming columns
colnames(extracted) = c("ID", "altitude")
# inspecting
extracted

INJECTING INTO OUR GRID

Add the raster extracted data to our sf grid

# injecting
grid$altitude = extracted$altitude
# keeping only the altitude data
grid = dplyr::select(grid, altitude)
# visualizing
plot(grid)
# checking CRS
sf::st_crs(grid)

DATA VISUALIZATION

LEAFLET MAP

Making our spatial data intelligible by mapping it using leaflet

preparing the color palettes and settings for mobile phone responsiveness

elevation.pal <- colorNumeric(reverse = TRUE, "RdYlGn", values(elevation),
  na.color = "transparent")
temperature.pal <- colorNumeric(reverse = TRUE, "RdBu", domain=ourdataset$tsa,
  na.color = "transparent")
responsiveness = "\'<meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\'"

mapping the various datasets

map <- leaflet() %>% 
     addProviderTiles(
         providers$OpenStreetMap.BlackAndWhite, group = "B&W") %>%
     addProviderTiles(
         providers$Esri.WorldImagery, group = "Satelitte") %>%
     addRasterImage(
         elevation, group = "Elevation", colors = elevation.pal, opacity = 0.8) %>%
     addPolygons(
         data = wallonia, group = "Admin", color = "#444444", weight = 1, smoothFactor = 0.5,
         opacity = 1, fillOpacity = 0.1, fillColor = "grey") %>%
     addCircleMarkers(
         data = ourdataset,
         group = "Stations",
         color = ~temperature.pal(tsa),
         stroke = FALSE,
        fillOpacity = 0.8,
         label = ~htmltools::htmlEscape(as.character(tsa))) %>%
    addCircleMarkers(
        data = grid,
        group = "Grid",
        radius = 1,
        color = "orange",
        stroke = FALSE, fillOpacity = 1) %>%
    addLegend(
      values = values(elevation), group = "Elevation",
      position = "bottomright", pal = elevation.pal,
      title = "Elevation (m)") %>%
     addLayersControl(
         baseGroups = c("B&W", "Satelitte"),
         overlayGroups = c("Elevation", "Admin", "Stations", "Grid"),
         options = layersControlOptions(collapsed = TRUE)
     ) %>%
     addFullscreenControl() %>%
     hideGroup(c("Slope", "Aspect")) %>%
     addEasyButton(easyButton(
         icon = "fa-crosshairs", title = "Locate Me",
         onClick = JS("function(btn, map){ map.locate({setView: true}); }"))) %>%
     htmlwidgets::onRender(paste0("
       function(el, x) {
       $('head').append(",responsiveness,");
       }"))
map

INTERPOLATION

Spatialization or spatial interpolation creates a continuous surface from values measured at discrete locations to predict values at any location in the interest zone with the best accuracy.

2 approaches

To predict values at any location :

~~physical atmospherical models~~ (not straight forward to develop an explicit physical model describing how the output data can be derived from the input data)
supervised machine learning regression algorithms : given a set of continuous data, find the best relationship that represents the set of continuous data (common approach largely discussed in the academic litterature)

Supervised Machine learning

We will go through a very simple example of machine learning usecase using mlr package

mlr is a r fa("r-project") library that offers a standardized interface for all its machine learning algorithms. Full details & capabilities on the website

short definition

From machinelearningmastery.com :

Supervised learning is where you have input variables (x) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output : Y = f(X) The goal is to approximate the mapping function so well that when you have new input data (x), you can predict the output variables (Y) for that data. It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process

Defining our Machine Learning task

# loading the mlr library
library(mlr)
# extract the lon and lat from the sf geometry
coords = data.frame(sf::st_coordinates(ourdataset))
# attributing our original dataset to another var (to avoid overwriting)
ourTask = ourdataset
# converting our dataset from sf to simple df
sf::st_geometry(ourTask) <- NULL
# joining the coords
ourTask = dplyr::bind_cols(ourTask, coords)
# Dropping the non-explanatory features
ourTask = dplyr::select(ourTask, -c(sid, poste, mtime))
# defining our taks
ourTask = mlr::makeRegrTask(id = "FOSS4G_example", data = ourTask, target = "tsa")
# checking our data
head(mlr::getTaskData(ourTask))

Defining our learners (learning algorithms)

# Defining our learners
ourLearners = list(
  l1 = mlr::makeLearner(cl = "regr.lm", id = "linearRegression"),
  l2 = mlr::makeLearner(cl = "regr.fnn", id = "FastNearestNeighbours"),
  l3 = mlr::makeLearner(cl = "regr.nnet", id = "NeuralNetwork", par.vals = list(size = 10)),
  l4 = mlr::makeLearner(cl = "regr.km", id = "Kriging"),
  l5 = mlr::makeLearner(cl = "regr.kknn", id = "KNearestNeighborRegression"))

# 
ourLearners

Defining our resampling strategy

# Defining our learners
ourResamplingStrategy = mlr::makeResampleDesc("LOO")
ourResamplingStrategy

Performing our benchmark

Let's find which learner provides the best results (the lowest RMSE) for our specific spatial interpolation problem

#seting seed to make benchmark reproductible
# https://mlr-org.github.io/mlr/articles/tutorial/benchmark_experiments.html
set.seed(5030)

# performing the benchmark of our learners on our task
ourbenchmark = mlr::benchmark(
  learners = ourLearners,
  tasks = ourTask,
  resamplings = ourResamplingStrategy,
  measures = list(rmse)
)

performances = mlr::getBMRAggrPerformances(bmr = ourbenchmark, as.df = TRUE)
performances
best.learner = data.frame(performances %>% 
    slice(which.min(rmse.test.rmse)))

# Vizualizing the benchamrk result
library(ggplot2)
plotBMRBoxplots(
  bmr = ourbenchmark,
  measure = rmse,
  order.lrns = mlr::getBMRLearnerIds(ourbenchmark)) +
  aes(color = learner.id)

Training the best learner

let's train our best (lowest RMSE) learner on our dataset

# extract the lon and lat from the sf geometry
coords = data.frame(st_coordinates(grid))
# attributing our original dataset to another var (to avoid overwriting)
ourPredictionGrid = grid
# converting our dataset from sf to simple df
st_geometry(ourPredictionGrid) <- NULL
# joining the coords
ourPredictionGrid = dplyr::bind_cols(ourPredictionGrid, coords)
# removing NA rows from ourPredictionGrid (to avoid mlr bug : https://github.com/mlr-org/mlr/issues/1515)
ourPredictionGrid = dplyr::filter(ourPredictionGrid, !is.na(altitude))

# training the neural net on the dataset
ourModel = mlr::train(
  learner = mlr::getBMRLearners(bmr = ourbenchmark)[[as.character(best.learner$learner.id)]],
  task = ourTask)

ourModel

Predicting using the trained learner

Let's predict the value of tsa at the locations of our grid

# using our model to make the prediction
ourPrediction = predict(
  object = ourModel,
  newdata = ourPredictionGrid
)$data

# injecting the predicted values in the prediction grid
ourPredictedGrid = dplyr::bind_cols(ourPredictionGrid, ourPrediction)

# making the predicted grid a spatial object again
ourPredictedGrid = sf::st_as_sf(ourPredictedGrid, coords = c("X", "Y"), crs = 4326)
plot(ourPredictedGrid)

# Let's fake a raster rendering for better rendering
sfgrid = st_sf(sf::st_make_grid(x = sf::st_transform(wallonia, 3812),  cellsize = 5000, what = "polygons"))
ourPredictedGrid = sf::st_transform(ourPredictedGrid, crs = 3812)
ourPredictedGrid = sf::st_join(sfgrid, ourPredictedGrid)
ourPredictedGrid = ourPredictedGrid %>%
  dplyr::filter(!is.na(response))

Mapping the prediction

Adding our prediction layer to leaflet map

# reprojecting to 4326 for leaflet
ourPredictedGrid = sf::st_transform(ourPredictedGrid, 4326)

# adding to our map
map2 = map %>% 
  addPolygons(
    data = ourPredictedGrid,
    group = "Predictions",
    color = ~temperature.pal(response),
    stroke = FALSE,
    fillOpacity = 0.9,
    label = ~htmltools::htmlEscape(as.character(response))) %>%
  addLegend(
    values = ourPredictedGrid$response,
    group = "Predictions",
    position = "bottomleft", pal = temperature.pal,
    title = "predicted T (°C)") %>%
  addLayersControl(
         baseGroups = c("B&W", "Satelitte"),
         overlayGroups = c("Stations", "Predictions", "Elevation", "Admin", "Grid"),
         options = layersControlOptions(collapsed = TRUE)
     )
map2

INTEGRATION

EXPORTATION TO VARIOUS FORMATS

We can easily write to various formats thanks to sf library. Also possible to write to POSTGIS db thanks to rpostigis

Let's see how to write to geojson

# extracting centroid from polygons for light geojson
# this kind of operation require projected coordinates !
spatialized = sf::st_transform(ourPredictedGrid, crs = 3812)
spatialized$geometry <- spatialized %>% 
  st_centroid() %>% 
  st_geometry()

# back to 4326 for geojson standard
spatialized = sf::st_transform(spatialized, crs = 4326)

# exporting to geojson
sf::st_write(obj = spatialized, dsn = "spatialized.geojson")

# let's read the geojson
cat(readLines('spatialized.geojson'), sep = '\n')

# deleting the file
file.remove("spatialized.geojson")

INTEGRATION WITH OTHER PROGRAMMING LANGUAGE

We can easily run R code from bash (Rscript command), python (py2 library), php (example)

REPRODUCIBLE RESEARCH

Reproducibility is the capacity of repeating an experiment in any place with any person. It is only assured by providing complete setup instructions and resources.

Why is it so important ?

http://ropensci.github.io/reproducibility-guide/sections/introduction/
http://blog.revolutionanalytics.com/2017/04/reproducible-data-science-with-r.html

How can we achieve this with R ?

R package management system
Docker is your best friend for this purpose.
Packrat is a library for Reproducible package management for R for isolated, reproductible and portable work. More here

How can we dockerize an R app (called shiny app) ?

https://www.shinyproxy.io/deploying-apps/#run-shinyproxy
https://github.com/o2r-project/containerit/blob/master/vignettes/containerit.Rmd
https://jlintusaari.github.io/2018/07/how-to-compile-rmarkdown-documents-using-docker/
http://tamaszilagyi.com/blog/dockerized-shiny-app-development/

ABOUT

WHAT IS THE AUTHOR SPATIAL BACKGROUND ?

Don't be afraid to code. Yes, you can too !

Master in Geography (2007) @ ==> a looong time ago !
Few years of research : climate change, ice cores ==> no spatial skills
2015 : first coding experience (JS) ==> no spatial skills
2016 : stateOftheMap, foss4gbxl, opensource GIS ==>the beginning
2017 : job as geostatistician @CRAW ==> First experience with R
2018 : giving a first talk @foss4gbxl about spatial with R !

HOW TO START DOING GIS WITH R ?

Many ressources exist online. Books, videos, interactive tutorials, help forums, etc.

Read the book geocomputation with R
Follow the datacamp interactive course
Check my curated list of free tools and datasets on my blog.

COLOFON

You can also run this presentation at home

This presentation was built using learnr, a package to create your own interactive tutorials thanks to shiny.
It was deployed for free on shinyapps.io

The source code is available on r fa("github" , height = "50px", fill = "#75aadb")

pokyah/agrometeoR documentation built on May 26, 2019, 7 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

pokyah/agrometeoR Spatial interpolation for Agromet project @ CRA-W