library(devtools) library(dplyr) library(learnr) library(rgdal) library(raster) library(sp) library(sf) library(rgeos) library(fontawesome) library(leaflet) library(leaflet.extras) library(ggplot2) library(FNN) library(agrometAPI) library(elevatr) library(rnaturalearth) library(rnaturalearthhires) library(kknn) library(bst) library(DiceKriging) library(mlr) library(geojsonsf) library(ggplot2) load("FOSS4GBXL2018.RData")
Welcome to this Session!
Why and How to use R as an opensource GIS : The AGROMET project usecase
presented by Thomas Goossens
r fontawesome::fa("linkedin", fill = "#75aadb")
@thomasgoossens1985
r fontawesome::fa("github", fill = "#75aadb")
@pokyah
r fontawesome::fa("envelope", fill = "#75aadb")
hello.pokyah@gmail.com
knitr::include_graphics("images/foss4gbe.svg")
knitr::include_graphics("images/craw_fr.png")
A very short introduction to the project
r fontawesome::fa("bullseye", height = "50px", fill = "#75aadb")
Providing hourly gridded weather data @ 1km² resolution for Wallonia
r fontawesome::fa("leaf", height = "50px", fill = "#75aadb")
Feeding decision tools for agricultural warning systems based on crop monitoring models (EU directive for Sustainable use of pesticides)
r fontawesome::fa("th", height = "50px", fill = "#75aadb")
spatializing data from PAMESEB Automatic Weather Station Network
r fa("cubes" , height = "50px", fill = "#75aadb")
r fa("wrench" , height = "50px", fill = "#75aadb")
r fa("r-project" , height = "50px", fill = "#75aadb")
It's FOSS !
Already used by our weather specialist partners :
RMI + KNMI
Increased popularity among tech companies
Impressive growth and active community
There is a package for that ™
collaborative work & science reproducibility
Libraries are packaged thematic R-softwares containing predefined sets of functions to help you save code
r fa("github")
.install.packages(<PACKAGE_NAME>)
).library(devtools) library(learnr) library(rgdal) library(sp) library(raster) library(sf) library(rgeos) library(fontawesome) library(leaflet) library(mlr) library(dplyr) library(ggplot2) library(FNN) library(agrometAPI) library(elevatr) library(rnaturalearth) library(rnaturalearthhires)
We can get the temperature data from our stations using the
agrometAPI
library
# getting data from API (requires a token) rawdata = agrometAPI::get_data(dfrom = "2018-10-01", dto = "2018-10-01") rawdata = agrometAPI::type_data(rawdata) # Let's see how it looks rawdata
Filtering is easy with
dplyr
# Let's Filter it using dplyr ourdataset = rawdata %>% dplyr::filter(!is.na(mtime)) %>% dplyr::filter(sid != 38 & sid != 41) %>% dplyr::filter(!is.na(from)) %>% dplyr::filter(!is.na(to)) %>% dplyr::filter(!is.na(tsa)) %>% dplyr::filter(poste != "China") %>% dplyr::filter(!type_name %in% c("PS2000","PESSL","BODATA","Sencrop","netdl1000","SYNOP")) %>% dplyr::select(c(sid, poste, longitude, latitude, altitude, mtime, tsa)) # Let's see how it looks ourdataset
The new
sf
library allows to easily manipulate data with spatial attributes.
! Coordinate Reference System (CRS) See this cheatsheet.
# making it spatial ourdataset = sf::st_as_sf(ourdataset, coords = c("longitude", "latitude"), crs = 4326) # outputing the CRS to the console sf::st_crs(ourdataset)
Many R libraries provide the ability to get admin boundaries. We use
rnaturalearth
# downloading admin boundaries of Belgium belgium = sf::st_as_sf((rnaturalearth::ne_states(country = 'belgium'))) # inspecting the data belgium # visualizin plot(belgium)
Let's do it again with
dplyr
# Filtering Wallonia data wallonia = belgium %>% dplyr::filter(region == "Walloon") # visualizing plot(wallonia) # Checking the Coordinate Reference system sf::st_crs(wallonia)
Again, many libraries provide ways to download such data. We use
elevatr
Raster resolution is controlled by z
param. See package documentation
# downloading DEM data and storing it in an object. Z = 5 ==> about 3km² resolution elevation = elevatr::get_elev_raster(as(wallonia, "Spatial"), z = 5, src = "aws") class(elevation) plot(elevation) # checking CRS raster::crs(elevation, asText = TRUE)
package
raster
has a function for this !
# masking elevation = raster::mask(elevation, as(wallonia, "Spatial")) # ploting the elevation raster plot(elevation)
Grid is quickly built with new
sf
library
# building the grid at 5 km² resolution grid = sf::st_sf(sf::st_make_grid(x = sf::st_transform(wallonia, crs = 3812), cellsize = 5000, what = "centers")) # limit it to Wallonia and not full extent grid = sf::st_intersection(grid, sf::st_transform(wallonia, crs = 3812)) # reproject it grid = sf::st_transform(grid, crs = 4326) # visualizing plot(grid)
Extracting explanatory variables (limited to elevation raster for this example).
# extracting extracted <- raster::extract( elevation, as(grid,"Spatial"), fun = mean, na.rm = FALSE, df = TRUE ) # renaming columns colnames(extracted) = c("ID", "altitude") # inspecting extracted
Add the raster extracted data to our sf grid
# injecting grid$altitude = extracted$altitude # keeping only the altitude data grid = dplyr::select(grid, altitude) # visualizing plot(grid) # checking CRS sf::st_crs(grid)
Making our spatial data intelligible by mapping it using
leaflet
elevation.pal <- colorNumeric(reverse = TRUE, "RdYlGn", values(elevation), na.color = "transparent") temperature.pal <- colorNumeric(reverse = TRUE, "RdBu", domain=ourdataset$tsa, na.color = "transparent") responsiveness = "\'<meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\'"
map <- leaflet() %>% addProviderTiles( providers$OpenStreetMap.BlackAndWhite, group = "B&W") %>% addProviderTiles( providers$Esri.WorldImagery, group = "Satelitte") %>% addRasterImage( elevation, group = "Elevation", colors = elevation.pal, opacity = 0.8) %>% addPolygons( data = wallonia, group = "Admin", color = "#444444", weight = 1, smoothFactor = 0.5, opacity = 1, fillOpacity = 0.1, fillColor = "grey") %>% addCircleMarkers( data = ourdataset, group = "Stations", color = ~temperature.pal(tsa), stroke = FALSE, fillOpacity = 0.8, label = ~htmltools::htmlEscape(as.character(tsa))) %>% addCircleMarkers( data = grid, group = "Grid", radius = 1, color = "orange", stroke = FALSE, fillOpacity = 1) %>% addLegend( values = values(elevation), group = "Elevation", position = "bottomright", pal = elevation.pal, title = "Elevation (m)") %>% addLayersControl( baseGroups = c("B&W", "Satelitte"), overlayGroups = c("Elevation", "Admin", "Stations", "Grid"), options = layersControlOptions(collapsed = TRUE) ) %>% addFullscreenControl() %>% hideGroup(c("Slope", "Aspect")) %>% addEasyButton(easyButton( icon = "fa-crosshairs", title = "Locate Me", onClick = JS("function(btn, map){ map.locate({setView: true}); }"))) %>% htmlwidgets::onRender(paste0(" function(el, x) { $('head').append(",responsiveness,"); }")) map
Spatialization or spatial interpolation creates a continuous surface from values measured at discrete locations to predict values at any location in the interest zone with the best accuracy.
To predict values at any location :
~~physical atmospherical models~~ (not straight forward to develop an explicit physical model describing how the output data can be derived from the input data)
supervised machine learning regression algorithms : given a set of continuous data, find the best relationship that represents the set of continuous data (common approach largely discussed in the academic litterature)
We will go through a very simple example of machine learning usecase using
mlr
package
mlr
is a r fa("r-project")
library that offers a standardized interface for all its machine learning algorithms. Full details & capabilities on the website
From machinelearningmastery.com :
Supervised learning is where you have input variables (x) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output : Y = f(X) The goal is to approximate the mapping function so well that when you have new input data (x), you can predict the output variables (Y) for that data. It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process
# loading the mlr library library(mlr) # extract the lon and lat from the sf geometry coords = data.frame(sf::st_coordinates(ourdataset)) # attributing our original dataset to another var (to avoid overwriting) ourTask = ourdataset # converting our dataset from sf to simple df sf::st_geometry(ourTask) <- NULL # joining the coords ourTask = dplyr::bind_cols(ourTask, coords) # Dropping the non-explanatory features ourTask = dplyr::select(ourTask, -c(sid, poste, mtime)) # defining our taks ourTask = mlr::makeRegrTask(id = "FOSS4G_example", data = ourTask, target = "tsa") # checking our data head(mlr::getTaskData(ourTask))
# Defining our learners ourLearners = list( l1 = mlr::makeLearner(cl = "regr.lm", id = "linearRegression"), l2 = mlr::makeLearner(cl = "regr.fnn", id = "FastNearestNeighbours"), l3 = mlr::makeLearner(cl = "regr.nnet", id = "NeuralNetwork", par.vals = list(size = 10)), l4 = mlr::makeLearner(cl = "regr.km", id = "Kriging"), l5 = mlr::makeLearner(cl = "regr.kknn", id = "KNearestNeighborRegression")) # ourLearners
# Defining our learners ourResamplingStrategy = mlr::makeResampleDesc("LOO") ourResamplingStrategy
Let's find which learner provides the best results (the lowest RMSE) for our specific spatial interpolation problem
#seting seed to make benchmark reproductible # https://mlr-org.github.io/mlr/articles/tutorial/benchmark_experiments.html set.seed(5030) # performing the benchmark of our learners on our task ourbenchmark = mlr::benchmark( learners = ourLearners, tasks = ourTask, resamplings = ourResamplingStrategy, measures = list(rmse) ) performances = mlr::getBMRAggrPerformances(bmr = ourbenchmark, as.df = TRUE) performances best.learner = data.frame(performances %>% slice(which.min(rmse.test.rmse))) # Vizualizing the benchamrk result library(ggplot2) plotBMRBoxplots( bmr = ourbenchmark, measure = rmse, order.lrns = mlr::getBMRLearnerIds(ourbenchmark)) + aes(color = learner.id)
let's train our best (lowest RMSE) learner on our dataset
# extract the lon and lat from the sf geometry coords = data.frame(st_coordinates(grid)) # attributing our original dataset to another var (to avoid overwriting) ourPredictionGrid = grid # converting our dataset from sf to simple df st_geometry(ourPredictionGrid) <- NULL # joining the coords ourPredictionGrid = dplyr::bind_cols(ourPredictionGrid, coords) # removing NA rows from ourPredictionGrid (to avoid mlr bug : https://github.com/mlr-org/mlr/issues/1515) ourPredictionGrid = dplyr::filter(ourPredictionGrid, !is.na(altitude)) # training the neural net on the dataset ourModel = mlr::train( learner = mlr::getBMRLearners(bmr = ourbenchmark)[[as.character(best.learner$learner.id)]], task = ourTask) ourModel
Let's predict the value of tsa at the locations of our grid
# using our model to make the prediction ourPrediction = predict( object = ourModel, newdata = ourPredictionGrid )$data # injecting the predicted values in the prediction grid ourPredictedGrid = dplyr::bind_cols(ourPredictionGrid, ourPrediction) # making the predicted grid a spatial object again ourPredictedGrid = sf::st_as_sf(ourPredictedGrid, coords = c("X", "Y"), crs = 4326) plot(ourPredictedGrid) # Let's fake a raster rendering for better rendering sfgrid = st_sf(sf::st_make_grid(x = sf::st_transform(wallonia, 3812), cellsize = 5000, what = "polygons")) ourPredictedGrid = sf::st_transform(ourPredictedGrid, crs = 3812) ourPredictedGrid = sf::st_join(sfgrid, ourPredictedGrid) ourPredictedGrid = ourPredictedGrid %>% dplyr::filter(!is.na(response))
Adding our prediction layer to leaflet map
# reprojecting to 4326 for leaflet ourPredictedGrid = sf::st_transform(ourPredictedGrid, 4326) # adding to our map map2 = map %>% addPolygons( data = ourPredictedGrid, group = "Predictions", color = ~temperature.pal(response), stroke = FALSE, fillOpacity = 0.9, label = ~htmltools::htmlEscape(as.character(response))) %>% addLegend( values = ourPredictedGrid$response, group = "Predictions", position = "bottomleft", pal = temperature.pal, title = "predicted T (°C)") %>% addLayersControl( baseGroups = c("B&W", "Satelitte"), overlayGroups = c("Stations", "Predictions", "Elevation", "Admin", "Grid"), options = layersControlOptions(collapsed = TRUE) ) map2
We can easily write to various formats thanks to
sf
library. Also possible to write to POSTGIS db thanks torpostigis
Let's see how to write to geojson
# extracting centroid from polygons for light geojson # this kind of operation require projected coordinates ! spatialized = sf::st_transform(ourPredictedGrid, crs = 3812) spatialized$geometry <- spatialized %>% st_centroid() %>% st_geometry() # back to 4326 for geojson standard spatialized = sf::st_transform(spatialized, crs = 4326) # exporting to geojson sf::st_write(obj = spatialized, dsn = "spatialized.geojson") # let's read the geojson cat(readLines('spatialized.geojson'), sep = '\n') # deleting the file file.remove("spatialized.geojson")
We can easily run R code from bash (
Rscript
command), python (py2
library), php (example)
Reproducibility is the capacity of repeating an experiment in any place with any person. It is only assured by providing complete setup instructions and resources.
Packrat
is a library for Reproducible package management for R for isolated, reproductible and portable work. More hereDon't be afraid to code. Yes, you can too !
Many ressources exist online. Books, videos, interactive tutorials, help forums, etc.
You can also run this presentation at home
This presentation was built using learnr
, a package to create your own interactive tutorials thanks to shiny
.
It was deployed for free on shinyapps.io
The source code is available on r fa("github" , height = "50px", fill = "#75aadb")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.