knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  eval=FALSE
)

Hex Maps

The annotated code below shows how to extract the data needed to make maps similar to the hexagon maps shown here.

NDVI Standard Score Map

knitr::include_graphics("man/figures/202203_som_ndvi.png")

Precipitation Record Low Map

knitr::include_graphics("man/figures/20220420_som_record_low_rainfall_2021.png")

Below is the code to extract the data relevant to your context. Once the data is extracted it can easily be mapped in any GIS software (even here in R). If you like how the above examples look, feel free to email zack.arno.ext@impact-initiatives.org for the QGIS templates. The output below can be put directly into the QGIS templates.
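
As a quick illustration of mapping the output directly in R, the sketch below reads back the NDVI hexagon shapefile written at the end of this workflow and plots it with ggplot2. The file name assumes the default write_outputs_to, country_code, and hex_diameter inputs used later in this document, and the date/value column names assume the long-format output of ee_extract_tidy; adjust these to match your own run.

library(sf)
library(dplyr)
library(ggplot2)

# read the NDVI hexagon shapefile produced by the workflow below
# (file name assumes the default inputs set in the Inputs chunk)
ndvi_hex <- sf::st_read("data/reach_eth_hex_15000_ndvi.shp")

# keep a single month of values so each hexagon appears once
ndvi_recent <- ndvi_hex |>
  filter(date == max(date))

# quick choropleth of the NDVI standard score per hexagon
ggplot(ndvi_recent) +
  geom_sf(aes(fill = value), color = NA) +
  scale_fill_gradient2(name = "NDVI z-score") +
  theme_minimal()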

Inputs

The code chunk below is the only place where the end user needs to modify the code to supply the correct inputs.

To perform the analysis for the maps you must have either: a. an area of interest (AOI) spatial file, or b. a pre-made grid over your AOI.

If you only have a, the script will automatically generate b based on your desired hexagon diameter. You should put the file(s) into the data folder in this project working directory and provide the layer names as input in the code chunk below. If you provide neither a nor b, the boundary of Somalia will be used to create the hexagon mesh. A default hexagon size of 15 km has been provided, but you should select this carefully based on what granularity would be most useful to visualize in your specific context. Additionally, you must set the 3-letter (lower case) country code.

library(surveyGEER)
library(tidyrgee)
library(sf)
library(tidyverse)
library(rgee)
ee_Initialize()
devtools::load_all()


write_outputs_to <-  "data/"
extract_ndvi_to_hex <- c(T,F)[1]
extract_precip_to_hex <- c(T,F)[1]

country_code <- "eth"
AOI_layer_name <- "eth_adm0"
AOI_grid_layer_name <- NULL
hex_diameter <- 15000 # 15 km set as default diameter for hexagon

surveyGEER:::aoi_or_grid_warning()

Load libraries/Data

# ee_Initialize()


modis_link <- "MODIS/006/MOD13Q1"
modisIC <- ee$ImageCollection(modis_link)



chirps_link <- "UCSB-CHG/CHIRPS/DAILY"
chirps <- ee$ImageCollection(chirps_link)

The function below will either load your pre-made grid or generate a grid based on your AOI_layer_name, country_code, and hex_diameter inputs.

grid <- load_aoi_grid(aoi = AOI_layer_name,
                      grid = AOI_grid_layer_name,
                      hex_diameter = hex_diameter,
                      country_code = country_code)
grid_wgs84 <- grid |> st_transform(crs=4326)
leaflet::leaflet(grid_wgs84) |>
  leaflet::addTiles() |> 
  leaflet::addPolygons(data=grid_wgs84)

Data Wrangling

MODIS NDVI should be transformed by multiplying the raster values by 0.0001. Below you can get a sense of what typical GEE/rgee syntax looks like; for most R users this looks a bit strange. tidyrgee, shown in the following chunks, simplifies a great deal of this for the end user. Since raster math has not yet been added to tidyrgee (coming soon), we will leave the "classical" syntax below.

modis_ndvi <- modisIC$
  select("NDVI")$
  map(
    ee_utils_pyfunc(
      function(x){x$
          multiply(0.0001)$
          copyProperties(x,x$propertyNames())
          }
    )
  )

Now it's time for some actual analysis. Here is where tidyrgee shines and vastly simplifies the code. To use the full power of tidyrgee you should convert the imageCollection to a tidyee class object with the simple function as_tidyee. Once this has been done it is very easy to group, filter, and composite (summarise) the imageCollection.

To calculate a standard score (Z-score) for monthly data we first need to create a baseline. Therefore we filter the imageCollection to 2000-2015, group that data by month, and summarise, calculating a mean and standard deviation for each month. The result is an imageCollection of 12 images (1 per month). Each image has 2 bands: NDVI_mean and NDVI_stdDev. Statistics are calculated at the pixel level for each image.

modis_ndvi_tidy <- as_tidyee(modis_ndvi)


monthly_baseline <- modis_ndvi_tidy |> 
  filter(year %in% 2000:2015) |> 
  group_by(month) |> 
  summarise(stat=list("mean","sd"))

Now that we have our baseline, let's calculate the average monthly NDVI for more recent years so that we can compare. We will first filter the ImageCollection to 2016-2022. Since we are working with the MODIS NDVI 16-day composite we have approximately 2 images per month, so to simplify the process we take the average NDVI value (per pixel) each month. To do this we group_by(year, month) and then call summarise. Remember, if we just called group_by(month) it would return 12 images with 1 monthly average value for the entire 2016-2022 period. By grouping by year and month we get the average monthly value for each year!

ndvi_recent_monthly <- modis_ndvi_tidy |> 
  filter(year %in% c(2016:2022)) |> 
  group_by(year,month) |> 
  summarise(
    stat="mean"
  )

Since I know I want to eventually combine the baseline stats calculated previously (NDVI_mean, NDVI_stdDev) with these new monthly means, I need to rename the band produced in this new summary. By default the band name is written as the original band name + "_" + stat (for example, NDVI summarised with stat = "mean" becomes NDVI_mean). To avoid duplicate band names later when joining, I will rename the band in ndvi_recent_monthly from NDVI_mean to just NDVI. We can do this with the select method by supplying a name (a pure rename function will be added soon).

ndvi_recent_renamed <- ndvi_recent_monthly |> 
  select(NDVI="NDVI_mean")

Now we are ready to join the two ImageCollections into one. The following inner join returns 1 ImageCollection storing the monthly average NDVI values from 2016-2022 together with the long-term (baseline) monthly average and standard deviation of NDVI.

ndvi_recent_and_baseline<- inner_join(x = ndvi_recent_renamed,
                                     y = monthly_baseline,
                                     by = "month")

Now we have all the necessary ingredients to calculate a standard score (Z-score). Below you get another opportunity to view some of the trickier rgee/GEE syntax. This expression functionality has not yet been adapted/wrapped in tidyrgee (coming soon), so using the rgee syntax is still our only option. This is why a basic understanding of GEE and rgee is still quite handy.

rgee cannot handle tidyee classes, so we need to convert the object back to the ee$ImageCollection format that rgee was written for. Luckily tidyrgee is being developed to be interoperable with rgee and provides simple helper functions such as as_ee for this type of conversion. After running as_ee we have access to all the methods available in rgee/GEE.

ndvi_recent_baseline_imageCol <- ndvi_recent_and_baseline |> 
  as_ee()

ndvi_zscore<- ndvi_recent_baseline_imageCol$map(
  function(img){
    zscore<- img$expression(
      "float((NDVI-NDVI_mean)/(NDVI_stdDev))",
      opt_map= list(NDVI= img$select("NDVI"),
                    NDVI_mean= img$select("NDVI_mean"),
                    NDVI_stdDev= img$select("NDVI_stdDev")
      )
    )$rename("NDVI_z_score")
    img$select("NDVI","NDVI_mean")$addBands(zscore)
  }

) 

Zonal stats per hexagon

ndvi_zscore is now an ee$ImageCollection class object. We can easily convert back to tidyee with as_tidyee(). However, the zonal statistics tool ee_extract_tidy() works on both ee$ImageCollection and tidyee objects. Just for ease of filtering and selecting, let's convert it back to tidyee.

Zonal statistics are computationally expensive calculations. For example, I ran the stats to calculate the median value of all 3 bands for every hexagon grid cell (r nrow(grid)) for every month in the current composite collection (76 images) and it took 160 minutes. Therefore, let's do some pre-processing so we only extract exactly what we need:

ndvi_z<- as_tidyee(ndvi_zscore)


ndvi_z_pre_processed <- ndvi_z |>
  filter(year>=2021) |>
  select("NDVI_z_score")

You should run ?ee_extract_tidy to check all the optional parameters. It is important to include the scale of the image/imageCollection. If you are using a large featureCollection or sf spatial object you will need to adjust the via argument; I usually have the best luck with via = "drive". The amount of time to perform an extraction can vary due to computations in the cloud, but as a reference point this example (3509 hexagons of 15 km diameter) took 232 seconds. If your grid covers the bounding box of your study area, it is probably a good idea to clip it to the study area polygon to avoid unnecessary computations and save time.
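
If you do want to clip the grid first, a minimal sketch is below. The aoi object name and the shapefile path are assumptions for illustration; they are not created elsewhere in this document. You would then pass grid_clipped in place of grid to ee_extract_tidy below.

library(sf)

# hypothetical AOI polygon read from the data folder
# (path and file extension are assumptions)
aoi <- sf::st_read("data/eth_adm0.shp")

# keep only the hexagons that actually intersect the AOI polygon so that
# ee_extract_tidy does not spend time on empty bounding-box cells
grid_clipped <- grid |>
  sf::st_transform(sf::st_crs(aoi)) |>
  sf::st_filter(aoi)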

if(extract_ndvi_to_hex==T){
  system.time(
  grid_with_ndvi <- ndvi_z_pre_processed |> 
  ee_extract_tidy(y = grid ,stat = "mean",scale = 250,via = "drive")
  )
}

grid_w_ndvi_spatial <- grid |> 
  left_join(grid_with_ndvi)

Write output

if(!is.null(write_outputs_to)){
  hex_file_path_name <- paste0(write_outputs_to,
                               "reach_",
                               country_code,
                               "_hex_",
                               hex_diameter,
                               "_ndvi.shp")
  grid_w_ndvi_spatial |> 
    sf::st_write(hex_file_path_name)

}

Now let's prep the CHIRPS data for extraction. First convert it to tidyee format so we can leverage the easy filter/summarise methods.

# get cumulative rainfall for each year on record
chirps_tidy <- as_tidyee(chirps)

Let's take the entire daily record from 1981 to the present and extract the yearly precipitation for each year at the pixel level. Since each pixel represents daily rainfall (mm), we can just group the images by year and sum them to get annual rainfall.

chirps_annual_precip <- chirps_tidy |> 
  select("precipitation") |> 
  group_by(year) |> 
  summarise(
    stat="sum"
  )

Next let's again use the ee_extract_tidy function to extract these yearly values. Conceptually, we are using our grid to perform zonal statistics on each image (1 per year). The output is a long-format tidy data frame with all the median yearly values (1 per hex per year). This took about:

if(extract_precip_to_hex){
yearly_rainfall_hex <-   chirps_annual_precip |> 
  ee_extract_tidy(y = grid,
                  stat = "median",
                  scale = 5500,
                  via = "drive")

}
yearly_rainfall_hex_1981_2021<- yearly_rainfall_hex |> 
  filter(lubridate::year(date)!=2022) |> 
  dplyr::group_by(uid) |> 
  dplyr::mutate(
    record_low = value==min(value),
    record_high= value==max(value)
  ) 

record_lows_2021 <- yearly_rainfall_hex_1981_2021 |> 
  filter(record_low) |> 
  filter(date=="2021-01-01")

record_highs_2021 <- yearly_rainfall_hex_1981_2021 |> 
  filter(record_high) |> 
  filter(date=="2021-01-01")


record_lows_2021 <- yearly_rainfall_hex |> 
  # filter out 2022 since it is not yet  complete
  filter(lubridate::year(date)!=2022) |> 
  # group by grid uid
  dplyr::group_by(uid) |>
  # grab two lowest annual records
  dplyr::slice_min(n = 2,order_by = value) |> 
  # calculate the difference in the two lowest annual values 
  # and specify which is the lowest with T/F
  mutate(
    lower_by = max(value)- min(value),
    record_low = value==min(value)
  ) |> 
  ungroup() |> 
  # filter just the lowest record
  filter(record_low) |> 
  mutate(
    date= lubridate::year(date)
  ) |> 
  #now filter just to year of interest 2021 to 
  # see where that year had all time lowest annual value on record
  filter(date==2021)
# join record lows to original hex object by "uid" to make spatial again
# then filter out any hex that does not have record breaking value
# change remaining hex's to centroid points so that we can map as proportional bubbles.
record_low_centroids<- grid |> 
  dplyr::left_join(record_lows_2021) |> 
  filter(record_low==T) |> 
  st_centroid() 
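
Before moving to QGIS you can also preview the result here in R. The sketch below maps the record-low centroids as proportional bubbles over the hexagon grid; sizing the bubbles by lower_by (how far below the previous lowest year the 2021 value fell) is just one illustrative choice.

library(ggplot2)

# preview: record-low rainfall centroids as proportional bubbles over the grid
ggplot() +
  geom_sf(data = grid, fill = NA, color = "grey80") +
  geom_sf(data = record_low_centroids, aes(size = lower_by),
          color = "darkred", alpha = 0.6) +
  scale_size_continuous(name = "mm below previous low") +
  theme_minimal()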


if(!is.null(write_outputs_to)){
  sf::st_write(obj = record_low_centroids,
               dsn = paste0(write_outputs_to, "precip_record_low.shp"))
}

QGIS Templates

We have provided QGIS templates in the map_templates folder of this directory. Once the data has been extracted and saved to the data folder in the project, you should open either of the map templates.

To use the map templates you must link the data sources to the layers. Once linked, the symbology will be set automatically. To use these templates you should also have some layers of your own. If you put all of your GIS layers into the data folder, then when you link one layer the rest will auto-link. You should put the following layers into the data folder:

  1. Admin 1 boundary (usually OCHA or UNDP)
  2. Admin 2 boundary (usually OCHA or UNDP)
  3. Hexagon grid (either pre-existing or created by you above)
  4. NDVI layer (Hexagon output by script above)
  5. Precipitation layer (point file output by script above)
  6. World Adm0 boundaries - downloaded from Natural Earth
knitr::include_graphics("man/figures/qgis_layers_panel.png")

