``` {r setup}
knitr::opts_chunk$set(echo = TRUE)
```

# RVaggregator

Raster and Vector spatial aggregation tool

## Summary

RVaggregator aims to ease the aggregation of spatial data. Aggregation here means coarsening a high spatial resolution raster onto rasters or polygons of lower resolution (also known as zonal statistics). Specifically, RVaggregator is a convenience wrapper around `terra::extract()`.

I have not found an "all in one" and relatively fast aggregation solution that I like in e.g. R, Python or GDAL. RVaggregator aggregates ordinal rasters (e.g. land cover classes, returning the fraction of each class) and continuous rasters (returning distribution statistics) onto coarser rasters or polygons.
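
For context, plain terra already handles simple cases; a minimal zonal-mean sketch using the sample data in this repository (not RVaggregator's exact internals) might look like:

``` {r, eval = FALSE}
library(terra)

# A minimal zonal-statistics call with terra alone: mean raster value within
# each polygon. This is a sketch, not RVaggregator's exact internals.
r     <- rast("data/sample/input/sample_input_raster_ordinal.tif")
polys <- vect("data/sample/input/sample_shapefile/sample_shapefile.shp")
terra::extract(r, polys, fun = mean, na.rm = TRUE)
```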

## Install

Start up R (tested on R 3.5 and R 4.0) and run:

``` {r, eval = FALSE}
install.packages("devtools")
devtools::install_github("willmorrison1/RVaggregator")
```

If you run into issues with dependencies (typically on R 4.0), try this:

```r
Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "True")
devtools::install_github("willmorrison1/RVaggregator")
```

The required packages should then be installed.

## R use

``` {r, message = FALSE, warning = FALSE}
library(RVaggregator)
library(terra) # provides rast() and vect(), used in the examples below
```

``` {r, fig.height = 4, fig.width = 9, message = FALSE, warning = FALSE}
par(mfrow = c(1, 3))
plot(rast("data/sample/input/sample_input_raster_ordinal.tif"), main = "input_file raster with ordinal data")

plot(vect("data/sample/input/sample_shapefile/sample_shapefile.shp"), main = "Input aggregation_file polygon data")

plot(rast("data/sample/input/sample_input_raster_ordinal.tif"), main = "Both")
plot(vect("data/sample/input/sample_shapefile/sample_shapefile.shp"), add = TRUE)

aggregated_data <- RVaggregator(input_file = "data/sample/input/sample_input_raster_ordinal.tif",
                                aggregation_file = "data/sample/input/sample_shapefile/sample_shapefile.shp",
                                aggregation_type = "fraction",
                                output_directory = "data/sample/output",
                                poly_chunk_size = 500)
```

The output data has the same format as `aggregation_file`:

``` {r}
class(aggregated_data)
```

Unlike `aggregation_file`, the output polygons (each with a unique ID) contain the aggregated data:

``` {r}
head(aggregated_data)
```
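
If you prefer a plain table of the per-polygon statistics, the attribute data can be pulled out with terra; this is an illustrative step, not part of RVaggregator:

``` {r, eval = FALSE}
# Drop the geometry and keep one row of aggregated statistics per polygon.
stats_table <- as.data.frame(aggregated_data)
head(stats_table)
```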

You can also do the aggregation using rasters, where the `aggregation_file` raster has a lower spatial resolution than `input_file`.

``` {r, fig.height = 4, fig.width = 7, message = FALSE, warning = FALSE}
par(mfrow = c(1, 2))
plot(rast("data/sample/input/sample_input_raster_ordinal.tif"), main = "input_file raster\n with ordinal data")

plot(rast("data/sample/input/aggregation_file_shifted.tif"), main = "Input aggregation_file\nraster data")

aggregated_data <- RVaggregator(input_file = "data/sample/input/sample_input_raster_ordinal.tif",
                                aggregation_file = "data/sample/input/aggregation_file_shifted.tif",
                                aggregation_type = "fraction",
                                output_directory = "data/sample/output",
                                poly_chunk_size = 500)
```

When `aggregation_file` is a raster, the output is also a raster, with a layer (third dimension) for each statistic:
``` {r}
class(aggregated_data)
names(aggregated_data)
```

Plot the second variable of `aggregated_data`:

``` {r, fig.height = 4, fig.width = 4, message = FALSE}
print(names(aggregated_data)[2])
plot(aggregated_data[[2]], main = "Aggregated data,\nsecond stat")
```
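
To keep the aggregated raster for later use, it can be written to disk with terra; the output path below is just an example, not something RVaggregator writes itself:

``` {r, eval = FALSE}
# Write the aggregated raster (one layer per statistic) to a GeoTIFF.
writeRaster(aggregated_data, "data/sample/output/aggregated_raster.tif", overwrite = TRUE)
```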

## poly_chunk_size

This is the most important parameter and controls memory usage. Assuming the `input_file` resolution is >= 1 m, the number of data points `n_data_points` loaded into memory at any one time follows:

`n_data_points = ((poly_chunk_size*aggregation_dataset_area)/input_file_cell_area)*number_of_statistics`

where `aggregation_dataset_area` is the area of the aggregating polygon or cell (m^2), `input_file_cell_area` is the cell area of the input raster (m^2) and `number_of_statistics` is the number of aggregating statistics used. For `aggregation_type = "fraction"`, `number_of_statistics` is the number of ordinal classes + 2. For `aggregation_type = "distribution"`, it is the number of functions in `getManualFunctions()` + 2.

With `poly_chunk_size` set to 50, an `aggregation_dataset_area` of 22500 m^2 (a grid of 150 m x 150 m polygons), an `input_file_cell_area` of 25 m^2 (5 m horizontal, 5 m vertical resolution), and an `input_file` that is a land cover map with 5 classes:

`n_data_points = ((50 * 22500) / 25) * (5 + 2) = 315000` points loaded at any one time
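
The same arithmetic in R, using the numbers from this example:

``` {r}
poly_chunk_size          <- 50          # polygons processed per chunk
aggregation_dataset_area <- 150 * 150   # m^2, one 150 m x 150 m polygon
input_file_cell_area     <- 5 * 5       # m^2, 5 m horizontal x 5 m vertical
number_of_statistics     <- 5 + 2       # 5 land cover classes + 2 ("fraction")

((poly_chunk_size * aggregation_dataset_area) / input_file_cell_area) * number_of_statistics
```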

`n_data_points` increases when the `input_file` cell size decreases, or when `aggregation_dataset_area`, `poly_chunk_size` or `number_of_statistics` increase. Computation time increases as `poly_chunk_size` decreases (more R overhead per chunk).

## Command line use
The repository comes with a CLI wrapper, e.g.

```
Rscript RVaggregator-CLI.R
```

or

```
R --no-save < RVaggregator-CLI.R
```

When run without arguments, it prints a prompt and a long description of the required inputs. The short description is:

```
usage: RVaggregator [--] [--help] [--opts OPTS] [--cache_directory CACHE_DIRECTORY] [--memory_fraction MEMORY_FRACTION] [--aggregation_chunk_size AGGREGATION_CHUNK_SIZE] [--aggregate_ordinal AGGREGATE_ORDINAL] input_file aggregation_file output_directory
```

Follow the "R use" section to familiarise yourself with the variable names and descriptions required for the CLI inputs. 

A sample command line execution under Windows would be:

& "C:\Program Files\R\R-4.0.2\bin\Rscript.exe" .\CLI\RVaggregator-CLI.R data/sample/input/sample_input_raster_ordinal.tif data/sample/input/sample_shapefile/sample_shapefile.shp C:\Users\willm\Desktop\ --aggregate_ordinal TRUE ```


