In qgisprocess: Use 'QGIS' Processing Algorithms

For a very short introduction to qgisprocess, visit the homepage.

Here you will learn about package configuration, about basic usage illustrated by two examples, and how to pipe results into a next geoprocessing step.

Setting up the system

qgisprocess is basically a wrapper around the standalone command line tool qgis_process. Therefore, you need to have installed QGIS on your system as well as third-party providers such as GRASS GIS and SAGA to access and run all geoalgorithms provided through qgis_process from within R.

The package is meant to support current QGIS releases, i.e. both the latest and the long-term release. Although older QGIS releases are not officially supported, qgisprocess might work with QGIS versions >=3.16. Download instructions for all platforms are available at https://download.qgis.org/.

To facilitate using qgisprocess, we have created a docker image that already comes with the needed software packages. You can pull it from Github's container registry by running:

```{sh, eval=FALSE} docker pull ghcr.io/geocompx/docker:qgis

For a more detailed introduction on how to get started with docker, please refer to https://github.com/geocompx/docker.

# Package configuration

Since **qgisprocess** depends on the command line tool `qgis_process`, it already tries to detect `qgis_process` on your system when it is being loaded, and complains if it cannot find it.

```r
library("qgisprocess")

When loading qgisprocess for the first time, it will cache among others the path to qgis_process, the QGIS version and the list of known algorithms. When loading qgisprocess in later R sessions, the cache file is read instead for speed-up, on condition that it is still valid. Therefore, usually you don't have to do any configuration yourself, unless there's a message telling you to do so.

If you are interested in the details about this process, e.g. how qgisprocess detected qgis_process, run qgis_configure(use_cached_data = TRUE).

qgis_configure(use_cached_data = TRUE)

If needed the cache will be rebuilt automatically upon loading the package. This is the case when the QGIS version or the location of the qgis_process command-line utility has changed, user-settings (e.g. the option qgisprocess.path) have been altered or a changed state of the processing provider plugins (enabled vs. disabled) has been detected.

Rebuilding the cache can be triggered manually by running qgis_configure() (its default is use_cached_data = FALSE).

To determine the location of qgis_process, qgis_configure() first checks if the R option qgisprocess.path or the global environment variable R_QGISPROCESS_PATH has been set. This already indicates that you can specify one of these settings in case qgis_process has not been installed in one of the most common locations or if there are multiple QGIS versions available. If this is the case, set options(qgisprocess.path = '/path/to/qgis_process') or set the environment variable (e.g. in .Renviron) and run qgis_configure(). Under Windows make sure to indicate the path to the qgis_process-qgis.bat file, e.g.,

# specify path to QGIS installation on Windows 
options(qgisprocess.path = "C:/Program Files/QGIS 3.28/bin/qgis_process-qgis.bat")
# or use the QGIS nightly version (if installed via OSGeo4W)
# options(qgisprocess.path = "C:/OSGeo4W64/bin/qgis_process-qgis-dev.bat")
qgis_configure() # or use library(qgisprocess) if package was not loaded yet

Assuming that package loading or qgis_configure() ran successfully, we can check which QGIS version our system is running (it takes this from the cache):

qgis_version()

Next, let's check which plugins are at our disposal:

qgis_plugins()

Since we will use GRASS GIS and SAGA later on, you must have GRASS GIS and SAGA version > 7 installed on your system. You also need to install the third-party plugin 'SAGA Next Generation' in the QGIS GUI. The GRASS provider plugin is already built-in in QGIS.

Then, let's enable both plugins:

qgis_enable_plugins(c("grassprovider", "processing_saga_nextgen"))

Now, let's list all available providers including available third-party applications:

qgis_providers()

This tells us that we can also use the third-party providers GDAL, GRASS and SAGA through the QGIS interface.

Basic usage

First example

To get the complete overview of available (cached) geoalgorithms, run:

algs <- qgis_algorithms()
algs

For a directed search, use qgis_search_algorithms():

qgis_search_algorithms(algorithm = "buffer", group = "[Vv]ector")

Since we have also installed GRASS GIS and SAGA, over 1000 geoalgorithms are at our disposal. To find out about a specific geoalgorithm and a description of its arguments, use qgis_show_help(), e.g.:

qgis_show_help("native:buffer")
## Buffer (native:buffer)
## 
## ----------------
## Description
## ----------------
## This algorithm computes a buffer area for all the features in an input layer, using a fixed or dynamic distance.
## 
## The segments parameter controls the number of line segments to use to approximate a quarter circle when creating rounded offsets.
## 
## ...

To find out the arguments of a specific geoalgorithm, run:

qgis_get_argument_specs("native:buffer")

And finally run it with qgis_run_algorithm():

# if needed, first install spDataLarge:
# remotes::install_github("Nowosad/spDataLarge")
data("random_points", package = "spDataLarge")
result <- qgis_run_algorithm("native:buffer", INPUT = random_points, DISTANCE = 50)

As a convenience to the user, qgis_run_algorithm() reports all unspecified and automatically chosen arguments. If you want to have even more information on what is going on in the background, set .quiet to FALSE. The result object is of class qgis_result and contains the path to the output file created by qgis_process (when not explicitly setting an output filepath, qgisprocess creates it automatically for you). The output filepath can be extracted with qgis_extract_output(). qgis_result objects are of type list which, aside from the geoprocessing result, also contain debugging information about the used algorithm, input arguments and messages from the processing step. See ?qgis_result_status for various convenience functions to extract all of this information easily from qgis_result objects.

For QGIS 3.24 and later, qgis_run_algorithm() passes the input arguments to QGIS as a JSON string. The JSON input string is also included in qgis_result objects. Moreover, the user can specify input arguments directly as JSON in qgis_run_algorithm(). That is useful since input parameters can be copied from the QGIS GUI as JSON. This will be demonstrated in a separate tutorial.

# inspect the result object
class(result)
names(result)
result # only prints the output element(s)

To read in the QGIS output and visualize it, we can run:

library("sf")
library("mapview")
# attach QGIS output
# either do it "manually":
buf <- read_sf(qgis_extract_output(result, "OUTPUT"))
# or use the st_as_sf.qgis_result method:
buf <- sf::st_as_sf(result)
# plot your result
mapview(buf, col.regions = "blue") + 
  mapview(random_points, col.regions = "red", cex = 3)

You can convert each QGIS algorithm into an R function with qgis_function(). So using our buffer example from above, we could also run:

# create a function
qgis_buffer <- qgis_function("native:buffer")
# run the function
result <- qgis_buffer(INPUT = random_points, DISTANCE = 50)

This is basically what package qgis is doing for each available QGIS function while also providing an R help file for each function. Hence, if you prefer running QGIS with callable R functions, check it out.

Second example

As a second example, let's have a look at how to do raster processing running GRASS GIS in the background. To compute various terrain attributes of a digital elevation model, we can use grass:r.slope.aspect.

Note: in QGIS versions < 3.36, the processing provider was still called grass7 (even though this provider works with GRASS GIS 8). So if you have an older QGIS version, you must name the algorithms as grass7:r.slope.aspect etc.

qgis_get_description() (also included in qgis_show_help()) gives us the general description of the algorithm.

qgis_get_description("grass:r.slope.aspect")

We can find out about the arguments again with the help of qgis_get_argument_specs().

qgis_get_argument_specs("grass:r.slope.aspect")

qgis_get_output_specs() shows the different outputs that will be calculated:

qgis_get_output_specs("grass:r.slope.aspect")

Now let us calculate the terrain attributes.

library("terra")
# attach digital elevation model from Mt. Mongón (Peru)
dem <- rast(system.file("raster/dem.tif", package = "spDataLarge"))
# if not already done, enable the GRASS GIS plugin
# qgis_enable_plugins("grassprovider")
info <- qgis_run_algorithm(alg = "grass:r.slope.aspect", elevation = dem)

Just printing the info object shows which output files have been made:

info

Combine these output rasters as a multi-layered SpatRaster object and plot it:

# just keep the names of output rasters
nms <- qgis_get_output_specs("grass:r.slope.aspect")$name
# read in the output rasters 
r <- info[nms] |>
  unlist() |>
  rast() |>
  as.numeric()
names(r) <- nms
# plot the output
plot(r)

An alternative way to combine the rasters is given below.

r <- lapply(info[nms], \(x) as.numeric(qgis_as_terra(x))) |> 
  rast()

Since we now have many terrain attributes at our disposal, let us take the opportunity to add their values to points laying on top of them with the help of the SAGA algorithm sagang:addrastervaluestopoints.

qgis_get_argument_specs("sagang:addrastervaluestopoints")

The GRIDS argument is of type multilayer. To pass multiple layers to one argument, you can either repeat the corresponding argument as often as needed ...

rp_tp <- qgis_run_algorithm(
  "sagang:addrastervaluestopoints",
  SHAPES = random_points,
  GRIDS = qgis_extract_output(info, "aspect"),
  GRIDS = qgis_extract_output(info, "slope"),
  GRIDS = qgis_extract_output(info, "tcurvature"),
  RESAMPLING = 0)

... or you can pass to it all needed layers in one list. One could use the list() command but it is recommendended to use the qgis_list_input() function which is more robust, and therefore will also support non-JSON-input configurations (e.g. QGIS < 3.24).

rp_tp <- qgis_run_algorithm(
  "sagang:addrastervaluestopoints",
  SHAPES = random_points,
  GRIDS = qgis_list_input(
    qgis_extract_output(info, "aspect"),
    qgis_extract_output(info, "slope"),
    qgis_extract_output(info, "tcurvature")
  ),
  RESAMPLING = 0)

To verify that it worked, read in the output.

sf::st_as_sf(rp_tp)

Piping

qgis_process does not lend itself naturally to piping because its first argument is the name of a geoalgorithm instead of a data object. qgis_run_algorithm_p() circumvents this by accepting a .data object as its first argument, and pipes this data object into the first argument of a geoalgorithm assuming that the specified geoalgorithm needs a data input object as its first argument.

system.file("longlake/longlake_depth.gpkg", package = "qgisprocess") |>
  qgis_run_algorithm_p("native:buffer", DISTANCE = 50)

If .data is a qgis_result object, qgis_run_algorithm_p() automatically tries to select an element named OUTPUT. However, if the output has another name (e.g., DEM_PREPROC as in the example below) or if there are multiple output elements to choose from (e.g., sagang:sagawetnessindex has four output rasters, check with qgis_outputs("sagang:sagawetnessindex")), you can specify the wanted output object via the .select argument. Please note that we make sure that temporary output raster files, i.e., all output rasters we do not specifically name ourselves, should use SAGA's native raster file format by setting the qgisprocess.tmp_raster_ext option to .sdat. Using the default raster output format .tif might lead to trouble depending on the installed versions of third-party packages (GDAL, SAGA, etc.).

dem <- system.file("raster/dem.tif", package = "spDataLarge")
# in case you need to enable the SAGA next generation algorithms, run the following line:
# qgis_enable_plugins("processing_saga_nextgen")

oldopt <- options(qgisprocess.tmp_raster_ext = ".sdat")
qgis_run_algorithm(algorithm = "sagang:sinkremoval", DEM = dem,
                   METHOD = 1) |>
  qgis_run_algorithm_p("sagang:sagawetnessindex", .select = "DEM_PREPROC")

When piping, qgis_run_algorithm_p() automatically cleans up after you by deleting intermediate results. This avoids cluttering your system when running geoalgorithms on large spatial data files. To turn off this behavior, set .clean to FALSE.

Of course, you can also pipe to qgis_run_algorithm() by manually extracting the OUTPUT object and redirecting it to the appropriate input argument of the next processing step. This avoids ambiguity and allows for greater flexibility though it might not be as convenient as qgis_run_algorithm_p(). For example, intermediate results remain on disk for the duration of your R session, unless you manually call qgis_clean_result() on a result object.

result <- qgis_run_algorithm(algorithm = "sagang:sinkremoval", DEM = dem, 
                         METHOD = 1) |>
  qgis_extract_output("DEM_PREPROC") |>
  qgis_run_algorithm(algorithm = "sagang:sagawetnessindex",
                     DEM = _)

# or using an anonymous function
# result <- qgis_run_algorithm(algorithm = "sagang:sinkremoval", DEM = dem, 
#                          METHOD = 1) |>
#   (\(x) qgis_run_algorithm(algorithm = "sagang:sagawetnessindex",
#                            DEM = x$DEM_PREPROC[1])) ()

# set the default output raster format to .tif again
options(oldopt)