In tabrasel/WetlandTools: General utility functions for TerrainWorks

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(TerrainWorksUtils)

Predicting new data using a trained model

TerrainWorksUtils contains helper functions designed to simplify the process of applying a model to multiple DEMs, which is useful in managing data sets with a large expanse.

These functions are designed to work with the type of model produced by mlr3, but really can be used with any model that contains a function called $predict_newdata that accepts new data in the form of a data frame.

Define our inputs

First, we identify the inputs. For this example, a directory with predicting input should be defined as a parameter when stitching this file. We choose to put the output in a directory at the same level as the input directory, with the name "pred_output". If this directory does not yet exist, it will be automatically created by the predicting function.

input_files <- tibble(gradient = list.files(params$input_dir, "gradient", 
                                            recursive = TRUE, full.names = TRUE),
                      mean_curv = list.files(params$input_dir, "mean_curv", 
                                            recursive = TRUE, full.names = TRUE),
                      pca = list.files(params$input_dir, "pca", 
                                       recursive = TRUE, full.names = TRUE))
input_files <- as_tibble(t(input_files))
input_files

The list of basins is a list of all the sub-directories in the folder, where each sub-directory contains the elevation derivative files for a single DEM. The files in the folder should be named as following: 1. gradient.flt 2. mean_curve.flt 3. pca[...].flt

The ellipses indicates that the filename can have some other characters after it. In our case, the files are labeled with a number.

params$output_dir
out_label = gsub("\\D", "", input_files[1, ])
out_label

Load a trained model.

We use a pre-trained model for the example.

load(params$trained_model)

Now, predict the new data.

We put the input into the function.

The function reports the full path of the file that was created and how long it took to predict and write the file. The output of this function is minimal - a large matrix object which contains all of the predicted probabilities is created. The main purpose of this function is to create the .csv files that store the predictions.

This makes it easier to evaluate predictions from different models, without running time-intensive predicting processes multiple times.

out <- predict_multiple_dems(model = ls_model, 
                             files_in = input_files[, 1],
                             output_dir = params$output_dir, 
                             out_label = out_label[1],
                             mask_range = ranges,
                             scale_vals = scale_vals, 
                             write_covars = FALSE)