In DrylandEcology/rMultivariateMatching: Multivariate matching for site selection and interpolation using multi-dimensional datasets

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)

Introduction

This vignette describes a workflow that covers matching and interpolation for a case where subset cells were not determined using the kpoints function. This workflow is appropriate when the goals is to interpolate simulation results from subset cells to create high-resolution, continuous maps of the study area. Here, we illustrate the workflow using an example for the state of Wyoming, where subset cells were generated using a regular 10 x 10 km grid (see Bradford et al. (2019) for details). The code chunks in this document are meant to be run sequentially. Data relevant to this example are contained within the rMultivariateMatching package. The site-selection and interpolation methods are described in:

Renne, R. R., Schlaepfer, D. R., Palmquist, K. A., Lauenroth, W. K., & Bradford, J. B. In preparation. Estimating complex ecological variables at high spatial resolution using cost-effective multivariate matching algorithms.

Examples of the interpolation methods in print

In the interpolation example in Renne et al., our goal was to generate high-resolution, 30-arcsecond maps of SOILWAT2 simulation output with low spatial resolution (10 x 10 km). SOILWAT2 is a process-based ecohydrological simulation model (Schlaepfer et al., 2012; Schlaepfer & Andrews, 2018; Schlaepfer & Murphy, 2018). The two-step matching method described in Renne et al. was devised to interpolate simulation output for a series of studies investigating the potential impacts of climate change on ecohydological variables in western North American drylands. Interpolated datasets from these studies will soon be available from the US Geological Survey Sciencebase:

Citations to follow when available.

Overview and definitions

In this vignette, we assume that we have already determined appropriate matching variables and matching criteria and run simulations for the subset cells. Note that this workflow is distinct from a workflow where the simulated sites (subset cells) are selected using kpoints. A workflow for that scenario can be found in the Matching and interpolation with kpoints vignette. Furthermore, although the example used here is derived from the interpolation example in Renne et al. that used a two-step matching process to fit the experimental design described in Bradford et al., (2019), we limit matching in this example to a single step. A workflow that illustrates two-step matching and interpolation can be found in the Two step matching and interpolation without kpoints vignette.

The functions in the rMultivariateMatching package measure similarity between sites as the Euclidean distance of standardized (i.e., weighted) matching variables. Each variable is standardized by dividing it by the maximum amount of difference that would still allow two sites to be considered analogous for that variable. These maximum differences are called the matching criteria. For example, if two sites could be reasonably considered analogous if their mean annual precipitations (MAP) were within 50 mm of each other, the matching criteria for MAP could be set to 50 mm. Distances are calculated within the rMultivariateMatching function using the distances package (Savje 2019).

Definitions

subset cells A set of sites for simulation.
target cells All cells across a study area.
matching variables A set of variables relevant to the ecosystem and simulation model that are used to determine similarity between sites. Matching variables must be available as rasters covering the entire study area.
matching criteria Maximum allowable differences between Target and matched Subset cells for each matching variable such that sites can be considered analogous.

The figure below illustrates an overview of the site selection and interpolation methods and key terminology. In this vignette, we focus on interpolation. First, a set of matching variables are selected and standardized using user-defined matching criteria. Next, a set of subset cells are defined (not using the kpoints function) and all target cells are matched to one of the subset cells (matches are denoted by color in the central panel). Finally, point results (e.g., simulation output) for the subset cells are interpolated across the study area using multivariate matching to produce continuous, high-resolution maps of output, such as plant functional type biomass or ecohydrological variables.

# Load the rMultivariateMatching package
library(rMultivariateMatching)

Prepare inputs

The first step is to use the makeInputdata function to convert a rasterStack (see the raster package (Hijmans 2021) for details) of matching (and other) variables into the correct format to be used in subsequent functions.

# Load targetcells data for Target Cells
data(targetcells)
# Create data frame of potential matching variables for Target Cells
allvars <- makeInputdata(targetcells)

Determine matching criteria

The next step is to determine matching criteria, that is, the maximum allowable difference for each matching variable between target and matched subset cells such that the cells can be considered environmentally analogous. You may have some idea of what matching criteria should be based on your knowledge of the system and processes in question, but it can be useful to evaluate how the proportion of the study area that is represented by the subset cells changes with different matching criteria using the choose_criteria function.

# Restrict data to matching variables of interest
matchingvars <- allvars[,c("cellnumbers","x","y","bioclim_01","bioclim_04",
                           "bioclim_09","bioclim_12","bioclim_15","bioclim_18")]

# Create list of matching criteria to choose:
# Look at 2.5%, 5%, & 10% of range and standard deviation for each variable
range2.5pct <- apply(matchingvars[,4:ncol(matchingvars)],2,
                     function(x){(max(x)-min(x))*0.025})
range5pct <- apply(matchingvars[,4:ncol(matchingvars)],2,
                   function(x){(max(x)-min(x))*0.05})
range10pct <- apply(matchingvars[,4:ncol(matchingvars)],2,
                    function(x){(max(x)-min(x))*0.1})
stddev <- apply(matchingvars[,4:ncol(matchingvars)],2,sd)

# Create criteria_list
criteria_list <- list(range2.5pct, range5pct, range10pct, stddev)

Table displaying `criteria_list`

| bioclim_01 | bioclim_04 | bioclim_09 | bioclim_12 | bioclim_15 | bioclim_18 | |:----------:|:----------:|:----------:|:----------:|:----------:|:----------:| | 0.36 | 10.62 | 0.82 | 32.97 | 1.36 | 4.61 | | 0.73 | 21.23 | 1.64 | 65.93 | 2.72 | 9.21 | | 1.45 | 42.47 | 3.29 | 131.86 | 5.44 | 18.42 | | 2.49 | 60.37 | 7.41 | 170.74 | 13.53 | 34.27 |

The subsetcells dataset that is included in the rMultivariateMatching package is a subset of sites that were simulated in Bradford et al. (2019). The experimental design of that study selected sites using a 10 x 10 km grid across western North American drylands. For each site, they simulated ecohydrological conditions for five different soil types. Thus, we devised a two-step matching method (see the Two step matching and interpolation without kpoints vignette) to fit this experimental design. However, for the purposes of the current example in this vignette, we wanted to demonstrate matching and interpolation based on simple, one-step matching. The following code will simplify the subsetcells dataset to include only sites that were simulated with a silt loam soil with 30% sand and 18% clay (this corresponds to "site_ids" that end in ".2").

# Bring in Subset cell data
data(subsetcells)

# Remove duplicates (representing cells with same climate but different soils--
# we want to match on climate only)
subsetcells <- subsetcells[seq(2,10630, by = 5),]

# Pull out matching variables only, with site_id that identifies unique climate
subsetcells <- subsetcells[,c("site_id","X_WGS84","Y_WGS84","bioclim_01",
                              "bioclim_04","bioclim_09","bioclim_12",
                              "bioclim_15","bioclim_18")]

# Ensure that site_id will be values unique to subsetcells
subsetcells$site_id <- paste0("00",subsetcells$site_id)

Explanation of `choose_criteria` arguments

Please also refer to the choose_criteria documentation.

matchingvars A data frame representing all target cells used to calculate similarity between subset and target cells. This data frame can be created using the makeInputdata function.
criteria_list A list of vectors of matching criteria to be tested.
raster_template This is one of the rasters used to define the target cells and it is used as a template to determine the area represented for each set of matching criteria.
subset_in_target A boolean indicating if the subset cells are taken from within the target cells. This is FALSE in our example.
plot_coverage A boolean indicating if the proportion of the study area represented by each vector of matching criteria in the criteria_list should be displayed in a bar plot. (Set to TRUE in this example).
subsetcells Passed to multivarmatch function if subset_in_target is FALSE (as in this example). This should be a data frame of subset cells with column names corresponding exactly to those in matchingvars and row names should be unique identifiers.
matchingvars_id Name of the column in matchingvars that provides the unique identifiers for target cells.
subsetcells_id Name of the column in subsetcells that provides the unique identifiers for subset cells. Note that these unique identifiers should be numbers, but they may be classified as characters. In particular, if there are repeats among the matchingvars_ids and the subsetcells_ids, you can paste "00" before the subsetcells_ids to ensure they are unique from the matchingvars_ids (as we did in the previous code block).
matching_distance The maximum allowable matching quality value (weighted Euclidean distance) between target and subset cells, when subset_in_target is FALSE. Default value is 1 so that output will be comparable to output from choose_criteria when subset_in_target is TRUE.

# Run choose_criteria function to evaluate different matching criteria
coverage <- choose_criteria(matchingvars = matchingvars, 
                            criteria_list = criteria_list,
                            plot_coverage = TRUE,
                            raster_template = targetcells[[1]],
                            subset_in_target = FALSE,
                            subsetcells = subsetcells,
                            matchingvars_id = "cellnumbers",
                            subsetcells_id = "site_id",
                            matching_distance = 1)

Explanation of `choose_criteria` outputs

The output from the choose_criteria function is a named list where the first item ('totalarea') reports the total area in $km^2$ of the target cells and the second item ('solution_areas') reports the area represented (i.e., Euclidean distance of weighted matching variables is $\le 1$ between target and matched subset cells) for each set of matching criteria.

Perform matching

Use the multivarmatch function to perform multivariate matching. This function will match every target cell to one of the subset cells.

Explanation of `multivarmatch` arguments

Please also refer to the multivarmatch documentation.

matchingvars A data frame representing all target cells. This data frame can be created using the makeInputdata function.
subsetcells A data frame of subset cells with column names corresponding exactly to those in matchingvars. Rownames should be unique identifiers (unique means no repeats among all row names in targetcells and matchingvars.
criteria A vector of matching criteria that correspond to the matching variables. Matching criteria represent the maximum difference allowed for each variable between matched target and subset cells.
matchingvars_id The name of the column that contains the unique identifiers for each target cell. If matchingvars is derived from the makeInputData, this will be "cellnumbers".
subsetcells_id The name of the column that contains the unique identifiers for each subset cell. Note that these unique identifiers should be numbers, but they may be classified as characters. In particular, if there are repeats among the matchingvars_ids and the subsetcells_ids, you can paste "00" before the subsetcells_ids to ensure they are unique from the matchingvars_ids (as we did in this example).
addpoints A boolean indicating whether the subset cells should be added to the map of matching quality as points.
raster_template This is one of the rasters used to define the target cells and it is used as a template for creating a raster of matching quality.
subset_in_target A boolean indicating if the subset cells are taken from the target cells. (FALSE in this example).

Note: the function below does not specify several additional arguments that are left as default values. These include:

saveraster A boolean indicating if a raster of matching quality should be saved to file. (Defaults to FALSE).
plotraster A boolean indicating if a raster of matching quality should be plotted. (Defaults to TRUE).
filepath A destination for a raster of matching quality in the case where saveraster = TRUE. (Defaults to working directory).
overwrite A boolean indicating if the raster::writeRaster function is allowed to overwrite existing files. (Defaults to FALSE).

# Create vector of matching criteria
criteria <- c(0.7,42,3.3,66,5.4,18.4)

# Find matches and calculate matching quality
quals <- multivarmatch(matchingvars, subsetcells, criteria = criteria,
                       matchingvars_id = "cellnumbers",
                       subsetcells_id = "site_id",
                       raster_template = targetcells[[1]],
                       subset_in_target = FALSE, addpoints = FALSE)

Explanation of `multivarmatch` outputs

The output from the multivarmatch function is a data frame of target cells with coordinates ('x','y'), cellnumber of each target cell ('target_cell'), unique id of the matched subset cell ('subset_cell'), and matching quality ('matching_quality'). The function will save a raster of matching quality if saveraster is TRUE and plot a map of matching quality if plotraster is TRUE.

Evaluate matching

Now that we have completed matching, we should evaluate the success of matching. In Renne et al., we suggest four methods of evaluating matching.

Calculate matching quality (weighted Euclidean distance between target and matched subset cells)
Calculate the standard deviation of differences between target and matched subset cells for a set of variables relevant to the project
Calculate geographic distances between matched cells
Leave-one-out cross-validation

1. Calculate matching quality

In addition to identifying the matching subset cell for each target cell, the multivarmatch function also calculates the weighted Euclidean distance between matched cells, which can be interpreted as a continuous variable measuring matching quality. Distances less than or equal to one indicate high-quality matching (matching criteria have been met for all variables). Distances greater than one indicate that the difference between these target cells and their matched subset cell exceeds the matching criterion for one or more variables.

Matching quality can be used to exclude target cells for which there is insufficient matching, i.e., no simulated site is analogous, and to determine the spatial extent of the represented area. In Renne et al., we used a cut-off of 1.5 as a threshold between high/moderate quality and low quality matching, and excluded sites with matching quality >1.5 from the final interpolated datasets.

2. Calculate standard deviation of differences between target and matched subset cells

Matching can also be evaluated by using the evaluateMatching function to calculate the standard deviation of differences between target and subset cells for a set of variables relevant to the project. This provides an estimate of how much variability there is among the target cells matched to each subset cell and is particularly informative if the variables were not used as part of the matching process.

If the rownames of allvars and matches (see below for details) do not match, the evaluateMatching function will fail. Importantly, the function will remove cases with missing values from the allvars and matches data frames and if missing values occur for different cases, the function will fail because the rownames of the two data frames will no longer match.

Here we use 19 bioclim variables (six of which were used as matching variables) and two soil variables to evaluate matching in the current example.

Explanation of `evaluateMatching` arguments

Please also refer to the evaluateMatching documentation.

allvars A data frame generated using makeInputdata where column 1 and rownames are 'cellnumbers', columns 2 and 3 correspond to 'x' and 'y' coordinates, and additional columns correspond to various variables (which can include matching variables).
matches A data frame output from the multivarmatch function.
secondarymatch A boolean indicating if the matches data frame comes from the secondaryMatching function. (FALSE in this example).
quality_name Name of the column in the matches data frame that contains the matching quality variable to use to evaluate matching "matching_quality" or "matching_quality_secondary".
matchingvars_id The name of the column that contains the unique identifiers for each target cell. If matchingvars is derived from the makeInputData, this will be "cellnumbers"
subset_in_target A boolean indicating if the subset cells are taken from the target cells. (FALSE in this example).
matching_distance The maximum allowable weighted Euclidean distance between target and subset cells such that the matched cells can be considered analogous. (Set to 1.5 in this example).
plot_diffs A boolean indicating whether a bar plot of the standard deviation of differences for each variable should be plotted.

# Bring the full subsetcells data in again to get all variables
data(subsetcells)

# Remove duplicates (representing cells with same climate but different soils--
# we want to match on climate only)
subsetcells <- subsetcells[seq(2,10630, by = 5),]

# Get all variables for Subset cells now:
subsetcells <- subsetcells[,c("site_id","X_WGS84","Y_WGS84",
                              names(subsetcells)[8:28])]

# Convert sand and clay fraction to percentage in subsetcells
subsetcells$sand <- subsetcells$sand*100
subsetcells$clay <- subsetcells$clay*100 

# Run evaluateMatching
sddiffs <- evaluateMatching(allvars = allvars, 
                            subsetcells = subsetcells,
                            matches = quals,
                            quality_name = "matching_quality",
                            matchingvars_id = "cellnumbers",
                            subsetcells_id = "site_id",
                            subset_in_target = FALSE, matching_distance = 1.5,
                            plot_diffs = TRUE, secondarymatch = FALSE)

Explanation of `evaluateMatching` outputs

A data frame of the standard deviation of differences between target and matched subset cells for all variables supplied in allvars data frame. The first row corresponds to the standard deviation of differences between target and subset cells for all cells and the second row corresponds to the standard deviation of differences between target and subset cells for only those target cells with matching quality $\le$ matching_distance. Units are the same as the units for each variable in allvars.

3. Calculate geographic distances between matched cells

Importantly, the quality of matching will also depend on the extent to which spatio-temporal patterns are maintained between matched cells. Our methods do not explicitly incorporate geographic proximity. Thus, target cells may be matched to geographically distant subset cells and simulation output from subset cells that are geographically distant may be assigned to adjacent target cells.

Although these cells may be well matched using the matching variables, they may exhibit daily patterns (e.g., precipitation, soil moisture) that are much less similar than would be expected for adjacent cells. To estimate the extent to which spatio-temporal patterns may have been maintained during matching, two measures of distance can be calculated using the evaluateGeoDist function:

The distance between target cells and their matched subset cells
The average distance between the subset cell that is matched to a given target cell and the subset cells that are matched to the eight adjacent neighbors of that target cell.

Explanation of `evaluateGeoDist` arguments

Please also refer to the evaluateGeoDist documentation.

matches A data frame output from the multivarmatch function.
subsetcells A data frame of subset cells with column names corresponding exactly to those in matchingvars. Row names should be unique identifiers (unique means no repeats among all row names in targetcells and matchingvars.
subsetcells_id The name of the column that contains the unique identifiers for each subset cell. Note that these unique identifiers should be numbers, but they may be classified as characters. These should match the 'subset_cell' column in the matches data frame.
subset_in_target A boolean indicating if the subset cells are taken from within the target cells.
exclude_poor_matches A boolean indicating if target cells with poor matching quality (weighted Euclidean distance between target and matched subset cell is greater than matching_distance) should be excluded from calculations.
quality_name Name of the column in the matches data frame that contains the matching quality variable to use to evaluate matching. Must be either "matching_quality" or "matching_quality_secondary".
matching_distance The maximum allowable weighted Euclidean distance between target and subset cells such that the matched cells can be considered analogous. (Set to 1.5 in this example).
longlat A boolean to pass to an internal function. Indicates if the coordinates are in longitude and latitude format for calculating distances between points. Default value is TRUE and coordinates must be provided in this format.
raster_template This is one of the rasters used to define the target cells and it is used as a template for creating rasters of distances.
map_distances A boolean indicating whether to plot a map of distances between target and matched subset cells.
map_neighbor_distances A boolean indicating whether to plot a map of the average distance between the subset cell matched to each target cell and the subset cells matched to the eight adjacent neighbors of that cell.
which_distance A character indicating which measures of distance should be calculated. Set to "both" here, see documentation for other options.
saverasters A boolean indicating if the rasters of distances should be saved to file.

# Look at geographic distances
geodist <- evaluateGeoDist(matches = quals, subsetcells = subsetcells,
                          subsetcells_id = 'site_id', 
                          subset_in_target = FALSE, 
                          quality_name = "matching_quality",
                          exclude_poor_matches = TRUE, matching_distance = 1.5,
                          longlat = TRUE, raster_template = targetcells[[1]],
                          map_distances = TRUE, map_neighbor_distances = TRUE,
                          which_distance = "both",saverasters = FALSE)

Distance (km) between target and matched subset cells.

Average distance (km) between the subset cell that is matched to a given target cell and the subset cells that are matched to the eight adjacent neighbors of that target cell.

Explanation of `evaluateGeoDist` outputs

A data frame with the distance between target and matched subset cells ('target_to_subset_distance') and the average distance between the subset cell matched to each target cell and the subset cells matched to the eight adjacent target cells ('avgdistance_to_neighbors'). The first column and the rownames correspond to the unique identifiers for the target cells, and columns 2 and 3 correspond to the 'x' and 'y' coordinates of the target cells.

4. Leave-one-out cross-validation

Once simulations are complete for the subset cells, interpolation errors can be estimated using leave-one-out cross validation (LOOCV) with the loocv function. In LOOCV, each subset cell is matched to its nearest neighbor from among the remaining subset cells (using Euclidean distance of the weighted matching variables). Then, an estimate of the interpolation error can be calculated using the following equation:

$$CVerror=\sqrt{\frac{1}k∑_{i=1}^{k}(y_i-\hat y_i)^2}$$

Where $k$ is the number of subset cells, $y$ is the value of the simulated output variable, and $\hat y$ is the value of the matched output variable.

For our example here, we have simulation output generated by simulations used in Bradford et al. (2019).

# Bring in subsetcells dataset again to get output variables
data(subsetcells)

# Remove duplicates (representing cells with same climate but different soils--
# we want to match on climate only)
subsetcells <- subsetcells[seq(2,10630, by = 5),]

# Create a mock dataset of output results
output_results <- subsetcells[,c("site_id","Dryprop","CwetWinter","CdrySummer",
                                 "Cwet8","Dryall","Dryany")]
rownames(output_results) <- output_results$site_id

Explanation of `loocv` arguments

Please also refer to the loocv documentation.

matchingvars A data frame that includes all matching variables for the subset cells. Rownames should correspond to the unique identifiers for each subset cell. The first two columns correspond to 'x' and 'y' coordinates of the subset cells. The rest of the columns correspond to the matching variables.
output_results A data frame of simulation output results for all simulated sites (subset cells). The first column and the rownames should correspond to the unique identifiers for the subset cells, and must match the rownames in matchingvars exactly.
criteria1 A vector of matching criteria that correspond to the matching variables. Matching criteria represent the maximum difference allowed for each variable between matched target and subset cells, and should be the same criteria used in multivarmatch.
secondarymatch A boolean indicating if a secondary matching step should be completed. We only used one step of matching in this example, so it is set to FALSE.
n_neighbors The number of nearest neighbors to search for among the subset cells. To achieve leave-one-out cross-validation, this number must be set to 2. The nearest neighbor of each subset cells is itself, so the second nearest neighbor will correspond the closest non-self neighbor.

# Bring in Subset cell data
data(subsetcells)

# Remove duplicates (representing cells with same climate but different soils--
# we want to match on climate only)
subsetcells <- subsetcells[seq(2,10630, by = 5),]

# Pull out matching variables only, with site_id that identifies unique climate
subsetcells_matchingvars <- subsetcells[,c("site_id","X_WGS84","Y_WGS84",
                                           "bioclim_01","bioclim_04","bioclim_09",
                                           "bioclim_12","bioclim_15","bioclim_18")]
names(subsetcells_matchingvars)[2:3] <- c("x","y")

# Use site_id as rownames and remove
rownames(subsetcells_matchingvars) <- subsetcells_matchingvars$site_id
subsetcells_matchingvars <- subsetcells_matchingvars[,-1]

# Run leave-one-out cross validation of the output result
loocv_results <- loocv(matchingvars = subsetcells_matchingvars,
                       output_results = output_results,
                       criteria1 = criteria, 
                       secondarymatch = FALSE, n_neighbors = 2)

Explanation of `loocv` outputs

The output from the loocv function is a data frame of cross-validated matching, with coordinates ('x','y') and cellnumber of each subset cell ('target_cell'), unique id of matched non-self subset cell ('subset_cell') and matching quality ('matching_quality'). Subsequent columns give the difference between simulated output values and the matched output values for each cell.

The loocv function output can be used to visualize the distribution of the matching errors.

# Visualize distribution of matching errors
par(mfrow = c(3,2), mar = c(2,2,2,1), mgp = c(1,0.2,0), tcl = 0.2)
for (i in 6:11){
  hist(loocv_results[,i], breaks = 30,
       main = paste0("Matching errors: ", 
                     gsub("_diff","",names(loocv_results)[i])),
       xlab = "Simulated - Matched")
  abline(h = 0)
  legend("topright", 
         legend = c(paste0("cv-error = ", 
                           round(sqrt(mean(loocv_results[,i]^2)),2)),
                  paste0("mean = ", round(mean(loocv_results[,i]),2))),
         cex = 0.9, bty= "n", y.intersp = 0.7, x.intersp = 0.7)

  box()
}

Calculate matching errors

Finally, matching errors for each output variable can be estimated from the output of loocv using the cverrors function.

Explanation of `cverrors` arguments

loocv_output Output from loocv function.
first_output_column The column number in the loocv_output data frame that corresponds to the first column of output variables (after the various columns that provide information on matching).

# Estimate matching errors for each output variable
est_errors <- cverrors(loocv_output = loocv_results, 
                       first_output_column = 6)

   Dryprop  CwetWinter  CdrySummer       Cwet8      Dryall      Dryany 
0.06451686 14.26300399  8.56043378  6.01712207 32.49556186 32.21232009

Interpolate output variables

Finally, the iterpolatePoints function can be used to interpolate results from simulated subset cells to all target cells across the study area by assigning simulation results from the subset cells to their matched target cells.

For our example here, we will again use our set of output results from Bradford et al. (2019).

Explanation of `interpolatePoints` arguments

Please also refer to the interpolatePoints documentation.

matches A data frame output from the multivarmatch function.
output_results A data frame containing simulation output results for all simulated sites (subset cells). The first column and the rownames should correspond to the unique identifiers for the subset cells. Importantly, these identifiers need to match the identifiers in the 'subset_cell' column of the matches data frame.
exclude_poor_matches A boolean indicating if target cells with poor matching quality (weighted Euclidean distance between target and matched subset cell is greater than matching_distance) should be excluded from calculations.
subset_cell_names The name of the column in the matches data frame that contains the unique identifiers for the subset cells matched to each target cell.
quality_name The name of the column in the matches data frame that contains the matching quality for the subset cell matched to each target cell. Must be either "matching_quality" or "matching_quality_secondary".
matching_distance The maximum allowable weighted Euclidean distance between target and subset cells such that the matched cells can be considered analogous. (Set to 1.5 in this example).
raster_template This is one of the rasters used to define the target cells and it is used as a template for creating interpolated output variables.
plotraster A boolean indicating if a raster of matching quality should be plotted. (Defaults to TRUE).
overwrite A boolean indicating if the raster::writeRaster function is allowed to overwrite existing files. (Defaults to FALSE).

# Interpolate simulation output to rasters
interpolatePoints(matches = quals, output_results = output_results,
                  exclude_poor_matches = TRUE,
                  subset_cell_names = "subset_cell",
                  quality_name = "matching_quality",
                  matching_distance = 1.5, raster_template = targetcells[[1]],
                  plotraster = TRUE, overwrite = FALSE)

Interpolated output of DryPROP.

Interpolated output of CwetWinter.

Interpolated output of CdrySummer.

Interpolated output of Cwet8.

Interpolated output of Dryall.

Interpolated output of Dryany.

References

Bradford, J. B., Schlaepfer, D. R., Lauenroth, W. K., Palmquist, K. A., Chambers, J. C., Maestas, J. D., & Campbell, S. B. (2019). Climate-driven shifts in soil temperature and moisture regimes suggest opportunities to enhance assessments of dryland resilience and resistance. Front. Ecol. Evol. 7:358. https://doi.org/10.3389/fevo.2019.00358

Hijmans, R. J. (2021). raster: Geographic Data Analysis and Modeling. R package version 3.4-13. https://CRAN.R-project.org/package=raster

Schlaepfer, D. R., Lauenroth, W. K., & Bradford, J. B. (2012). Ecohydrological niche of sagebrush ecosystems. Ecohydrology 5: 453-466. https://doi.org/10.1002/eco.238

Schlaepfer, D. R., & Andrews, C. A. (2018). rSFSW2: Simulation Framework for SOILWAT2. R package version 3.0.0.

Schlaepfer, D. R., & Murphy, R. (2018). rSOILWAT2: An Ecohydrological Ecosystem-Scale Water Balance Simulation Model. R package version 2.3.2.

Savje, F. (2019). distances: Tools for Distance Metrics. R package version 0.1.8. https://CRAN.R-project.org/package=distances

DrylandEcology/rMultivariateMatching documentation built on Dec. 17, 2021, 5:30 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

DrylandEcology/rMultivariateMatching
Multivariate matching for site selection and interpolation using multi-dimensional datasets

In DrylandEcology/rMultivariateMatching: Multivariate matching for site selection and interpolation using multi-dimensional datasets

Introduction

Examples of the interpolation methods in print

Overview and definitions

Definitions

Prepare inputs

Determine matching criteria

Table displaying `criteria_list`

Explanation of `choose_criteria` arguments

Explanation of `choose_criteria` outputs

Perform matching

Explanation of `multivarmatch` arguments

Explanation of `multivarmatch` outputs

Evaluate matching

1. Calculate matching quality

2. Calculate standard deviation of differences between target and matched subset cells

Explanation of `evaluateMatching` arguments

Explanation of `evaluateMatching` outputs

3. Calculate geographic distances between matched cells

Explanation of `evaluateGeoDist` arguments

Explanation of `evaluateGeoDist` outputs

4. Leave-one-out cross-validation

Explanation of `loocv` arguments

Explanation of `loocv` outputs

Calculate matching errors

Explanation of `cverrors` arguments

Interpolate output variables

Explanation of `interpolatePoints` arguments

References

R Package Documentation

Browse R Packages

We want your feedback!

DrylandEcology/rMultivariateMatching Multivariate matching for site selection and interpolation using multi-dimensional datasets

In DrylandEcology/rMultivariateMatching: Multivariate matching for site selection and interpolation using multi-dimensional datasets

Introduction

Examples of the interpolation methods in print

Overview and definitions

Definitions

Prepare inputs

Determine matching criteria

Table displaying criteria_list

Explanation of choose_criteria arguments

Explanation of choose_criteria outputs

Perform matching

Explanation of multivarmatch arguments

Explanation of multivarmatch outputs

Evaluate matching

1. Calculate matching quality

2. Calculate standard deviation of differences between target and matched subset cells

Explanation of evaluateMatching arguments

Explanation of evaluateMatching outputs

3. Calculate geographic distances between matched cells

Explanation of evaluateGeoDist arguments

Explanation of evaluateGeoDist outputs

4. Leave-one-out cross-validation

Explanation of loocv arguments

Explanation of loocv outputs

Calculate matching errors

Explanation of cverrors arguments

Interpolate output variables

Explanation of interpolatePoints arguments

References

R Package Documentation

Browse R Packages

We want your feedback!

DrylandEcology/rMultivariateMatching
Multivariate matching for site selection and interpolation using multi-dimensional datasets

Table displaying `criteria_list`

Explanation of `choose_criteria` arguments

Explanation of `choose_criteria` outputs

Explanation of `multivarmatch` arguments

Explanation of `multivarmatch` outputs

Explanation of `evaluateMatching` arguments

Explanation of `evaluateMatching` outputs

Explanation of `evaluateGeoDist` arguments

Explanation of `evaluateGeoDist` outputs

Explanation of `loocv` arguments

Explanation of `loocv` outputs

Explanation of `cverrors` arguments

Explanation of `interpolatePoints` arguments