secondaryMatching: Run a second level of matching

View source: R/secondaryMatching.R

Description

When matching on different kinds of variables is required, it can be useful to run a second level of matching. This function is only useful when the experimental (simulation) design includes multiple "treatments" (e.g., soil types, aspect) for each set of environmental characteristics (e.g., climate).

Usage

secondaryMatching(
  secondaryvars = NULL,
  matches = NULL,
  subsetcells = NULL,
  secondaryvars_id = "cellnumbers",
  reference_treatment = "1",
  subsetcells_id = NULL,
  criteria = 1,
  other_treatments = NULL,
  is_loocv = FALSE,
  raster_template,
  subset_in_target = FALSE,
  saveraster = FALSE,
  plotraster = TRUE,
  filepath = getwd(),
  overwrite = FALSE,
  ...
)

Arguments

secondaryvars

data frame generated using makeInputdata, or a subset of such a data frame, and/or formatted such that: column 1 and rownames are 'cellnumbers' extracted using the extract function, columns 2 and 3 correspond to x and y coordinates, and additional columns correspond to a secondary set of matching variables extracted using the rasterToPoints function. These data represent Target cells.

matches

data frame. Output returned from multivarmatch using primary matching variables (e.g., climate variables).

subsetcells

data frame with columns that correspond to those in secondaryvars. Currently, there is no functionality for subset_in_target = TRUE, so subset cells should represent a separate set of simulated (Subset) cells. This function is designed to handle an experimental design in which there is a "reference" treatment (which can either be the same for all Subset cells or site-specific (e.g., site-specific soils)) and a series of "other" treatments that are the same across all Subset cells. Thus, the subsetcells data frame should contain secondary variables that correspond to only one treatment.

secondaryvars_id

character or numeric. Refers to the column in secondaryvars that provides the unique identifiers for Target cells. Defaults to "cellnumbers", which is the unique ID column created by makeInputdata.

reference_treatment

character. Designates the reference treatment identifier that will be pasted onto the subsetcells_id to generate unique identifiers for each treatment. Default value is "1".

subsetcells_id

character or numeric, but must be composed of numbers and convertible to numeric. Refers to the column in subsetcells that provides the unique identifiers for Subset cells. When subset_in_target is TRUE, these ids must be distinct from the Target cell ids (secondaryvars_id). Note that if any ids are shared between the Target cell ids and the subsetcells_ids, you can paste "00" before the subsetcells_ids to ensure they are distinct from the Target cell ids. Defaults to NULL.

criteria

single value or vector of length equal to the number of secondary variables, where values correspond to the matching criterion for each secondary variable in secondaryvars. If a single value, it will be used as the matching criterion for all variables. Default value is 1, corresponding to using raw data for matching.
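The following is a minimal sketch of how per-variable criteria can act as scaling factors. It assumes, without confirmation from this page, that each secondary variable is divided by its criterion before distances are computed, which is consistent with a criterion of 1 leaving the raw data unchanged. The data frame vars, its values, and the helper functions expand_criteria and scale_by_criteria are hypothetical and are not part of the package.

```r
# Hypothetical secondary variables for three cells (percent sand and clay)
vars <- data.frame(sand = c(20, 50, 80), clay = c(10, 30, 20))

# One criterion per secondary variable, in column order
criteria <- c(6, 2)

# A single value is recycled so that it applies to every variable
expand_criteria <- function(criteria, n_vars) {
  if (length(criteria) == 1) rep(criteria, n_vars) else criteria
}

# Assumed scaling step: divide each column by its criterion
scale_by_criteria <- function(vars, criteria) {
  crit <- expand_criteria(criteria, ncol(vars))
  stopifnot(length(crit) == ncol(vars))
  sweep(as.matrix(vars), 2, crit, "/")
}

scaled <- scale_by_criteria(vars, criteria)
raw <- scale_by_criteria(vars, 1)  # criteria = 1 keeps the raw values
```

Under this assumption, a difference of one unit in a scaled column corresponds to one matching criterion for that variable.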

other_treatments

data frame. Provides secondary variables for "other" treatments that are common among all Subset cells (e.g., a set of soil types that are simulated for each site). The column names should correspond to the secondary variable names and the rownames will designate the treatment identifiers that will be pasted onto the subsetcells_id to generate unique identifiers for each treatment.
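As an illustration of how treatment identifiers might combine with Subset cell ids, the sketch below builds one id per site-by-treatment combination. The "." separator is an assumption (the example on this page mentions site_ids ending in ".1", which suggests but does not confirm it), and the site ids and soil values are hypothetical.

```r
# Hypothetical Subset cell ids and "other" treatments
# (rownames of other_treatments are the treatment identifiers)
site_ids <- c("001", "002")
other_treatments <- data.frame(sand = c(30, 60), clay = c(20, 10),
                               row.names = c("2", "3"))
reference_treatment <- "1"

# All treatment identifiers: reference first, then the "other" treatments
treatments <- c(reference_treatment, rownames(other_treatments))

# One row per site-by-treatment combination
combos <- expand.grid(treatment = treatments, site = site_ids,
                      stringsAsFactors = FALSE)

# Paste the treatment identifier onto the site id; "." is an assumed separator
treatment_ids <- paste(combos$site, combos$treatment, sep = ".")
```

Each Subset cell then has one id for its reference treatment and one for every row of other_treatments.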

is_loocv

boolean. Indicates whether the function is being used as part of leave-one-out cross-validation. Usually only passed to this function from loocv.

raster_template

one of the raster layers used for input data.

subset_in_target

boolean. Defaults to FALSE. There is no functionality for TRUE at this time.

saveraster

boolean. Indicates if raster of matching quality should be saved to file. Default is FALSE.

plotraster

boolean. Indicates if raster of matching quality should be plotted to a map. Default is TRUE.

filepath

provides path for location where raster will be saved. Defaults to working directory.

overwrite

boolean. Indicates whether writeRaster should overwrite existing files with the same name in filepath. Defaults to FALSE.

...

additional parameters to pass to legendPlot.

Value

Data frame of Target cells with coordinates ('x','y'), cellnumber of Target cell ('target_cell'), unique id of matched Subset cell ('subset_cell'), matching quality ('matching_quality'), unique id of the Subset cell matched with secondary matching criteria ('subset_cell_secondary'), and matching quality of this secondary match ('matching_quality_secondary'). Will save a raster of matching quality if saveraster is TRUE and plot a map of matching quality if plotraster is TRUE.
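A short sketch of how the returned data frame could be inspected. The object below is a hand-built stand-in with the documented column names; all values, the format of the subset cell ids, and the threshold are hypothetical. It also assumes that larger matching quality values indicate poorer matches, which this page does not state.

```r
# Hypothetical output with the documented columns (values are made up)
quals2 <- data.frame(
  x = c(-110.5, -110.4, -110.3),
  y = c(40.1, 40.1, 40.2),
  target_cell = c(101, 102, 103),
  subset_cell = c("0012", "0012", "0047"),
  matching_quality = c(0.4, 1.3, 0.8),
  subset_cell_secondary = c("0012.1", "0012.3", "0047.2"),
  matching_quality_secondary = c(0.2, 0.9, 1.6)
)

# Summarize both levels of matching quality side by side
summary(quals2[, c("matching_quality", "matching_quality_secondary")])

# Flag Target cells whose secondary matching quality exceeds a chosen cutoff
threshold <- 1  # hypothetical cutoff
poor <- quals2[quals2$matching_quality_secondary > threshold, ]
```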

Author(s)

Rachel R. Renne

Examples

# Load targetcells data for Target Cells (from rMultivariateMatchingAlgorithms package)
data(targetcells)

# Create data frame of potential matching variables for Target Cells
allvars <- makeInputdata(targetcells)

# Restrict data to matching variables of interest
matchingvars <- allvars[,c("cellnumbers","x","y","bioclim_01",
                       "bioclim_04","bioclim_09","bioclim_12","bioclim_15",
                       "bioclim_18")]

# Create vector of matching criteria
criteria <- c(0.7,42,3.3,66,5.4,18.4)

###################################
# For example with subset_in_target = FALSE
# (There is no functionality for subset_in_target = TRUE at this time)

# Get points from solution to kpoints algorithm
data(subsetcells)

# Remove duplicates (representing cells with same climate but different soils--
# we want to match on climate only)
subsetcells <- subsetcells[!duplicated(subsetcells$site_id),]

# Pull out matching variables only, with site_id that identifies unique climate
subsetcells <- subsetcells[,c("site_id","X_WGS84","Y_WGS84","bioclim_01",
                           "bioclim_04","bioclim_09","bioclim_12",
                           "bioclim_15","bioclim_18")]

# Create raster_template
raster_template <- targetcells[[1]]

# Ensure that site_id will be values unique to subsetcells
subsetcells$site_id <- paste0("00",subsetcells$site_id)

# Find matches and calculate matching quality
quals <- multivarmatch(matchingvars, subsetcells=subsetcells,
                       criteria = criteria,
                       matchingvars_id = "cellnumbers",
                       subsetcells_id = "site_id",
                       raster_template = raster_template,
                       subset_in_target = FALSE)

# Subset to include only secondaryvars
secondaryvars <- allvars[,c("cellnumbers","x","y","sand","clay")]

# Remove previous subsetcells
rm(subsetcells)

# Bring in secondary id variable from subsetcells
data(subsetcells)

# Remove duplicates (keeping only site-specific soils with site_ids ending
# in ".1").
subsetcells <- subsetcells[!duplicated(subsetcells$site_id),]

# Pull out secondary matching variables only, with site_id as the identifier
subsetcells <- subsetcells[,c("site_id","X_WGS84","Y_WGS84",
                              "sand","clay")]

# Convert sand and clay to percentage from fraction
subsetcells$sand <- subsetcells$sand*100
subsetcells$clay <- subsetcells$clay*100

# Make sure subsetcell ids are unique
subsetcells$site_id <- paste0("00",subsetcells$site_id)

# Bring in "other" treatments
data(setsoiltypes)
other_treatments <- setsoiltypes

# Calculate criteria as one tenth of the observed range of each variable
criteria <- c(diff(range(secondaryvars$sand, na.rm = TRUE)) / 10,
              diff(range(secondaryvars$clay, na.rm = TRUE)) / 10)

# Run secondary matching on soils data
quals2 <- secondaryMatching(secondaryvars = secondaryvars, matches = quals,
                            subsetcells=subsetcells,subsetcells_id = "site_id",
                            subset_in_target = FALSE, criteria = criteria,
                            raster_template = raster_template,
                            reference_treatment = "1",
                            other_treatments = other_treatments)

DrylandEcology/rMultivariateMatching documentation built on Dec. 17, 2021, 5:30 p.m.