choose_criteria: Determine matching criteria

View source: R/setupVars.R

Description

Examine summary statistics of matching variables to determine matching criteria

Usage

choose_criteria(
  matchingvars = NULL,
  criteria_list = NULL,
  k = 200,
  plot_coverage = TRUE,
  raster_template = NULL,
  subset_in_target = TRUE,
  subsetcells = NULL,
  matchingvars_id = "cellnumbers",
  subsetcells_id = "site_id",
  matching_distance = 1,
  ...
)

Arguments

matchingvars

data frame created using makeInputdata or formatted such that: rownames are 'cellnumbers' extracted using the extract function, columns 2 and 3 correspond to x and y coordinates, and additional columns correspond to potential matching variables extracted using the rasterToPoints function.

criteria_list

list of matching criteria to test in the kpoints function. Each item in the list should be a vector of values (possible matching criteria) corresponding to each matching variable.

k

number of points to use to find solution using kpoints function. Default value is 200.

plot_coverage

boolean. Indicates whether the algorithm should display a barplot of the coverage for each set of criteria. Default is TRUE.

raster_template

one of the raster layers used for input data. See area. Note that 'cellnumbers' column must be present for kpoints function to work within this function.

subset_in_target

boolean. Indicates whether Subset cells were selected from Target cells using the kpoints function.

subsetcells

data frame. Passed to multivarmatch function if subset_in_target is FALSE. This should be a data frame of subset cells with column names corresponding exactly to those in matchingvars and row names should be unique identifiers.

matchingvars_id

character or numeric. Refers to the column in matchingvars that provides the unique identifiers for Target cells. Defaults to "cellnumbers", which is the unique ID column created by makeInputdata.

subsetcells_id

character or numeric, but must be composed of numbers and convertible to numeric. Refers to the column in subsetcells that provides the unique identifiers for Subset cells. When subset_in_target is TRUE, these IDs must be distinct from the matchingvars IDs. If any IDs are shared between matchingvars and subsetcells, you can paste "00" before the subsetcells IDs to make them distinct from the matchingvars IDs. Defaults to "site_id".
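The ID check described above can be sketched in base R. The ID values below are hypothetical and only illustrate the idea; in practice they would come from the matchingvars_id and subsetcells_id columns.

```r
# Hypothetical IDs for illustration only
target_ids <- c(1, 2, 3, 4)   # e.g. values from the matchingvars ID column
subset_ids <- c(3, 4, 5)      # e.g. values from the subsetcells ID column

# IDs that appear in both sets (compared as character strings)
overlap <- intersect(as.character(target_ids), as.character(subset_ids))

# Prefix the Subset IDs with "00", as the argument description suggests
new_ids <- paste0("00", subset_ids)

# The prefixed IDs no longer collide with the Target IDs as strings,
# and they can still be converted to numeric without producing NA
remaining_overlap <- intersect(as.character(target_ids), new_ids)
convertible <- !anyNA(as.numeric(new_ids))
```

Note that as.numeric() drops leading zeros, so the IDs are only distinct in their character form.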

matching_distance

Gives the maximum allowable matching quality value (weighted Euclidean distance) between Target and Subset cells, when subset_in_target is FALSE. Default value is 1 so that output will be comparable to output from choose_criteria when subset_in_target is TRUE.
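As a rough illustration of a weighted Euclidean distance, the sketch below assumes that "weighted" means dividing each matching variable by its matching criterion before computing the distance. That weighting scheme, and all values shown, are assumptions for illustration and are not taken from the package source.

```r
# Hypothetical values for one Target cell and one Subset cell
target   <- c(bioclim_01 = 10.0, bioclim_12 = 300)
subset   <- c(bioclim_01 = 10.5, bioclim_12 = 310)

# One matching criterion per variable (hypothetical)
criteria <- c(bioclim_01 = 1.0, bioclim_12 = 20)

# Assumption: weight each difference by its criterion, then take the
# Euclidean distance across variables
weighted_distance <- sqrt(sum(((target - subset) / criteria)^2))

# The pair counts as matched if the distance is within matching_distance
matching_distance <- 1
is_matched <- weighted_distance <= matching_distance
```

Under this assumption, a distance of 1 corresponds to a difference of one full criterion in a single variable, which is why the default of 1 is a natural threshold.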

...

additional arguments passed to the kpoints function.

Value

The output from the choose_criteria function is a named list. The first item ('totalarea') reports the total area of the Target cells in km^2. The second item ('solution_areas') reports, for each set of matching criteria, the area represented, that is, the area of Target cells whose Euclidean distance of weighted matching variables to their matched Subset cell is <= 1.
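One way to compare criteria sets is to express each represented area as a percentage of the total area. The sketch below uses a mock list in place of real output and assumes that 'solution_areas' is a numeric vector with one value per set of criteria; the actual structure may differ.

```r
# Mock result standing in for the output of choose_criteria()
# (hypothetical values; assumes solution_areas is a numeric vector)
res <- list(totalarea = 1000,
            solution_areas = c(400, 650, 900, 980))

# Percentage of the Target area represented under each set of criteria
pct_covered <- 100 * res$solution_areas / res$totalarea
pct_covered
```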

Author(s)

Rachel R. Renne

Examples

# Load targetcells data for Target Cells
data(targetcells)

# Create data frame of potential matching variables for Target Cells
allvars <- makeInputdata(targetcells)

# Restrict data to matching variables of interest
matchingvars <- allvars[,c("cellnumbers","x","y","bioclim_01","bioclim_04",
                      "bioclim_09","bioclim_12","bioclim_15","bioclim_18")]

# Create list of matching criteria to choose:
# Look at 2.5%, 5%, & 10% of range and standard deviation for each variable
range2.5pct <- apply(matchingvars[,4:ncol(matchingvars)],2,
                     function(x){(max(x)-min(x))*0.025})
range5pct <- apply(matchingvars[,4:ncol(matchingvars)],2,
                     function(x){(max(x)-min(x))*0.05})
range10pct <- apply(matchingvars[,4:ncol(matchingvars)],2,
                     function(x){(max(x)-min(x))*0.1})
stddev <- apply(matchingvars[,4:ncol(matchingvars)],2,sd)

# Create a list of criteria
criteria_list <- list(range2.5pct, range5pct, range10pct, stddev)

###################################
# First an example where subset_in_target = TRUE
# Compare coverage with various criteria

# Create raster_template
raster_template <- targetcells[[1]]

# Note: n_starts should be >= 10, it is 1 here to reduce run time.
results2 <- choose_criteria(matchingvars, criteria_list = criteria_list,
                            n_starts = 1, k = 200,
                            raster_template = raster_template,
                            subset_in_target = TRUE, plot_coverage = TRUE)

###################################
# Now an example where subset_in_target is FALSE
# Bring in Subset cell data
data(subsetcells)

# Remove duplicates (representing cells with same climate but different soils--
# we want to match on climate only)
subsetcells <- subsetcells[!duplicated(subsetcells$site_id),]

# Pull out matching variables only, with site_id that identifies unique climate
subsetcells <- subsetcells[,c("site_id","X_WGS84","Y_WGS84","bioclim_01",
                           "bioclim_04","bioclim_09","bioclim_12",
                           "bioclim_15","bioclim_18")]

# Ensure that site_id will be values unique to subsetcells
subsetcells$site_id <- paste0("00",subsetcells$site_id)

# Create raster_template
raster_template <- targetcells[[1]]

# Run choose_criteria function to evaluate different matching criteria
coverage <- choose_criteria(matchingvars = matchingvars,
                            criteria_list = criteria_list,
                            plot_coverage = TRUE,
                            raster_template = raster_template,
                            subset_in_target = FALSE,
                            subsetcells = subsetcells,
                            matchingvars_id = "cellnumbers",
                            subsetcells_id = "site_id")

DrylandEcology/rMultivariateMatching documentation built on Dec. 17, 2021, 5:30 p.m.