evaluateMatching: Evaluate matching for additional variables

View source: R/evaluateMatching.R

Description

Calculate the standard deviation of differences between Subset and Target cells for a set of variables relevant to the project. This is most informative if it includes variables not used for matching.

Usage

evaluateMatching(
  allvars = NULL,
  subsetcells = NULL,
  matches = NULL,
  secondarymatch = FALSE,
  quality_name = "matching_quality",
  matchingvars_id = "cellnumbers",
  subsetcells_id = NULL,
  subset_in_target = TRUE,
  matching_distance = 1.5,
  plot_diffs = TRUE
)

Arguments

allvars

data frame generated using makeInputdata or formatted such that: column 1 and rownames are 'cellnumbers' extracted using the extract function, columns 2 and 3 correspond to x and y coordinates, and additional columns correspond to various variables (which can include matching variables) that have been extracted to points using the rasterToPoints function. These data represent Target cells (and may also represent Subset cells if subset_in_target is TRUE).

subsetcells

if subset_in_target is TRUE, this should be a data frame of coordinates for Subset cells (coordinates are expected in columns named 'x' and 'y'). It may be extracted from the output of the kpoints function or provided separately. Row names should be unique identifiers for each point, with no repeats among the row names of subsetcells. If subset_in_target is FALSE, this should be a data frame of Subset cells with column names corresponding exactly to those in matchingvars. Row names should again be unique identifiers, with no repeats among all row names in targetcells and matchingvars. See subset_in_target.

matches

data frame output from the multivarmatch or secondaryMatching functions.

secondarymatch

boolean. Indicates if the matches data frame comes from the secondaryMatching function.

quality_name

character. Name of the column in the matches data frame that contains the matching quality variable used to evaluate matching: either "matching_quality" or "matching_quality_secondary". Defaults to "matching_quality".

matchingvars_id

character or numeric. Refers to the column in matchingvars that provides the unique identifiers for Target cells. Defaults to "cellnumbers", which is the unique ID column created by makeInputdata.

subsetcells_id

character or numeric, but must be composed of numbers and convertible to numeric. Refers to the column in subsetcells that provides the unique identifiers for Subset cells. When subset_in_target is TRUE, these ids must be unique from matchingvars_ids. Note that if there are repeats between the matchingvars_ids and the subsetcells_ids, you can paste "00" before the subsetcells_ids to ensure they are unique from the matchingvars_ids. Defaults to NULL.
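
The "00" prefix described above can be sketched in base R. This is only an illustration with made-up IDs; the vectors target_ids and subset_ids are hypothetical and are not objects provided by the package.

```r
# Hypothetical IDs: the value 12 appears in both sets
target_ids <- c(10, 11, 12, 13)
subset_ids <- c(12, 40, 41)

# Any shared values mean the two ID sets are not unique from each other
shared <- intersect(target_ids, subset_ids)

# Paste "00" before the Subset IDs so they differ as character strings
subset_ids_fixed <- paste0("00", subset_ids)

# Check for remaining collisions with the Target IDs
collisions <- intersect(as.character(target_ids), subset_ids_fixed)

# The prefixed IDs are still composed of digits, so they convert to numeric
convertible <- !anyNA(suppressWarnings(as.numeric(subset_ids_fixed)))
```

Here shared flags the repeated ID before the fix, and collisions is empty afterwards because the comparison is made on the character strings.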

subset_in_target

boolean. Indicates if Subset cells have been selected from Target cells using the kpoints function.

matching_distance

numeric. Gives the maximum allowable matching quality value (weighted Euclidean distance) between Target and Subset cells. Default value is 1.5.
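
To make the threshold concrete, the following base R sketch computes a weighted Euclidean distance for one Target-Subset pair and compares it with matching_distance. All values are made up, and the weighting shown (dividing each difference by a per-variable matching criterion) is an assumption made for illustration, not a statement of how the package computes matching quality.

```r
# Hypothetical values of two matching variables for one Target cell
# and one candidate Subset cell
target_cell <- c(bioclim_01 = 8.2, bioclim_12 = 310)
subset_cell <- c(bioclim_01 = 8.9, bioclim_12 = 343)

# Hypothetical matching criteria, one per variable, in each variable's units
criteria <- c(bioclim_01 = 0.7, bioclim_12 = 66)

# ASSUMPTION: weights are applied by dividing each difference by its criterion
scaled_diff <- (target_cell - subset_cell) / criteria

# Euclidean distance computed on the scaled differences
matching_quality <- sqrt(sum(scaled_diff^2))

# Compare against the maximum allowable matching quality value
matching_distance <- 1.5
is_acceptable <- matching_quality <= matching_distance
```

Under this assumed scaling, a distance of 1 corresponds to a pair that differs by exactly one criterion in a single variable, which is one way to read the default of 1.5.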

plot_diffs

boolean. Indicates whether a barplot of differences should be displayed.

Value

Data frame of the standard deviation of differences between Target cells and their matched Subset cells for all variables supplied in the allvars data frame. The first row gives the standard deviation of differences for all Target cells. The second row gives the standard deviation of differences for only those Target cells with matching quality <= matching_distance. Units are the same as the units of each variable in allvars.
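
The layout of the returned data frame can be illustrated with a small base R sketch. It uses made-up values and hypothetical object names (target_vals, matched_vals); it mirrors the two-row structure described above and is not the package's implementation.

```r
# Hypothetical values of two variables for five Target cells ...
target_vals <- data.frame(
  bioclim_01 = c(8.0, 9.5, 7.2, 10.1, 6.8),
  bioclim_12 = c(300, 420, 250, 510, 280)
)
# ... and for the Subset cell matched to each Target cell (same row order)
matched_vals <- data.frame(
  bioclim_01 = c(8.3, 9.1, 7.0, 11.4, 6.9),
  bioclim_12 = c(310, 400, 255, 600, 275)
)

# Hypothetical matching quality of each Target-Subset pair
matching_quality <- c(0.4, 0.6, 0.3, 2.1, 0.2)
matching_distance <- 1.5

# Differences between each Target cell and its matched Subset cell
diffs <- target_vals - matched_vals

# Row 1: all cells; row 2: only cells with matching quality <= matching_distance
well_matched <- matching_quality <= matching_distance
sddiffs <- as.data.frame(rbind(
  all_cells    = sapply(diffs, sd),
  well_matched = sapply(diffs[well_matched, , drop = FALSE], sd)
))
print(sddiffs)
```

In this toy data the fourth pair is poorly matched, so dropping it in the second row reduces the standard deviation for both variables.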

Author(s)

Rachel R. Renne

Examples

# Load targetcells data for Target Cells (from rMultivariateMatchingAlgorithms package)
data(targetcells)

# Create data frame of potential matching variables for Target Cells
allvars <- makeInputdata(targetcells)

# Subset to include only matching variables
matchingvars <- allvars[,c("cellnumbers","x","y","bioclim_01","bioclim_04",
                        "bioclim_09","bioclim_12","bioclim_15","bioclim_18")]

# Create raster_template
raster_template <- targetcells[[1]]

# Create vector of matching criteria
criteria <- c(0.7,42,3.3,66,5.4,18.4)

# Find solution for k = 200
# Note: n_starts should be >= 10, it is 1 here to reduce run time.
results1 <- kpoints(matchingvars,criteria = criteria,
                   klist = 200, n_starts = 1, min_area = 50, iter = 50,
                   raster_template = raster_template)


###################################
# First an example where subset_in_target = TRUE

# Get points from solution to kpoints algorithm
subsetcells <- results1$solutions[[1]]

# Find matches and calculate matching quality
quals <- multivarmatch(matchingvars, subsetcells,
                       criteria = criteria,
                       matchingvars_id = "cellnumbers",
                       raster_template = raster_template,
                       subset_in_target = TRUE)

# Run evaluateMatching
sddiffs <- evaluateMatching(allvars = allvars, matches = quals,
                            matchingvars_id = "cellnumbers",
                            secondarymatch = FALSE,
                            subset_in_target = TRUE,
                            matching_distance = 1.5,
                            plot_diffs = TRUE)

###################################
# Now an example where subset_in_target is FALSE
rm(subsetcells)
# Get subsetcells
data(subsetcells)

# Remove duplicates (representing cells with same climate but different
# soils--we want to match on climate only)
subsetcells <- subsetcells[!duplicated(subsetcells$site_id),]

# Pull out matching variables only, with site_id that identifies unique climate
subsetcells1 <- subsetcells[,c("site_id","X_WGS84","Y_WGS84","bioclim_01",
                               "bioclim_04","bioclim_09","bioclim_12",
                               "bioclim_15","bioclim_18")]

# Ensure that site_id will be values unique to subsetcells
subsetcells1$site_id <- paste0("00",subsetcells1$site_id)

# Find matches and calculate matching quality
quals <- multivarmatch(matchingvars,
                       subsetcells=subsetcells1,
                       criteria = criteria,
                       matchingvars_id = "cellnumbers",
                       subsetcells_id = "site_id",
                       raster_template = raster_template,
                       subset_in_target = FALSE)

# Remove previous subsetcells
rm(subsetcells)
# Get subsetcells
data(subsetcells)

# Remove duplicates (representing cells with same climate but different
# soils--we want to match on climate only)
subsetcells <- subsetcells[!duplicated(subsetcells$site_id),]

# Get all variables for Subset cells now:
subsetcells <- subsetcells[,c("site_id","X_WGS84","Y_WGS84",
                              names(allvars)[4:24])]

# Run evaluateMatching
sddiffs <- evaluateMatching(allvars = allvars[,c(1:24)],
                            subsetcells = subsetcells,
                            secondarymatch = FALSE,
                            quality_name = "matching_quality",
                            matches = quals,
                            matchingvars_id = "cellnumbers",
                            subsetcells_id = "site_id",
                            subset_in_target = FALSE,
                            matching_distance = 1.5,
                            plot_diffs = TRUE)

DrylandEcology/rMultivariateMatching documentation built on Dec. 17, 2021, 5:30 p.m.