Use leave-one-out cross-validation of simulated sites to evaluate matching errors. This function takes each Subset cell and finds its nearest neighbor from among the remaining Subset cells using weighted (standardized) Euclidean distance of the matching variables, then calculates the differences between the simulated value of output variables for each Subset cell and the simulated value of output variables from its nearest neighbor.
matchingvars |
a data frame that includes all matching variables for the Subset cells. Rownames should correspond to the unique identifiers for each Subset cell. The first two columns correspond to 'x' and 'y' coordinates of the Subset cells (if none exist, use "NA"). The rest of the columns correspond to the matching variables. |
secondaryvars |
a data frame that includes the secondary matching variables
for the Subset cells. The first column should correspond to the unique identifier
for each Subset cell, and the next two columns should correspond to the 'x' and
'y' coordinates of the Subset cells (if none exist, use "NA"). The rest of the
columns correspond to the secondary matching variables. Only needed if
|
output_results |
data frame. Simulation output results for all simulated
sites (Subset cells). The first column and the rownames should correspond to
the unique identifiers for the Subset cells. If |
criteria1 |
single value or vector of length equal to the number of matching variables, where values correspond to the matching criterion for each matching variable in 'matchingvars'. If a single value, this will be used as matching criteria for all variables. Default value is 1, corresponding to using raw data for matching. |
criteria2 |
single value or vector of length equal to the number of
secondary variables, where values correspond to the matching criterion for
each secondary variable |
secondarymatch |
boolean. Indicates whether the function should run secondary matching on the Subset cells. Defaults to TRUE |
secondaryvars_id |
character. Provides the column name for the unique
identifiers in the 'secondaryvars' data frame (should be the first column).
Defaults to "cellnumbers". Only needed if |
reference_treatment |
character. Designates a number to represent the
reference treatment. Default value is '1'. Only needed if |
n_neighbors |
numeric. The number of nearest neighbors to search for among the Subset cells. To achieve leave-one-out cross-validation, this number must be set to 2. The nearest neighbor of each Subset cell is itself, so the second nearest neighbor corresponds to the closest non-self neighbor. Default value is 2. |
other_treatments |
a data frame that gives the secondary variables for the
set treatments. The rownames should correspond to unique identifiers for each
treatment (e.g., 2-total number of treatments if the |
... |
additional parameters to be passed to |
This function can be used for matching achieved with the multivarmatch
function or for two-step matching, first with multivarmatch, followed by
secondaryMatching. In the latter case, loocv assumes a very specific
experimental design. For each Subset cell, there are five different soil types:
one site-specific soil type that is unique to each Subset cell and four set
soil types that were simulated for all Subset cells. In this case, the first
step of matching uses only climate variables, and the second step of matching
identifies the best soil type from among the five available for the Subset cell
with the best match based on the climate variables.
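The leave-one-out idea for the single-step case can be sketched in base R. This is an illustration only, not the package's implementation: the toy data, the variable names, and the assumption that standardization means dividing each matching variable by its matching criterion are all hypothetical. Instead of searching for two neighbors and discarding the self-match, the sketch masks the self-distance directly, which gives the same closest non-self neighbor.

```r
# Toy data (hypothetical): 5 Subset cells, 2 matching variables, 1 output variable
vars <- data.frame(temp = c(10, 11, 15, 15.5, 20),
                   precip = c(300, 320, 500, 480, 700),
                   row.names = paste0("cell", 1:5))
output <- c(cell1 = 1.0, cell2 = 1.2, cell3 = 2.0, cell4 = 2.1, cell5 = 3.0)
criteria <- c(1, 50)  # hypothetical matching criteria, one per variable

# Assumed standardization: divide each variable by its matching criterion
std <- sweep(as.matrix(vars), 2, criteria, "/")

# Pairwise Euclidean distances between all Subset cells
d <- as.matrix(dist(std))

# Leave-one-out: mask the self-distance so each cell is matched to another cell
diag(d) <- Inf
nn <- apply(d, 1, which.min)

# Difference between each cell's output and its nearest neighbor's output
loo <- data.frame(subset_cell = rownames(vars),
                  nearest = rownames(vars)[nn],
                  diff = unname(output - output[nn]))
print(loo)
```

The `diff` column plays the role of the per-variable difference columns that loocv returns for each output variable.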
Data frame of Target cells with coordinates ('x','y'), cellnumber of
Target cell ('target_cell'), unique id of matched Subset cell ('subset_cell')
and matching quality ('matching_quality'). (If secondarymatch is TRUE, it will
also include the unique id of the Subset cell matched with secondary matching
criteria ('subset_cell_secondary'), and matching quality of this secondary match
('matching_quality_secondary')). Additional columns correspond to the difference
between the simulated values of output variables for each Subset cell and the
simulated values of output variables from its nearest neighbor. These values
can be squared and averaged to get the average squared cross-validated
matching error for each output variable.
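The squaring and averaging step might look like the following. The data frame here is a hypothetical stand-in for the difference columns of the loocv result; the column names and values are made up for illustration, and in practice you would first select only the difference columns from the returned data frame.

```r
# Hypothetical stand-in for the difference columns returned by loocv
diffs <- data.frame(Dryprop = c(0.1, -0.2, 0.05),
                    CwetWinter = c(2, -1, 3))

# Average squared cross-validated matching error for each output variable
mse <- colMeans(diffs^2, na.rm = TRUE)

# Root mean squared error, in the original units of each output variable
rmse <- sqrt(mse)
print(mse)
print(rmse)
```

Taking the square root is optional; it is shown here because an error in the original units of each output variable is often easier to interpret.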
Rachel R. Renne
########################
# First, an example where secondarymatch = FALSE
# Load targetcells data for Target Cells
data(targetcells)
# Create data frame of potential matching variables for Target Cells
allvars <- makeInputdata(targetcells)
# Restrict data to matching variables of interest
matchingvars <- allvars[,c("cellnumbers","x","y","bioclim_01","bioclim_04",
                           "bioclim_09","bioclim_12","bioclim_15","bioclim_18")]
# Create vector of matching criteria
criteria <- c(0.7,42,3.3,66,5.4,18.4)
# Create raster template
raster_template = targetcells[[1]]
# Find solution for k = 200
# Note: n_starts should be >= 10, it is 1 here to reduce run time.
results1 <- kpoints(matchingvars,criteria = criteria,klist = 200,
                    n_starts = 1,min_area = 50,iter = 50,
                    raster_template = raster_template)
# Get points from solution to kpoints algorithm
subsetcells <- results1$solutions[[1]]
# Create a mock dataset of output results
output_results <- allvars[rownames(subsetcells),c("cellnumbers","bioclim_02",
                                                  "bioclim_03","bioclim_16",
                                                  "bioclim_17")]
# Create dataset of matchingvars for subsetcells
subset_matchingvars <- allvars[rownames(subsetcells),-1]
subset_matchingvars <- subset_matchingvars[,c("x","y","bioclim_01","bioclim_04",
                                              "bioclim_09","bioclim_12",
                                              "bioclim_15","bioclim_18")]
# Run leave-one-out cross validation of mock output results
loocv_results <- loocv(matchingvars = subset_matchingvars,
                       output_results = output_results,
                       criteria1 = criteria,
                       secondarymatch = FALSE, n_neighbors = 2)
########################
# Next, an example where secondarymatch = TRUE
# Remove previous subsetcells
rm(subsetcells)
# Get subsetcells
data(subsetcells)
# Pull out only matching variables and remove duplicates
matchingvars <- subsetcells[,c("site_id","X_WGS84","Y_WGS84","bioclim_01",
                               "bioclim_04","bioclim_09","bioclim_12",
                               "bioclim_15","bioclim_18")]
# Fix names
names(matchingvars) <- c("cellnumbers","x","y",names(matchingvars)[4:9])
# Remove duplicates (we will first match on climate only)
matchingvars <- matchingvars[!duplicated(matchingvars$cellnumbers),]
rownames(matchingvars) <- matchingvars$cellnumbers
# Remove cellnumbers column
matchingvars <- matchingvars[,-1]
# Pull out secondary vars and keep both identifiers
secondaryvars <- subsetcells[,c("site_id","X_WGS84","Y_WGS84","sand","clay")]
# Fix names
names(secondaryvars) <- c("cellnumbers","x","y",names(secondaryvars)[4:5])
# Convert sand and clay to percentage from fraction
secondaryvars$sand <- secondaryvars$sand*100
secondaryvars$clay <- secondaryvars$clay*100
# Remove duplicates
secondaryvars <- secondaryvars[!duplicated(secondaryvars$cellnumbers),]
# Set rownames as cellnumbers
rownames(secondaryvars) <- secondaryvars$cellnumbers
# Bring in "other" treatments
data(setsoiltypes)
other_treatments = setsoiltypes
# Create raster template
raster_template = targetcells[[1]]
# Set original criteria (from first-step matching)
criteria1 = c(0.7,42,3.3,66,5.4,18.4)
# Calculate criteria for secondary matching
criteria2 = c((max(subsetcells$sand,na.rm = TRUE)-
                 min(subsetcells$sand,na.rm = TRUE))/10*100,
              (max(subsetcells$clay,na.rm = TRUE)-
                 min(subsetcells$clay,na.rm = TRUE))/10*100)
# Bring in simulation output results of interest
output_results = subsetcells[,c("site_ids","Dryprop","CwetWinter",
                                "CdrySummer","Cwet8","Dryall","Dryany")]
rownames(output_results) <- output_results$site_ids
# Run leave-one-out cross validation of output results
loocv_results <- loocv(matchingvars = matchingvars,
                       secondaryvars = secondaryvars,
                       output_results = output_results,
                       criteria1 = criteria1, criteria2 = criteria2,
                       secondarymatch = TRUE,
                       secondaryvars_id = "cellnumbers",
                       reference_treatment = "1", n_neighbors = 2,
                       other_treatments = other_treatments)