Nothing
#' Selection of survey sites maximizing uniformity in environmental space
#'
#' @description Selection of sites to be sampled in a survey, with the goal of
#' maximizing uniformity of points in environmental space.
#'
#' @param master master_matrix object derived from function
#' \code{\link{prepare_master_matrix}} or master_selection object derived
#' from functions \code{\link{random_selection}},
#' \code{\link{uniformG_selection}}, or \code{\link{EG_selection}}.
#' @param variable_1 (character or numeric) name or position of the first
#' variable (x-axis). If the function \code{\link{make_blocks}} was used in a
#' previous step, the default, NULL, will use the same two variables, otherwise
#' this argument must be defined.
#' @param variable_2 (character or numeric) name or position of the second
#' variable (y-axis). If the function \code{\link{make_blocks}} was used in a
#' previous step, the default, NULL, will use the same two variables, otherwise
#' this argument must be defined.
#' @param selection_from (character) set of points to perform the selection
#' from. Two options are available, "all_points" or "block_centroids". The
#' first option picks the points from all points in the environmental cloud,
#' and the second one selects points only from centroids of environmental
#' blocks. See \code{\link{make_blocks}}. Default = "all_points".
#' @param expected_points (numeric) number of survey points (sites) to be
#' selected.
#' @param guess_distances (logical) whether or not to use internal algorithm
#' to automatically select \code{initial_distance} and \code{increase}. Default
#' = TRUE. If FALSE, \code{initial_distance} and \code{increase} must be
#' defined.
#' @param initial_distance (numeric) Euclidean distance to be used for a first
#' process of thinning and detection of remaining points. Default = NULL.
#' @param increase (numeric) initial value to be added to or subtracted from
#' \code{initial_distance} until reaching the number of \code{expected_points}.
#' Default = NULL.
#' @param max_n_samplings (numeric) maximum number of samples to be chosen
#' after performing all thinning \code{replicates}. Default = 1.
#' @param replicates (numeric) number of thinning replicates. Default = 10.
#' @param use_preselected_sites (logical) whether to use sites that have been
#' defined as part of the selected sites previous any selection. Object in
#' \code{master} must contain the site(s) preselected in and element of name
#' "preselected_sites" for this argument to be effective. Default = TRUE.
#' See details for more information on the approach used.
#' @param median_distance_filter (character) optional argument to define a
#' median distance-based filter based on which sets of sampling sites will be
#' selected. The default, NULL, does not apply such a filter. Options are:
#' "max" and "min".
#' @param set_seed (numeric) integer value to specify a initial seed.
#' Default = 1.
#' @param verbose (logical) whether or not to print messages about the process.
#' Default = TRUE.
#' @param force (logical) whether to replace existing set of sites selected
#' with this method in \code{master}.
#'
#' @return
#' A \code{\link{master_selection}} object (S3) with an element called
#' selected_sites_E containing one or more sets of selected sites.
#'
#' @details
#' Survey sites are selected in ways in which points will be uniformly
#' dispersed in environmental space, helping to select sites that present
#' different environmental conditions across the area of interest. This type of
#' selection is very useful to include, in the selected sites, distinct
#' environmental combinations existent in the area of interest. However, as the
#' distribution of climatic or other environmental combinations is not uniform
#' in geography, the sites selected with this function could appear clustered
#' when looked in a map.
#'
#' Exploring the geographic and environmental spaces of the region of interest
#' would be a crucial first step before selecting survey sites. Such
#' explorations can be done using the function \code{\link{explore_data_EG}}.
#'
#' If \code{use_preselected_sites} = TRUE and such sites are included as an
#' element in the object in \code{master}, the approach for selecting uniform
#' sites in environmental space is different than what was described above.
#' User-preselected sites will always be part of the sites selected. Other
#' points are selected based on an algorithm that searches for sites that are
#' uniformly distributed in environmental space but at a distance from
#' preselected sites that helps in maintaining uniformity. Note that
#' preselected sites will not be processed; therefore, uniformity of such points
#' cannot be warrantied. As multiple sets could result from selection, the
#' argument of the function \code{median_distance_filter} could be used to
#' select the set of sites with the maximum ("max") or minimum ("min") median
#' distance among selected sites. Option "max" will increase the geographic
#' distance among sampling sites, which could be desirable if the goal is to
#' cover the region of interest more broadly. The other option, "min", could be
#' used in cases when the goal is to reduce resources and time needed to sample
#' such sites.
#'
#' @seealso
#' \code{\link{uniformG_selection}}, \code{\link{random_selection}},
#' \code{\link{EG_selection}}, \code{\link{make_blocks}},
#' \code{\link{plot_sites_EG}}
#'
#' @usage
#' uniformE_selection(master, variable_1 = NULL, variable_2 = NULL,
#' selection_from = "all_points", expected_points,
#' guess_distances = TRUE, initial_distance = NULL,
#' increase = NULL, max_n_samplings = 1,
#' replicates = 10, use_preselected_sites = TRUE,
#' median_distance_filter = NULL, set_seed = 1,
#' verbose = TRUE, force = FALSE)
#'
#' @export
#'
#' @examples
#' # Data
#' data("m_matrix", package = "biosurvey")
#'
#' # Making blocks for analysis
#' m_blocks <- make_blocks(m_matrix, variable_1 = "PC1",
#' variable_2 = "PC2", n_cols = 10, n_rows = 10,
#' block_type = "equal_area")
#'
#' # Checking column names
#' colnames(m_blocks$data_matrix)
#'
#' # Selecting sites uniformly in E space
#' # because the make_blocks function was used, the same two variables will be
#' # used by default.
#' selectionE <- uniformE_selection(m_blocks, selection_from = "block_centroids",
#' expected_points = 15, max_n_samplings = 1,
#' replicates = 5, set_seed = 1)
uniformE_selection <- function(master, variable_1 = NULL, variable_2 = NULL,
selection_from = "all_points", expected_points,
guess_distances = TRUE, initial_distance = NULL,
increase = NULL, max_n_samplings = 1,
replicates = 10, use_preselected_sites = TRUE,
median_distance_filter = NULL, set_seed = 1,
verbose = TRUE, force = FALSE) {
# Initial tests
if (missing(master)) {
stop("Argument 'master' is not defined.")
}
clsm <- class(master)[1]
if (clsm %in% c("master_matrix", "master_selection")) {
if (clsm == "master_selection") {
if (!is.null(master$selected_sites_E) & force == FALSE) {
stop("'master' already contains a selection of this type, use 'force' = TRUE to replace it")
}
}
} else {
stop("Argument 'master' must be of class 'master_matrix' or 'master_selection'")
}
if (is.null(master$data_matrix$Block)) {
if (is.null(variable_1)) {
stop("Argument 'variable_1' is not defined")
}
if (is.null(variable_2)) {
stop("Argument 'variable_2' is not defined")
}
} else {
sel_args <- attributes(master$data_matrix)
variable_1 <- sel_args$arguments$variable_1
variable_2 <- sel_args$arguments$variable_2
}
coln <- colnames(master$data_matrix)
if (!variable_1 %in% coln) {
stop(variable_1, " is not one o the columns in 'master$data_matrix'.")
}
if (!variable_2 %in% coln) {
stop(variable_2, " is not one o the columns in 'master$data_matrix'.")
}
if (missing(expected_points)) {
stop("Argument 'expected_points' is not defined.")
}
if (guess_distances == FALSE) {
if (missing(initial_distance)) {
stop("Argument 'initial_distance' is not defined.")
}
if (missing(increase)) {
stop("Argument 'increase' is not defined.")
}
}
if (max_n_samplings > replicates) {
stop("Argument 'replicates' must be larger than 'max_n_samplings'.")
}
if (use_preselected_sites == TRUE & is.null(master$preselected_sites)) {
if (verbose == TRUE) {
message("Element 'preselected_sites' in 'master' is NULL, setting\n'use_preselected_sites' = FALSE")
}
use_preselected_sites <- FALSE
}
# Arguments for attributes
other_args <- list(arguments = list(variable_1 = variable_1,
variable_2 = variable_2,
selection_from = selection_from,
expected_points = expected_points,
guess_distances = guess_distances,
max_n_samplings = max_n_samplings,
replicates = replicates,
use_preselected_sites = use_preselected_sites,
median_distance_filter = median_distance_filter,
set_seed = set_seed))
# Preparing data
if (!selection_from[1] %in% c("all_points", "block_centroids")) {
stop("Argument 'selection_from' is not valid, check function's help.")
} else {
# Preparing data
data <- master$data_matrix
if (selection_from[1] == "block_centroids") {
if (is.null(master$data_matrix$Block)) {
stop("Blocks are not defined in data_matrix, see function 'make_blocks'.")
}
# Preparing centroids
data <- closest_to_centroid(data, variable_1, variable_2, space = "E",
n = 1, id_column = "Block")
useb <- TRUE
} else {
useb <- FALSE
}
}
if (!is.null(median_distance_filter)) {
if (!median_distance_filter %in% c("max", "min")) {
stop("Argument 'median_distance_filter' is not valid, see function's help.")
}
}
# Selection depending on option of user points
if (use_preselected_sites == TRUE) {
# Using preselected sites to create mask and define distance
tst <- preselected_dist_mask(master, expected_points = expected_points,
space = "E", variable_1 = variable_1,
variable_2 = variable_2, use_blocks = useb,
verbose = verbose)
# Excluding close points from analysis
data <- sp::SpatialPointsDataFrame(data[, c(variable_1, variable_2)], data,
proj4string = sp::CRS("+init=epsg:4326"))
data <- data[is.na(sp::over(data, tst$mask)$ID), ]@data
# Npre
npre <- nrow(master$preselected_sites)
expected_points <- expected_points - npre
}
# Preparing selection variables
## Guessing distances
if (guess_distances == TRUE) {
if (use_preselected_sites == TRUE) {
dist <- round(tst$distance, 2)
} else {
ext <- apply(data[, c(variable_1, variable_2)], 2, range)
dist <- c(stats::dist(ext)) / expected_points
}
increase <- dist / 10
} else {
dist <- initial_distance
}
## Other variables
np <- nrow(data)
ininp <- np
pnp <- np
inin <- 1
count <- 1
# Condition
mess <- ifelse(selection_from[1] == "all_points",
"Number of available points using 'all_points'",
"Number of available points using 'block_centroids.
'")
if (np < expected_points) {
stop(mess, " is smaller than 'expected_points'.")
}
if (np == expected_points) {
if (verbose == TRUE) {
message(mess, " equals 'expected_points'")
}
if (use_preselected_sites == TRUE) {
data <- rbind(master$preselected_sites[, -1],
data[, colnames(data) != "Selected_blocks"])
}
thin <- list(selection_1 = data)
attributes(thin) <- c(attributes(thin), other_args)
return(new_master_selection(master$data_matrix, master$preselected_sites,
master$region, master$mask, master$raster_base,
master$PCA_results, master$selected_sites_random,
master$selected_sites_G, thin,
master$selected_sites_EG))
}
# Selection process
if (verbose == TRUE) {
message("Running algorithm for selecting sites, please wait...")
}
while (np > expected_points) {
# Thinning
thin <- point_thinning(data, variable_1, variable_2, dist, space = "E",
max_n_samplings, replicates, set_seed)
np <- nrow(thin[[1]])
if (verbose == TRUE) {
message(" Distance ", round(dist, 3), " resulted in ",
np, " points")
}
if (np <= expected_points) {
if (np == expected_points) {
# Selection done
break()
} else {
# Reducing initial distance
dist <- dist - increase
# Reducing increase distance
if (count > 1 & pnp > expected_points) {
increase <- increase / 10
}
}
} else {
dist <- dist + increase
}
# Starting again
pnp <- np
np <- ininp
count <- count + 1
}
# Returning results
## Adding preselected to selected
if (use_preselected_sites == TRUE) {
thin <- lapply(thin, function(x) {
rbind(master$preselected_sites[, -1], x[, colnames(x) != "Selected_blocks"])
})
}
## Filtering by distances success
if (length(thin) > 1 & !is.null(median_distance_filter)) {
thin <- distance_filter(thin, median_distance_filter)
}
## Naming and returning
names(thin) <- paste0("selection_", 1:length(thin))
if (verbose == TRUE) {
message("Total number of sites selected: ", nrow(thin[[1]]))
}
attributes(thin) <- c(attributes(thin), other_args)
if (class(master)[1] == "master_matrix") {
return(new_master_selection(master$data_matrix, master$preselected_sites,
master$region, master$mask, master$raster_base,
master$PCA_results, selected_sites_random = NULL,
selected_sites_G = NULL, thin,
selected_sites_EG = NULL))
} else {
return(new_master_selection(master$data_matrix, master$preselected_sites,
master$region, master$mask, master$raster_base,
master$PCA_results, master$selected_sites_random,
master$selected_sites_G, thin,
master$selected_sites_EG))
}
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.