View source: R/repeated_spatial_cluster_sample.R
repeated_spatial_cluster_sample | R Documentation |
Repeated spatial cluster sampling splits the data into V groups
using partitioning (kmeans) or hierarchical (hclust)
clustering of some variables, typically spatial coordinates.
A resample of the analysis data works as in spatial_cluster_sample
but with repeats.
The number of resamples is equal to v * repeats (folds times repeats); resample sizes are
generally not equal across folds and repeats.
repeated_spatial_cluster_sample(
data = data,
v = 10,
repeats = 1,
coords = c("X", "Y"),
strata = NULL,
breaks = 4,
pool = 0.1,
spatial = FALSE,
clust_method = "kmeans",
dist_clust = NULL,
...
)
data |
input data set; one of sp, sf, or data.frame with X and Y as variables |
v |
number of partitions of the data set or number of clusters |
repeats |
number of repetitions of partition of data set |
coords |
(vector) pair of coordinate column names, used if the data are aspatial (data.frame) |
strata |
(character) strata variable; default is NULL, because stratification based on class/strata does not yield good results |
breaks |
(integer) A single number giving the number of bins desired to stratify a numeric stratification variable |
pool |
(numeric) A proportion of data used to determine if a
particular group is too small and should be pooled into another group.
Default is 0.1 |
spatial |
(logical) whether the data set is spatial (sf or sp) or aspatial (data.frame) |
clust_method |
one of partitioning (default = kmeans) or
hierarchical methods |
dist_clust |
the agglomeration method to be used. This should be one of “ward.D”, “ward.D2”, “single”, “complete”, “average” (= UPGMA), “mcquitty” (= WPGMA), “median” (= WPGMC) or “centroid” (= UPGMC). The dist_clust argument is passed on as the method argument of stats::hclust |
... |
currently not used |
The variables in the coords argument (if the input data is a data.frame),
or the coordinates extracted from sp or sf data, are used to cluster the data into
disjoint sets. These clusters are used as the folds for cross-validation.
Depending on how the data are distributed spatially, the folds may not
contain equal numbers of points.
The function is similar to repeated cross-validation or v-fold
cross-validation (vfold_cv), but for spatial data with clustering.
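The fold assignment described above can be sketched with base R alone. This is an illustration under stated assumptions, not the package's implementation: assign_folds is a hypothetical helper, the points are simulated, and any clust_method other than "kmeans" is treated here as a request for hierarchical clustering with dist_clust as the agglomeration method.

```r
# Sketch only: cluster-based fold assignment with base R.
# `assign_folds` is a hypothetical helper, not a function of the package.
assign_folds <- function(xy, v, clust_method = "kmeans", dist_clust = "ward.D2") {
  xy <- as.matrix(xy)
  if (clust_method == "kmeans") {
    # Partitioning: each point gets the label of its k-means cluster
    stats::kmeans(xy, centers = v)$cluster
  } else {
    # Hierarchical: build the tree, then cut it into v groups
    tree <- stats::hclust(stats::dist(xy), method = dist_clust)
    stats::cutree(tree, k = v)
  }
}

set.seed(42)
pts <- data.frame(X = runif(200), Y = runif(200))  # simulated coordinates

folds_km <- assign_folds(pts, v = 5)
folds_hc <- assign_folds(pts, v = 5, clust_method = "hclust")

# Fold sizes depend on the spatial layout and typically differ
table(folds_km)
table(folds_hc)
```

Each cluster label plays the role of one fold: the points in that cluster form the assessment set, and the remaining points form the analysis set.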
A tibble with classes spatial_cv, rset, tbl_df, tbl, and data.frame.
The results include a column for the data split objects and one or more
identification variables.
For a single repeat, there will be one column called id that has a character
string with the fold identifier. For multiple repeats, id is the repeat number and an
additional column called id2 contains the fold information (within repeat).
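The layout of the identification columns for several repeats can be sketched as follows. The label strings ("Repeat1", "Fold1") and the small values of v and repeats are illustrative assumptions; the labels the function actually produces may be formatted differently.

```r
# Sketch only: id / id2 layout for v folds and several repeats.
# Label formats are assumed for illustration.
v <- 3
repeats <- 2

ids <- expand.grid(
  id2 = sprintf("Fold%d", seq_len(v)),          # fold within repeat
  id  = sprintf("Repeat%d", seq_len(repeats)),  # repeat number
  stringsAsFactors = FALSE
)
ids <- ids[, c("id", "id2")]

ids        # one row per resample
nrow(ids)  # equals v * repeats
```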
A. Brenning, "Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest," 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, 2012, pp. 5372-5375, doi: 10.1109/IGARSS.2012.6352393.
Julia Silge (2021). spatialsample: Spatial Resampling Infrastructure. https://github.com/tidymodels/spatialsample, https://spatialsample.tidymodels.org.
Julia Silge, Fanny Chow, Max Kuhn and Hadley Wickham (2021). rsample: General Resampling Infrastructure. R package version 0.1.1. https://CRAN.R-project.org/package=rsample
## Not run:
data("landcover")
rscv <- repeated_spatial_cluster_sample(
  data = landcover, coords = NULL, v = 10,
  repeats = 5, spatial = TRUE, clust_method = "kmeans",
  dist_clust = NULL, breaks = 4, pool = 0.1
)
rscv
## End(Not run)