repeated_spatial_cluster_sample: Repeated Spatial Clustering of Point Data for Tidy Modeling
In suvedimukti/stdcab: Spatial Thinning, Dependency, Clustering, And Blocking of Point Data for Classification Problems

View source: R/repeated_spatial_cluster_sample.R

repeated_spatial_cluster_sample

R Documentation

Repeated Spatial Clustering of Point Data for Tidy Modeling

Description

Repeated spatial cluster sampling splits the data into V groups using partitioning (kmeans) or hierarchical (hclust) clustering of some variables, typically spatial coordinates.

A resample of the analysis data works as in spatial_cluster_sample, but with repeats. The number of resamples is equal to v * repeats; resample sizes are not equal across folds and repeats.

Usage

repeated_spatial_cluster_sample(
  data = data,
  v = 10,
  repeats = 1,
  coords = c("X", "Y"),
  strata = NULL,
  breaks = 4,
  pool = 0.1,
  spatial = FALSE,
  clust_method = "kmeans",
  dist_clust = NULL,
  ...
)

Arguments

`data`	input data set; one of sp, sf, or data.frame with X and Y as variables
`v`	number of partitions of the data set, i.e. the number of clusters
`repeats`	number of times the partitioning of the data set is repeated
`coords`	(vector) names of the coordinate pair; used when the data are aspatial (data.frame)
`strata`	(character) name of the strata variable; default is NULL, because stratification based on class/strata does not yield good results
`breaks`	(integer) a single number giving the number of bins desired to stratify a numeric stratification variable
`pool`	(numeric) a proportion of data used to determine if a particular group is too small and should be pooled into another group. Default is 0.1, as in `vfold_cv`
`spatial`	(logical) whether the data set is spatial (sf or sp) or aspatial (data.frame)
`clust_method`	clustering method; either partitioning (`kmeans`, the default) or hierarchical (`hclust`)
`dist_clust`	the agglomeration method to be used for hierarchical clustering. This should be one of “ward.D”, “ward.D2”, “single”, “complete”, “average” (= UPGMA), “mcquitty” (= WPGMA), “median” (= WPGMC) or “centroid” (= UPGMC). `dist_clust` corresponds to the `method` argument of stats::hclust
`...`	currently not used
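The hierarchical option can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes that `dist_clust` is handed to `stats::hclust()` as `method` and that the tree is then cut into `v` groups with `stats::cutree()`. The data frame `pts` and its random coordinates are hypothetical.

```r
# Sketch only: hierarchical clustering of coordinates into v groups.
# 'pts' is hypothetical example data, not a data set from stdcab.
set.seed(1)
pts <- data.frame(X = runif(100), Y = runif(100))

v <- 10                   # number of clusters (folds)
dist_clust <- "ward.D2"   # agglomeration method, as accepted by stats::hclust

d    <- dist(pts[, c("X", "Y")])      # Euclidean distances between points
hc   <- hclust(d, method = dist_clust)
fold <- cutree(hc, k = v)             # integer cluster label per point

table(fold)  # cluster sizes; these are generally unequal
```

Different agglomeration methods can produce quite different cluster shapes and sizes, so it may be worth tabulating the labels before using them as folds.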

Details

The variables named in the coords argument (when the input is a data.frame), or the coordinates extracted from sp or sf data, are used to cluster the data into disjoint sets. These clusters are used as the folds for cross-validation. Depending on how the data are distributed spatially, the folds can differ in size. The function is similar to repeated cross-validation or v-fold cross-validation (vfold_cv), but for spatial data with clustering.
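The idea of using clusters as folds can be sketched in base R. This is an illustration under stated assumptions, not the code used by the package: it assumes k-means on the two coordinate columns, and that each cluster in turn serves as the held-out (assessment) set while the remaining points form the analysis set. The data frame `pts` is hypothetical.

```r
# Sketch only: k-means clusters of coordinates used as cross-validation folds.
# 'pts' is hypothetical example data, not a data set from stdcab.
set.seed(42)
pts <- data.frame(X = runif(200), Y = runif(200))

v    <- 5
km   <- kmeans(pts[, c("X", "Y")], centers = v)
fold <- km$cluster   # cluster label (1..v) for every point

# One split per cluster: the cluster is held out, the rest is used for fitting.
splits <- lapply(seq_len(v), function(k) {
  list(analysis = which(fold != k), assessment = which(fold == k))
})

# Held-out sizes depend on the spatial distribution and are usually unequal.
fold_sizes <- vapply(splits, function(s) length(s$assessment), integer(1))
fold_sizes
```

Because every point belongs to exactly one cluster, the held-out sets are disjoint and together cover the whole data set.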

Value

A tibble with classes spatial_cv, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and one or more identification variables. For a single repeat, there will be one column called id that has a character string with the fold identifier. For multiple repeats, id holds the repeat number and an additional column called id2 contains the fold information (within repeat).
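The layout of the identification columns for repeated resampling can be mocked up as follows. This is only a sketch of the shape described above: the label strings ("Repeat1", "Fold01") are illustrative assumptions, and the real object also carries a column of split objects that is omitted here.

```r
# Sketch only: id / id2 layout for v folds within each of several repeats.
# The label formats are assumptions made for illustration.
v       <- 10
repeats <- 5

ids <- expand.grid(
  id2 = sprintf("Fold%02d", seq_len(v)),      # fold within repeat
  id  = sprintf("Repeat%d", seq_len(repeats)), # repeat number
  stringsAsFactors = FALSE
)[, c("id", "id2")]

nrow(ids)   # one row per resample: v * repeats
head(ids, 3)
```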

References

A. Brenning, "Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest," 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, 2012, pp. 5372-5375, doi: 10.1109/IGARSS.2012.6352393.

Julia Silge (2021). spatialsample: Spatial Resampling Infrastructure. https://github.com/tidymodels/spatialsample, https://spatialsample.tidymodels.org.

Julia Silge, Fanny Chow, Max Kuhn and Hadley Wickham (2021). rsample: General Resampling Infrastructure. R package version 0.1.1. https://CRAN.R-project.org/package=rsample

Examples

## Not run: 
data("landcover")

rscv <- repeated_spatial_cluster_sample(
  data = landcover, coords = NULL, v = 10,
  repeats = 5, spatial = TRUE, clust_method = "kmeans",
  dist_clust = NULL, breaks = 4, pool = 0.1
)

rscv

## End(Not run)

suvedimukti/stdcab documentation built on Aug. 7, 2023, 2:28 p.m.
