cluster_locs: Detect (spatial) groundwater well clusters

View source: R/cluster.R

Description

cluster_locs() accepts as input a dataframe with X/Y coordinates, or an sf object of geometry type POINT, and adds an integer variable that defines cluster membership. It is intended to detect spatial clusters of groundwater wells: it applies agglomerative hierarchical clustering (see Details) and, by default, cuts the cluster tree at a Euclidean distance (max_dist) that is sensible for many use cases.
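
To make the description concrete, here is a minimal base-R sketch for dataframe input only. It assumes the clustering can be expressed with stats::dist(), stats::hclust() and stats::cutree(); the helper cluster_locs_sketch() is hypothetical and is not the package's actual implementation (sf input is not handled).

```r
# Hypothetical helper, NOT the watina implementation: a base-R sketch of
# cluster_locs() for dataframe input only (sf objects are not handled).
cluster_locs_sketch <- function(input,
                                max_dist = 2,
                                output_var = "cluster_id",
                                xvar = "x",
                                yvar = "y") {
  # Coordinate matrix taken from the columns named by xvar and yvar
  coords <- as.matrix(as.data.frame(input)[, c(xvar, yvar)])
  if (nrow(coords) < 2) {
    # hclust() needs at least two objects: a lone location forms its own cluster
    input[[output_var]] <- rep(1L, nrow(coords))
    return(input)
  }
  # Euclidean distances between all pairs of locations
  d <- dist(coords, method = "euclidean")
  # Agglomerative clustering with complete linkage
  tree <- hclust(d, method = "complete")
  # Cut the tree at max_dist and store cluster membership as integers
  input[[output_var]] <- as.integer(cutree(tree, h = max_dist))
  input
}

# Usage: two close pairs of wells and one isolated well (coordinates in meters)
wells <- data.frame(x = c(0, 1, 50, 51.5, 200), y = c(0, 0, 0, 0, 0))
cluster_locs_sketch(wells)$cluster_id
# 1 1 2 2 3
```

cutree() numbers clusters in the order in which the observations appear, so the first location always ends up in cluster 1.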

Usage

cluster_locs(
  input,
  max_dist = 2,
  output_var = "cluster_id",
  xvar = "x",
  yvar = "y"
)

Arguments

input

A dataframe with X/Y coordinates, or an sf object of geometry type POINT. A typical input dataframe is the collected output of get_locs() (i.e. with collect = TRUE).

max_dist

The maximum distance allowed between any two points of the same cluster, expressed in the units of the coordinate reference system (CRS). The default value is sensible for many use cases, assuming meter is the CRS unit, as is the case for the 'Belge 1972 / Belgian Lambert 72' CRS (EPSG 31370).

output_var

String. Name of the new variable to be added to input. Defaults to "cluster_id".

xvar

String. The X coordinate variable name; only considered when input is a dataframe. Defaults to "x".

yvar

String. The Y coordinate variable name; only considered when input is a dataframe. Defaults to "y".
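
Because max_dist is interpreted in the units of the CRS, the default of 2 only means "2 m" for a metric CRS such as EPSG 31370. The base-R sketch below assumes the complete-linkage tree cut described under Details, implemented here with stats::hclust() and stats::cutree(); the helper cut_at() and the kilometer coordinates are hypothetical illustrations. It shows how the same three locations cluster when their coordinates are in meters versus in kilometers.

```r
# Hypothetical helper: complete-linkage tree on Euclidean distances, cut at max_dist
cut_at <- function(coords, max_dist) {
  unname(cutree(hclust(dist(coords), method = "complete"), h = max_dist))
}

# Three locations in meters: B lies 1.5 m east of A, C lies 500 m west of A
pts_m <- data.frame(x = c(155763, 155764.5, 155263), y = c(132693, 132693, 132693))
# The same locations expressed in kilometers (hypothetical kilometer-based CRS)
pts_km <- pts_m / 1000

cut_at(pts_m, max_dist = 2)       # 1 1 2: A and B cluster, C stays apart
cut_at(pts_km, max_dist = 2)      # 1 1 1: "2" now means 2 km, so C joins too
cut_at(pts_km, max_dist = 0.002)  # 1 1 2: 2 m expressed in kilometers
```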

Details

The function performs agglomerative hierarchical clustering with the complete linkage method. Consequently, cutting the cluster tree at max_dist yields clusters in which no two locations of the same cluster are farther apart than the cutoff value. Locations are merged into clusters as long as this condition holds; each location that remains on its own receives a unique cluster value.
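
To illustrate the complete-linkage condition, the base-R sketch below (assuming stats::hclust() and stats::cutree(); it does not call the package) clusters three collinear locations with a cutoff of 2, once with complete linkage and once with single linkage for contrast. Location C ends up on its own although it lies within 1.5 m of B, because adding it to {A, B} would create a pair (A, C) farther apart than the cutoff.

```r
# Three locations on a line: A at 0 m, B at 1 m, C at 2.5 m
pts <- data.frame(x = c(0, 1, 2.5), y = c(0, 0, 0))
d <- dist(pts)  # Euclidean distances: AB = 1, BC = 1.5, AC = 2.5

# Complete linkage: the distance between clusters is the LARGEST pairwise distance
complete <- unname(cutree(hclust(d, method = "complete"), h = 2))
# Single linkage, for contrast: the SMALLEST pairwise distance (allows chaining)
single <- unname(cutree(hclust(d, method = "single"), h = 2))

complete  # 1 1 2: C stays alone, since joining {A, B} would give AC = 2.5 > 2
single    # 1 1 1: C is chained to A through B

# Check the guarantee for complete linkage: within each cluster,
# no pairwise distance exceeds the cutoff
dm <- as.matrix(d)
max_within <- sapply(split(seq_len(nrow(pts)), complete), function(i) max(dm[i, i]))
max_within  # 1 for cluster 1, 0 for the singleton cluster 2
```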

The function's code was partly inspired by unpublished code from Ivy Jansen.

Value

The original object, with an extra integer variable (by default: cluster_id) that defines cluster membership.

Examples

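library(dplyr)
set.seed(123456789)
# Ten random locations scattered around a common point (standard deviation 2)
mydata <-
  tibble(
    a = runif(10),
    x = rnorm(10, 155763, 2),
    y = rnorm(10, 132693, 2)
  )
# Dataframe input: coordinates are read from the x and y columns
cluster_locs(mydata) %>%
  arrange(cluster_id)
# sf input: convert to POINT geometry first, dropping the x and y columns
mydata %>%
  as_points(remove = TRUE) %>%
  cluster_locs() %>%
  arrange(cluster_id)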

## Not run: 
watina <- connect_watina()

clusters <-
  get_locs(watina,
           area_codes = "KBR",
           collect = TRUE) %>%
  cluster_locs()

# inspect result:
clusters %>%
  select(loc_code, x, y, cluster_id) %>%
  arrange(cluster_id)

# frequency of cluster sizes:
clusters %>%
  count(cluster_id) %>%
  pull(n) %>%
  table()

# Disconnect:
DBI::dbDisconnect(watina)

## End(Not run)


inbo/watina documentation built on Dec. 2, 2024, 4:02 a.m.