clusterByDistance: Add distance-clustering information to a dataframe
In MazamaScience/MazamaLocationUtils: Manage Spatial Metadata for Known Locations

clusterByDistance

R Documentation

Add distance-clustering information to a dataframe

Description

Distance clustering is used to identify unique deployments of a sensor in an environmental monitoring field study. GPS-reported locations can be jittery and result in a sensor self-reporting from a cluster of nearby locations. Clustering helps resolve this by assigning a single location to the cluster.

Standard kmeans clustering does not work well when clusters can have widely differing numbers of members. A much better result is acheived with the Partitioning Around Medoids method available in cluster::pam().

The value of clusterDiameter is compared with the output of cluster::pam(...)$clusinfo[,'av_diss'] to determine the number of clusters.

Usage

clusterByDistance(
  tbl,
  clusterDiameter = 1000,
  lonVar = "longitude",
  latVar = "latitude",
  maxClusters = 50
)

Arguments

`tbl`	Tibble with geolocation information.
`clusterDiameter`	Diameter in meters used to determine the number of clusters (see description).
`lonVar`	Name of longitude variable in the incoming tibble.
`latVar`	Name of the latitude variable in the incoming tibble.
`maxClusters`	Maximum number of clusters to try.

Value

Input tibble with additional columns: clusterLon, clusterLat, clusterID.

Note

In most applications, the table_addClustering function should be used as it implements two-stage clustering using clusterbyDistance().

References

When k-means clustering fails

Examples

library(MazamaLocationUtils)

# Fremont, Seattle 47.6504, -122.3509
# Magnolia, Seattle 47.6403, -122.3997
# Downtown Seattle 47.6055, -122.3370

fremont_x <- jitter(rep(-122.3509, 10), .0005)
fremont_y <- jitter(rep(47.6504, 10), .0005)

magnolia_x <- jitter(rep(-122.3997, 8), .0005)
magnolia_y <- jitter(rep(47.6403, 8), .0005)

downtown_x <- jitter(rep(-122.3370, 3), .0005)
downtown_y <- jitter(rep(47.6055, 3), .0005)

# Apply clustering
tbl <-
  dplyr::tibble(
    longitude = c(fremont_x, magnolia_x, downtown_x),
    latitude = c(fremont_y, magnolia_y, downtown_y)
  ) %>%
  clusterByDistance(
    clusterDiameter = 1000
  )

plot(tbl$longitude, tbl$latitude, pch = tbl$clusterID)

MazamaScience/MazamaLocationUtils documentation built on Dec. 8, 2024, 4:56 p.m.