tidyverse: Tidyverse methods for sf objects
In sf: Simple Features for R

tidyverse

R Documentation

Tidyverse methods for sf objects

Description

Tidyverse methods for sf objects. Geometries are sticky, use as.data.frame to let dplyr's own methods drop them. Use these methods after loading the tidyverse package with the generic (or after loading package tidyverse).

Usage

filter.sf(.data, ..., .dots)

arrange.sf(.data, ..., .dots)

group_by.sf(.data, ..., add = FALSE)

ungroup.sf(x, ...)

rowwise.sf(x, ...)

mutate.sf(.data, ..., .dots)

transmute.sf(.data, ..., .dots)

select.sf(.data, ...)

rename.sf(.data, ...)

rename_with.sf(.data, .fn, .cols, ...)

slice.sf(.data, ..., .dots)

summarise.sf(.data, ..., .dots, do_union = TRUE, is_coverage = FALSE)

distinct.sf(.data, ..., .keep_all = FALSE, exact = FALSE, par = 0)

gather.sf(
  data,
  key,
  value,
  ...,
  na.rm = FALSE,
  convert = FALSE,
  factor_key = FALSE
)

pivot_longer.sf(
  data,
  cols,
  names_to = "name",
  names_prefix = NULL,
  names_sep = NULL,
  names_pattern = NULL,
  names_ptypes = NULL,
  names_transform = NULL,
  names_repair = "check_unique",
  values_to = "value",
  values_drop_na = FALSE,
  values_ptypes = NULL,
  values_transform = NULL,
  ...
)

pivot_wider.sf(
  data,
  ...,
  id_cols = NULL,
  id_expand = FALSE,
  names_from = name,
  names_prefix = "",
  names_sep = "_",
  names_glue = NULL,
  names_sort = FALSE,
  names_vary = "fastest",
  names_expand = FALSE,
  names_repair = "check_unique",
  values_from = value,
  values_fill = NULL,
  values_fn = NULL,
  unused_fn = NULL
)

spread.sf(
  data,
  key,
  value,
  fill = NA,
  convert = FALSE,
  drop = TRUE,
  sep = NULL
)

sample_n.sf(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame())

sample_frac.sf(
  tbl,
  size = 1,
  replace = FALSE,
  weight = NULL,
  .env = parent.frame()
)

group_split.sf(.tbl, ..., .keep = TRUE)

nest.sf(.data, ...)

separate.sf(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)

separate_rows.sf(data, ..., sep = "[^[:alnum:]]+", convert = FALSE)

unite.sf(data, col, ..., sep = "_", remove = TRUE)

unnest.sf(data, ..., .preserve = NULL)

drop_na.sf(x, ...)

inner_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

left_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

right_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

full_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

semi_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

anti_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

Arguments

`.data`	data object of class sf
`...`	other arguments
`.dots`	see corresponding function in package `dplyr`
`add`	see corresponding function in dplyr
`x`, `y`	A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
`.fn`, `.cols`	see original docs
`do_union`	logical; in case `summary` does not create a geometry column, should geometries be created by unioning using st_union, or simply by combining using st_combine? Using st_union resolves internal boundaries, but in case of unioning points, this will likely change the order of the points; see Details.
`is_coverage`	logical; if `do_union` is `TRUE`, use an optimized algorithm for features that form a polygonal coverage (have no overlaps)
`.keep_all`	see corresponding function in dplyr
`exact`	logical; if `TRUE` use st_equals_exact for geometry comparisons
`par`	numeric; passed on to st_equals_exact
`data`	see original function docs
`key`	see original function docs
`value`	see original function docs
`na.rm`	see original function docs
`convert`	see separate_rows
`factor_key`	see original function docs
`cols`	see original function docs
`names_to`, `names_pattern`, `names_ptypes`, `names_transform`	see `tidyr::pivot_longer()`
`names_prefix`, `names_sep`, `names_repair`	see original function docs.
`values_to`, `values_drop_na`, `values_ptypes`, `values_transform`	See `tidyr::pivot_longer()`
`id_cols`, `id_expand`, `names_from`, `names_sort`, `names_glue`, `names_vary`, `names_expand`	see `tidyr::pivot_wider()`
`values_from`, `values_fill`, `values_fn`, `unused_fn`	see `tidyr::pivot_wider()`
`fill`	see original function docs
`drop`	see original function docs
`sep`	see separate_rows
`tbl`	see original function docs
`size`	see original function docs
`replace`	see original function docs
`weight`	see original function docs
`.env`	see original function docs
`.tbl`	see original function docs
`.keep`	see original function docs
`col`	see separate
`into`	see separate
`remove`	see separate
`extra`	see separate
`.preserve`	see unnest
`by`	A join specification created with `join_by()`, or a character vector of variables to join by. If `NULL`, the default, `⁠*_join()⁠` will perform a natural join, using all variables in common across `x` and `y`. A message lists the variables so that you can check they're correct; suppress the message by supplying `by` explicitly. To join on different variables between `x` and `y`, use a `join_by()` specification. For example, `join_by(a == b)` will match `x$a` to `y$b`. To join by multiple variables, use a `join_by()` specification with multiple expressions. For example, `join_by(a == b, c == d)` will match `x$a` to `y$b` and `x$c` to `y$d`. If the column names are the same between `x` and `y`, you can shorten this by listing only the variable names, like `join_by(a, c)`. `join_by()` can also be used to perform inequality, rolling, and overlap joins. See the documentation at ?join_by for details on these types of joins. For simple equality joins, you can alternatively specify a character vector of variable names to join by. For example, `by = c("a", "b")` joins `x$a` to `y$a` and `x$b` to `y$b`. If variable names differ between `x` and `y`, use a named character vector like `by = c("x_a" = "y_a", "x_b" = "y_b")`. To perform a cross-join, generating all combinations of `x` and `y`, see `cross_join()`.
`copy`	If `x` and `y` are not from the same data source, and `copy` is `TRUE`, then `y` will be copied into the same src as `x`. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.
`suffix`	If there are non-joined duplicate variables in `x` and `y`, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.

Details

select keeps the geometry regardless whether it is selected or not; to deselect it, first pipe through as.data.frame to let dplyr's own select drop it.

In case one or more of the arguments (expressions) in the summarise call creates a geometry list-column, the first of these will be the (active) geometry of the returned object. If this is not the case, a geometry column is created, depending on the value of do_union.

In case do_union is FALSE, summarise will simply combine geometries using c.sfg. When polygons sharing a boundary are combined, this leads to geometries that are invalid; see for instance https://github.com/r-spatial/sf/issues/681.

distinct gives distinct records for which all attributes and geometries are distinct; st_equals is used to find out which geometries are distinct.

nest assumes that a simple feature geometry list-column was among the columns that were nested.

Value

an object of class sf

Examples

if (require(dplyr, quietly = TRUE)) {
 nc = read_sf(system.file("shape/nc.shp", package="sf"))
 nc %>% filter(AREA > .1) %>% plot()
 # plot 10 smallest counties in grey:
 st_geometry(nc) %>% plot()
 nc %>% select(AREA) %>% arrange(AREA) %>% slice(1:10) %>% plot(add = TRUE, col = 'grey')
 title("the ten counties with smallest area")
 nc2 <- nc %>% mutate(area10 = AREA/10)
 nc %>% slice(1:2)
}
# plot 10 smallest counties in grey:
if (require(dplyr, quietly = TRUE)) {
 st_geometry(nc) %>% plot()
 nc %>% select(AREA) %>% arrange(AREA) %>% slice(1:10) %>% plot(add = TRUE, col = 'grey')
 title("the ten counties with smallest area")
}
if (require(dplyr, quietly = TRUE)) {
 nc$area_cl = cut(nc$AREA, c(0, .1, .12, .15, .25))
 nc %>% group_by(area_cl) %>% class()
}
if (require(dplyr, quietly = TRUE)) {
 nc2 <- nc %>% mutate(area10 = AREA/10)
}
if (require(dplyr, quietly = TRUE)) {
 nc %>% transmute(AREA = AREA/10) %>% class()
}
if (require(dplyr, quietly = TRUE)) {
 nc %>% select(SID74, SID79) %>% names()
 nc %>% select(SID74, SID79) %>% class()
}
if (require(dplyr, quietly = TRUE)) {
 nc2 <- nc %>% rename(area = AREA)
}
if (require(dplyr, quietly = TRUE)) {
 nc %>% slice(1:2)
}
if (require(dplyr, quietly = TRUE)) {
 nc$area_cl = cut(nc$AREA, c(0, .1, .12, .15, .25))
 nc.g <- nc %>% group_by(area_cl)
 nc.g %>% summarise(mean(AREA))
 nc.g %>% summarise(mean(AREA)) %>% plot(col = grey(3:6 / 7))
 nc %>% as.data.frame %>% summarise(mean(AREA))
}
if (require(dplyr, quietly = TRUE)) {
 nc[c(1:100, 1:10), ] %>% distinct() %>% nrow()
}
if (require(tidyr, quietly = TRUE) && require(dplyr, quietly = TRUE) && "geometry" %in% names(nc)) {
 nc %>% select(SID74, SID79) %>% gather("VAR", "SID", -geometry) %>% summary()
}
if (require(tidyr, quietly = TRUE) && require(dplyr, quietly = TRUE) && "geometry" %in% names(nc)) {
 nc$row = 1:100 # needed for spread to work
 nc %>% select(SID74, SID79, geometry, row) %>%
	gather("VAR", "SID", -geometry, -row) %>%
	spread(VAR, SID) %>% head()
}
if (require(tidyr, quietly = TRUE) && require(dplyr, quietly = TRUE)) {
 storms.sf = st_as_sf(storms, coords = c("long", "lat"), crs = 4326)
 x <- storms.sf %>% group_by(name, year) %>% nest
 trs = lapply(x$data, function(tr) st_cast(st_combine(tr), "LINESTRING")[[1]]) %>%
    st_sfc(crs = 4326)
 trs.sf = st_sf(x[,1:2], trs)
 plot(trs.sf["year"], axes = TRUE)
}

sf documentation built on June 8, 2025, 10:18 a.m.