# tidyverse: Tidyverse methods for sf objects (remove .sf suffix!)

In sf: Simple Features for R

## Description

Tidyverse methods for sf objects. Geometries are sticky; use `as.data.frame` to let `dplyr`'s own methods drop them. Call these methods without the `.sf` suffix, after loading the tidyverse package that provides the generic (or after loading package tidyverse itself).

## Usage

```
filter.sf(.data, ..., .dots)

arrange.sf(.data, ..., .dots)

group_by.sf(.data, ..., add = FALSE)

ungroup.sf(x, ...)

rowwise.sf(x, ...)

mutate.sf(.data, ..., .dots)

transmute.sf(.data, ..., .dots)

select.sf(.data, ...)

rename.sf(.data, ...)

slice.sf(.data, ..., .dots)

summarise.sf(.data, ..., .dots, do_union = TRUE, is_coverage = FALSE)

distinct.sf(.data, ..., .keep_all = FALSE)

gather.sf(
  data,
  key,
  value,
  ...,
  na.rm = FALSE,
  convert = FALSE,
  factor_key = FALSE
)

spread.sf(
  data,
  key,
  value,
  fill = NA,
  convert = FALSE,
  drop = TRUE,
  sep = NULL
)

sample_n.sf(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame())

sample_frac.sf(
  tbl,
  size = 1,
  replace = FALSE,
  weight = NULL,
  .env = parent.frame()
)

nest.sf(.data, ...)

separate.sf(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)

separate_rows.sf(data, ..., sep = "[^[:alnum:]]+", convert = FALSE)

unite.sf(data, col, ..., sep = "_", remove = TRUE)

unnest.sf(data, ..., .preserve = NULL)

inner_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

left_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

right_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

full_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

semi_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

anti_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
```

## Arguments

| Argument | Description |
|---|---|
| `.data` | data object of class sf |
| `...` | other arguments |
| `.dots` | see corresponding function in package `dplyr` |
| `add` | see corresponding function in dplyr |
| `x` | A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
| `do_union` | logical; in case `summarise` does not create a geometry column, should geometries be created by unioning using st_union, or simply by combining using st_combine? Using st_union resolves internal boundaries, but in case of unioning points, this will likely change the order of the points; see Details. |
| `is_coverage` | logical; if `do_union` is `TRUE`, use an optimized algorithm for features that form a polygonal coverage (have no overlaps) |
| `.keep_all` | see corresponding function in dplyr |
| `data` | see original function docs |
| `key` | see original function docs |
| `value` | see original function docs |
| `na.rm` | see original function docs |
| `convert` | see separate_rows |
| `factor_key` | see original function docs |
| `fill` | see original function docs |
| `drop` | see original function docs |
| `sep` | see separate_rows |
| `tbl` | see original function docs |
| `size` | see original function docs |
| `replace` | see original function docs |
| `weight` | see original function docs |
| `.env` | see original function docs |
| `col` | see separate |
| `into` | see separate |
| `remove` | see separate |
| `extra` | see separate |
| `.preserve` | see unnest |
| `y` | A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
| `by` | A character vector of variables to join by. If `NULL`, the default, `*_join()` will perform a natural join, using all variables in common across `x` and `y`. A message lists the variables so that you can check they're correct; suppress the message by supplying `by` explicitly. To join by different variables on `x` and `y`, use a named vector. For example, `by = c("a" = "b")` will match `x$a` to `y$b`. To join by multiple variables, use a vector with length > 1. For example, `by = c("a", "b")` will match `x$a` to `y$a` and `x$b` to `y$b`. Use a named vector to match different variables in `x` and `y`. For example, `by = c("a" = "b", "c" = "d")` will match `x$a` to `y$b` and `x$c` to `y$d`. To perform a cross-join, generating all combinations of `x` and `y`, use `by = character()`. |
| `copy` | If `x` and `y` are not from the same data source, and `copy` is `TRUE`, then `y` will be copied into the same src as `x`. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it. |
| `suffix` | If there are non-joined duplicate variables in `x` and `y`, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2. |
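As a rough illustration of the `by` argument with an sf object on the left-hand side, the sketch below joins the `nc` dataset shipped with sf to a small attribute table. The `lookup` table and its `county` and `region` columns are made up for this example; `y` is a plain data frame here, since these methods join on attributes rather than on geometry.

```r
library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical attribute table; column names are assumptions for illustration
lookup <- data.frame(
  county = c("Ashe", "Alleghany"),
  region = c("mountains", "mountains")
)

# A named vector matches nc$NAME to lookup$county
joined <- nc %>% inner_join(lookup, by = c("NAME" = "county"))

class(joined)  # the result keeps the sf class and the geometry column
nrow(joined)   # only the counties present in both tables remain
```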

## Details

`select` keeps the geometry regardless of whether it is selected or not; to deselect it, first pipe through `as.data.frame` to let dplyr's own `select` drop it.
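A minimal sketch of this sticky behaviour, using the `nc` dataset shipped with sf (the object names `kept` and `dropped` are only for illustration):

```r
library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# The geometry column is retained even though only AREA is selected
kept <- nc %>% select(AREA)
names(kept)

# Converting to a plain data.frame first lets dplyr's own select() drop it
dropped <- nc %>% as.data.frame() %>% select(AREA)
names(dropped)
```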

If one or more of the arguments (expressions) in the `summarise` call creates a geometry list-column, the first of these will be the (active) geometry of the returned object. Otherwise, a geometry column is created, depending on the value of `do_union`.

In case `do_union` is `FALSE`, `summarise` will simply combine geometries using c.sfg. When polygons sharing a boundary are combined, this leads to geometries that are invalid; see for instance https://github.com/r-spatial/sf/issues/681.
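The difference between the two settings can be sketched by counting polygon parts per group. The helper `n_polygons` and the column name `mean_area` are assumptions made for this example, and the count relies on every resulting geometry being a MULTIPOLYGON, whose list length is its number of polygon parts.

```r
library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc$area_cl <- cut(nc$AREA, c(0, .1, .12, .15, .25))

# Default: geometries within each group are unioned (internal boundaries resolved)
unioned <- nc %>%
  group_by(area_cl) %>%
  summarise(mean_area = mean(AREA))

# do_union = FALSE: geometries are only combined, so shared boundaries remain
combined <- nc %>%
  group_by(area_cl) %>%
  summarise(mean_area = mean(AREA), do_union = FALSE)

# Hypothetical helper: total number of polygon parts across all features
n_polygons <- function(x) sum(lengths(st_geometry(x)))
n_polygons(unioned)
n_polygons(combined)

# st_is_valid(combined) can be used to check which combined geometries are valid
```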

`distinct` drops records that duplicate another record in all attributes and in geometry; st_equals is used to find out which geometries are distinct.
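A small sketch with made-up point data shows which rows count as duplicates. The object `pts` and its `id` column are hypothetical: the second row repeats the first in both attribute and geometry, while the third shares the attribute but not the geometry.

```r
library(sf)
library(dplyr)

p00 <- st_point(c(0, 0))
p11 <- st_point(c(1, 1))

# Rows 1 and 2 are identical; row 3 differs in geometry, row 4 in attribute
pts <- st_sf(
  id = c(1, 1, 1, 2),
  geometry = st_sfc(p00, p00, p11, p11)
)

deduped <- pts %>% distinct()
nrow(pts)
nrow(deduped)  # only the exact duplicate (row 2) is removed
```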

`nest` assumes that a simple feature geometry list-column was among the columns that were nested.
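As a hedged sketch of what this means in practice: when the geometry column is among the nested columns, each element of the resulting list-column should again be an sf object. The grouping variable `area_cl` follows the Examples section; the default list-column name `data` is tidyr's behaviour for a grouped input and is assumed here.

```r
library(sf)
library(dplyr)
library(tidyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc$area_cl <- cut(nc$AREA, c(0, .1, .12, .15, .25))

# All non-grouping columns, including the geometry, go into the list-column
nested <- nc %>% group_by(area_cl) %>% nest()

nrow(nested)             # one row per area class
class(nested$data[[1]])  # each nested element carries its own geometry
```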

## Value

an object of class sf

## Examples

```
library(sf)
library(dplyr)
nc = st_read(system.file("shape/nc.shp", package="sf"))
nc %>% filter(AREA > .1) %>% plot()
# plot 10 smallest counties in grey:
st_geometry(nc) %>% plot()
nc %>% select(AREA) %>% arrange(AREA) %>% slice(1:10) %>% plot(add = TRUE, col = 'grey')
title("the ten counties with smallest area")
nc$area_cl = cut(nc$AREA, c(0, .1, .12, .15, .25))
nc %>% group_by(area_cl) %>% class()
nc2 <- nc %>% mutate(area10 = AREA/10)
nc %>% transmute(AREA = AREA/10, geometry = geometry) %>% class()
nc %>% transmute(AREA = AREA/10) %>% class()
# the geometry column is kept whether or not it is selected:
nc %>% select(SID74, SID79) %>% names()
nc %>% select(SID74, SID79, geometry) %>% names()
nc %>% select(SID74, SID79) %>% class()
nc %>% select(SID74, SID79, geometry) %>% class()
nc2 <- nc %>% rename(area = AREA)
nc %>% slice(1:2)
nc$area_cl = cut(nc$AREA, c(0, .1, .12, .15, .25))
nc.g <- nc %>% group_by(area_cl)
nc.g %>% summarise(mean(AREA))
nc.g %>% summarise(mean(AREA)) %>% plot(col = grey(3:6 / 7))
nc %>% as.data.frame %>% summarise(mean(AREA))
nc[c(1:100, 1:10), ] %>% distinct() %>% nrow()
library(tidyr)
nc %>% select(SID74, SID79) %>% gather("VAR", "SID", -geometry) %>% summary()
library(tidyr)
nc$row = 1:100 # needed for spread to work
nc %>% select(SID74, SID79, geometry, row) %>%
  gather("VAR", "SID", -geometry, -row) %>%
  spread(VAR, SID) %>% head()
# build one LINESTRING track per storm and year:
storms.sf = st_as_sf(storms, coords = c("long", "lat"), crs = 4326)
x <- storms.sf %>% group_by(name, year) %>% nest
trs = lapply(x$data, function(tr) st_cast(st_combine(tr), "LINESTRING")[[1]]) %>%
  st_sfc(crs = 4326)
trs.sf = st_sf(x[,1:2], trs)
plot(trs.sf["year"], axes = TRUE)
```

### Example output

```Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Reading layer `nc' from data source `/usr/lib/R/site-library/sf/shape/nc.shp' using driver `ESRI Shapefile'
Simple feature collection with 100 features and 14 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Warning message:
plotting the first 10 out of 14 attributes; use max.plot = 14 to plot all
[1] "sf"         "grouped_df" "tbl_df"     "tbl"        "data.frame"
[1] "sf"         "data.frame"
[1] "sf"         "data.frame"
[1] "SID74"    "SID79"    "geometry"
[1] "SID74"    "SID79"    "geometry"
[1] "sf"         "data.frame"
[1] "sf"         "data.frame"
Simple feature collection with 2 features and 15 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -81.74107 ymin: 36.23436 xmax: -80.90344 ymax: 36.58965
AREA PERIMETER CNTY_ CNTY_ID      NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
1 0.114     1.442  1825    1825      Ashe 37009  37009        5  1091     1
2 0.061     1.231  1827    1827 Alleghany 37005  37005        3   487     0
NWBIR74 BIR79 SID79 NWBIR79    area_cl                       geometry
1      10  1364     0      19 (0.1,0.12] MULTIPOLYGON (((-81.47276 3...
2      10   542     3      12    (0,0.1] MULTIPOLYGON (((-81.23989 3...
`summarise()` ungrouping output (override with `.groups` argument)
Simple feature collection with 4 features and 2 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
# A tibble: 4 x 3
area_cl    `mean(AREA)`                                               geometry
<fct>             <dbl>                                     <MULTIPOLYGON [°]>
1 (0,0.1]          0.0760 (((-77.96073 34.18924, -77.96587 34.24229, -77.97528 …
2 (0.1,0.12]       0.112  (((-84.29104 35.21054, -84.22594 35.2616, -84.17973 3…
3 (0.12,0.1…       0.134  (((-76.54427 34.58783, -76.55515 34.61066, -76.53775 …
4 (0.15,0.2…       0.190  (((-78.02592 34.32877, -78.01131 34.31261, -78.00702 …
`summarise()` ungrouping output (override with `.groups` argument)
mean(AREA)
1    0.12626
[1] 100
VAR                 SID                  geometry
Length:200         Min.   : 0.000   MULTIPOLYGON :200
Class :character   1st Qu.: 2.000   epsg:4267    :  0
Mode  :character   Median : 5.000   +proj=long...:  0
Mean   : 7.515
3rd Qu.: 9.000
Max.   :57.000
Simple feature collection with 6 features and 3 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965