dbplyr-slice: Subset rows using their positions
In dbplyr: A 'dplyr' Back End for Databases

dbplyr-slice

R Documentation

Subset rows using their positions

Description

These are methods for the dplyr generics slice_min(), slice_max(), and slice_sample(). They are translated to SQL using filter() and window functions (ROWNUMBER, MIN_RANK, or CUME_DIST depending on arguments). slice(), slice_head(), and slice_tail() are not supported since database tables have no intrinsic order.

If data is grouped, the operation will be performed on each group so that (e.g.) slice_min(db, x, n = 3) will select the three rows with the smallest value of x in each group.

Usage

## S3 method for class 'tbl_lazy'
slice_min(
  .data,
  order_by,
  ...,
  n,
  prop,
  by = NULL,
  with_ties = TRUE,
  na_rm = TRUE
)

## S3 method for class 'tbl_lazy'
slice_max(
  .data,
  order_by,
  ...,
  n,
  by = NULL,
  prop,
  with_ties = TRUE,
  na_rm = TRUE
)

## S3 method for class 'tbl_lazy'
slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)

Arguments

`.data`	A lazy data frame backed by a database query.
`order_by`	Variable or function of variables to order by.
`...`	Not used.
`n`, `prop`	Provide either `n`, the number of rows, or `prop`, the proportion of rows to select. If neither are supplied, `n = 1` will be used. If `n` is greater than the number of rows in the group (or `prop` > 1), the result will be silently truncated to the group size. If the proportion of a group size is not an integer, it is rounded down.
`by`	<`tidy-select`> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to `group_by()`. For details and examples, see ?dplyr_by.
`with_ties`	Should ties be kept together? The default, `TRUE`, may return more rows than you request. Use FALSE to ignore ties, and return the first n rows.
`na_rm`	Should missing values in `order_by` be removed from the result? If `FALSE`, `NA` values are sorted to the end (like in `arrange()`), so they will only be included if there are insufficient non-missing values to reach `n`/`prop`.
`weight_by`, `replace`	Not supported for database backends.

Examples

library(dplyr, warn.conflicts = FALSE)

db <- memdb_frame(x = 1:3, y = c(1, 1, 2))
db %>% slice_min(x) %>% show_query()
db %>% slice_max(x) %>% show_query()
db %>% slice_sample() %>% show_query()

db %>% group_by(y) %>% slice_min(x) %>% show_query()

# By default, ties are includes so you may get more rows
# than you expect
db %>% slice_min(y, n = 1)
db %>% slice_min(y, n = 1, with_ties = FALSE)

# Non-integer group sizes are rounded down
db %>% slice_min(x, prop = 0.5)

dbplyr documentation built on May 29, 2024, 6:19 a.m.