knitr::opts_chunk$set(
  error = TRUE,
  collapse = TRUE,
  comment = "#>"
)

tibble:::set_dftbl_hooks()

options(
  lifecycle_disable_warnings = FALSE,
  lifecycle_verbose_soft_deprecation = TRUE,
  lifecycle_repeat_warnings = TRUE
)
There are many, many ways to subset data frames and tibbles.
This vignette attempts to provide a comprehensive overview of the behavior of the subsetting operators `$`, `[[` and `[`, highlighting where the tibble implementation differs from the data frame implementation.
library(tibble)

new_df <- function() {
  df <- data.frame(a = 1:4)
  df$b <- letters[5:8]
  df$cd <- list(9, 10:11, 12:14, "text")
  df
}

new_tbl <- function() {
  as_tibble(new_df())
}
Results of the same code for data frames and tibbles are presented side by side:
new_df()
In the following, if the results are identical (after converting to a data frame if necessary), only the tibble result is shown, as in the example below. This makes differences easier to spot.
new_df()
Subsetting operations are read-only. The same objects are reused in all examples:
df <- new_df()
tbl <- new_tbl()
With `$` subsetting, accessing a missing column of a tibble gives a warning.
Inexact (partial) matching of column names is not supported by tibbles:
df$a
df$"a"
df$a[2:3]
df$cd
df$c
df$d
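The difference in name matching can be checked in isolation. The sketch below uses a small hypothetical data frame (not the `new_df()` helper) in which `cd` is the only column whose name starts with "c"; the warning from the tibble is suppressed only to keep the output short.

```r
library(tibble)

# Hypothetical toy data: `cd` is the only column starting with "c".
df <- data.frame(a = 1:4, cd = c(9, 10, 11, 12))
tbl <- as_tibble(df)

# Data frame: `$` falls back to partial matching, so `c` resolves to `cd`.
partial_df <- df$c

# Tibble: no partial matching; a warning is raised and NULL is returned.
partial_tbl <- suppressWarnings(tbl$c)

identical(partial_df, df$cd)
is.null(partial_tbl)
```

Both final expressions should evaluate to `TRUE` under these assumptions.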
The `exact` argument is not supported by tibbles.
df[["a"]]
df[["cd", exact = TRUE]]
df[["cd", exact = FALSE]]
df[["c", exact = TRUE]]
df[["c", exact = FALSE]]
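As a rough illustration of what "not supported" means in practice, the following sketch compares `exact = FALSE` on a hypothetical two-column data frame and its tibble counterpart. It assumes a recent tibble release, where the argument is ignored with a warning; the warning is suppressed here for brevity.

```r
library(tibble)

# Hypothetical toy data: `cd` is the only column starting with "c".
df <- data.frame(a = 1:4, cd = c(9, 10, 11, 12))
tbl <- as_tibble(df)

# Data frame: exact = FALSE enables partial matching, so "c" finds `cd`.
loose_df <- df[["c", exact = FALSE]]

# Data frame: the default (exact = TRUE) finds no column named "c".
strict_df <- df[["c"]]

# Tibble: `exact` is ignored (with a warning), so "c" matches nothing.
loose_tbl <- suppressWarnings(tbl[["c", exact = FALSE]])

identical(loose_df, df$cd)
is.null(strict_df)
is.null(loose_tbl)
```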
With two indexes, a single element is returned.
List columns are not unpacked by tibbles; `[[` only unpacks columns.
df[[2, "a"]]
df[[2, "cd"]]
df[[1:2, "cd"]]
df[[2, "c"]]
df[[1:2, "c"]]
Exotic variants like recursive indexing are deprecated for tibbles.
df[[c(1, 2)]]
With `[` subsetting, tibbles always return a tibble.
The `drop` argument is supported but has different defaults:
df["a"]
df["a", drop = FALSE]
df["a", drop = TRUE]
df[1]
df[0]
df[4]
df[NA]
df[NA_character_]
df[NA_integer_]
The same examples are repeated for two-dimensional indexing when omitting the row index:
df[, "a"]
df[, "a", drop = FALSE]
df[, "a", drop = TRUE]
df[, 1]
df[, 0]
df[, 4]
df[, NA]
df[, NA_character_]
df[, NA_integer_]
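The different `drop` defaults are most visible when a single column is selected with two-dimensional indexing. The sketch below uses a hypothetical two-column data frame to show the three outcomes that matter in practice.

```r
library(tibble)

# Hypothetical toy data with atomic columns only.
df <- data.frame(a = 1:4, b = letters[5:8], stringsAsFactors = FALSE)
tbl <- as_tibble(df)

# Data frame: a single selected column is dropped to a vector by default.
col_df <- df[, "a"]

# Tibble: the default keeps the result a one-column tibble.
col_tbl <- tbl[, "a"]

# Tibble: dropping has to be requested explicitly.
col_tbl_dropped <- tbl[, "a", drop = TRUE]

is.integer(col_df)
is_tibble(col_tbl)
identical(col_tbl_dropped, col_df)
```

Code that needs a vector from either kind of object can avoid the ambiguity by using `[[` or `$` instead of relying on `drop`.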
Multiple columns can be queried by passing a vector of column indexes (names, positions, or even a logical vector). With the latter option, tibbles are a tad stricter:
df[c("a", "b")]
df[character()]
df[1:2]
df[1:3]
df[1:4]
df[0:2]
df[-1:2]
df[-1]
df[-(1:2)]
df[integer()]
df[TRUE]
df[FALSE]
df[c(TRUE, TRUE, FALSE)]
df[c(FALSE, TRUE, FALSE)]
df[c(FALSE, TRUE)]
df[c(FALSE, TRUE, FALSE, TRUE)]
The same examples are repeated for two-dimensional indexing when omitting the row index:
df[, c("a", "b")]
df[, character()]
df[, 1:2]
df[, 1:3]
df[, 1:4]
df[, 0:2]
df[, -1:2]
df[, -1]
df[, -(1:2)]
df[, integer()]
df[, TRUE]
df[, FALSE]
df[, c(TRUE, TRUE, FALSE)]
df[, c(FALSE, TRUE, FALSE)]
df[, c(FALSE, TRUE)]
df[, c(FALSE, TRUE, FALSE, TRUE)]
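To make the stricter handling of logical column indexes concrete, here is a minimal sketch on a hypothetical three-column data frame. It assumes tibble 3.0 or later, where a logical index whose length is neither one nor the number of columns raises an error; `tryCatch()` is used only to record whether that error occurred.

```r
library(tibble)

# Hypothetical toy data with three atomic columns.
df <- data.frame(a = 1:4, b = letters[5:8], c = 9:12, stringsAsFactors = FALSE)
tbl <- as_tibble(df)

# Data frame: the length-2 index is silently recycled to FALSE, TRUE, FALSE.
cols_df <- names(df[c(FALSE, TRUE)])

# Tibble: the same index is rejected because its length is neither 1 nor 3.
tbl_failed <- tryCatch(
  {
    tbl[c(FALSE, TRUE)]
    FALSE
  },
  error = function(e) TRUE
)

cols_df
tbl_failed
```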
Row subsetting with integer indexes works almost identically.
Out-of-bounds subsetting is not recommended and may lead to an error in future versions.
Another special case is subsetting with `[1, , drop = TRUE]`, where the data frame implementation returns a list.
df[1, ]
df[1, , drop = TRUE]
df[1:2, ]
df[0, ]
df[integer(), ]
df[5, ]
df[4:5, ]
df[-1, ]
df[-1:2, ]
df[NA, ]
df[NA_integer_, ]
df[c(NA, 1), ]
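The `drop = TRUE` special case for a single row can be inspected directly. This sketch uses a hypothetical two-column data frame and assumes a recent tibble release, in which `drop = TRUE` only takes effect when exactly one column is selected.

```r
library(tibble)

# Hypothetical toy data with atomic columns only.
df <- data.frame(a = 1:4, b = letters[5:8], stringsAsFactors = FALSE)
tbl <- as_tibble(df)

# Data frame: one row with drop = TRUE collapses to a plain list.
row_df <- df[1, , drop = TRUE]

# Tibble: more than one column is selected, so the result stays a tibble.
row_tbl <- tbl[1, , drop = TRUE]

is.data.frame(row_df)
is.list(row_df)
is_tibble(row_tbl)
```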
Row subsetting with logical indexes also works almost identically; with tibbles, the index vector must have length one or the number of rows.
df[TRUE, ]
df[FALSE, ]
df[c(TRUE, FALSE), ]
df[c(TRUE, FALSE, TRUE), ]
df[c(TRUE, FALSE, TRUE, FALSE), ]
df[c(TRUE, FALSE, TRUE, FALSE, TRUE), ]
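The length rule for logical row indexes can be illustrated with a hypothetical four-row data frame. The sketch assumes tibble 3.0 or later, where an index of any other length is an error; `tryCatch()` captures the error message instead of stopping.

```r
library(tibble)

# Hypothetical toy data with four rows.
df <- data.frame(a = 1:4, b = letters[5:8], stringsAsFactors = FALSE)
tbl <- as_tibble(df)

# Data frame: the length-2 index is recycled, selecting rows 1 and 3.
rows_df <- df[c(TRUE, FALSE), ]

# Tibble: length 2 is neither 1 nor nrow(tbl), so subsetting fails.
rows_tbl <- tryCatch(
  tbl[c(TRUE, FALSE), ],
  error = function(e) conditionMessage(e)
)

# Tibble: a full-length index is accepted and selects the same rows.
rows_tbl_full <- tbl[c(TRUE, FALSE, TRUE, FALSE), ]

rows_df$a
is.character(rows_tbl)
rows_tbl_full$a
```

Spelling out the full-length index, as in the last call, behaves the same way for both data frames and tibbles.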
Indexing both row and column works more or less the same, except for `drop`:
df[1, "a"]
df[1, "a", drop = FALSE]
df[1, "a", drop = TRUE]
df[1:2, "a"]
df[1:2, "a", drop = FALSE]
df[1:2, "a", drop = TRUE]
df[1, c("a", "b")]
df[1, c("a", "b"), drop = FALSE]
df[1, c("a", "b"), drop = TRUE]
df[1:2, c("a", "b")]
Indexes can be omitted altogether; there are no differences here:
df[]
df[,]