nest_by: Nest by one or more variables


View source: R/nest_by.R

Description

Lifecycle: experimental.

nest_by() is closely related to group_by(). However, instead of storing the group structure in the metadata, it is made explicit in the data, giving each group key a single row along with a list-column of data frames that contain all the other data.

nest_by() returns a rowwise data frame, which makes operations on the grouped data particularly elegant. See vignette("rowwise") for more details.
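The contrast with group_by() can be sketched on a small data frame. The tibble `df` and its columns `g` and `x` are hypothetical, made up purely for illustration; this is a minimal sketch, not part of the package's own examples.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
df <- tibble(
  g = c("a", "a", "b"),
  x = c(1, 2, 3)
)

# group_by() keeps every input row; the grouping is stored as metadata
grouped <- df %>% group_by(g)
nrow(grouped)

# nest_by() returns one row per key, with the other columns in a list-column
nested <- df %>% nest_by(g)
nrow(nested)
names(nested)

# Because the result is rowwise, `data` refers to a single group's
# data frame inside mutate()
sizes <- nested %>% mutate(n = nrow(data))
sizes$n
```

Here `grouped` still has three rows, while `nested` has two, one for each value of `g`.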

Usage

nest_by(.data, ..., .key = "data", .keep = FALSE)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

Variables or computations to nest by, interpreted as in group_by().

.key

Name of the list column.

.keep

Should the grouping columns be kept in the list column?
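As a rough illustration of the .key and .keep arguments, the sketch below inspects column names before and after changing each one. The tibble `df` and the name "rows" are hypothetical choices made for this example.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

# Defaults: the list-column is named "data" and omits the grouping column
default <- df %>% nest_by(g)
names(default$data[[1]])

# .key changes the name of the list-column
renamed <- df %>% nest_by(g, .key = "rows")
names(renamed)

# .keep = TRUE retains the grouping column inside each nested data frame
kept <- df %>% nest_by(g, .keep = TRUE)
names(kept$data[[1]])
```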

Details

Note that df %>% nest_by(x, y) is roughly equivalent to

df %>%
  group_by(x, y) %>%
  summarise(data = list(cur_data())) %>%
  rowwise()

If you want to unnest a nested data frame, you can either use tidyr::unnest() or take advantage of summarise()'s multi-row behaviour:

nested %>%
  summarise(data)
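The tidyr::unnest() route might look like the following sketch. It assumes the tidyr package is installed, uses a hypothetical toy tibble `df`, and calls ungroup() first so that unnest() operates on a plain tibble rather than a rowwise one.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))
nested <- df %>% nest_by(g)

if (requireNamespace("tidyr", quietly = TRUE)) {
  # Drop the rowwise structure, then expand the list-column back into rows
  flat <- nested %>%
    ungroup() %>%
    tidyr::unnest(data)
  flat
}
```

The result has the original columns again, with the grouping column first.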

Value

A rowwise data frame. The output has the following properties:

A tbl with one row per unique combination of the grouping variables. The first columns are the grouping variables, followed by a list column of tibbles with matching rows of the remaining columns.
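These properties can be checked informally on the built-in mtcars data. The variable names `nested`, `n_keys`, and `rows_per_group` are chosen for this sketch only.

```r
library(dplyr)

nested <- mtcars %>% nest_by(cyl, gear)

# One row per unique combination of the grouping variables
n_keys <- nrow(distinct(mtcars, cyl, gear))
nrow(nested) == n_keys

# Grouping variables come first, followed by the list-column
names(nested)
group_vars(nested)

# Each nested tibble holds the remaining columns for its group,
# and together the groups account for every input row
ncol(nested$data[[1]])
rows_per_group <- vapply(nested$data, nrow, integer(1))
sum(rows_per_group) == nrow(mtcars)
```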

Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The methods currently available in loaded packages are listed dynamically when this help page is rendered in R (see ?nest_by).

Examples

library(dplyr)

# After nesting, you get one row per group
iris %>% nest_by(Species)
starwars %>% nest_by(species)

# The output is grouped by row, which makes modelling particularly easy
models <- mtcars %>%
  nest_by(cyl) %>%
  mutate(model = list(lm(mpg ~ wt, data = data)))
models

models %>% summarise(rsq = summary(model)$r.squared)
# This is particularly elegant with the broom functions
if (requireNamespace("broom", quietly = TRUE)) {
  models %>% summarise(broom::glance(model))
  models %>% summarise(broom::tidy(model))
}

# Note that you can also summarise to unnest the data
models %>% summarise(data)

Example output

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

# A tibble: 3 x 2
# Rowwise:  Species
  Species                  data
  <fct>      <list<tbl_df[,4]>>
1 setosa               [50 × 4]
2 versicolor           [50 × 4]
3 virginica            [50 × 4]
# A tibble: 38 x 2
# Rowwise:  species
   species                  data
   <chr>     <list<tbl_df[,13]>>
 1 Aleena               [1 × 13]
 2 Besalisk             [1 × 13]
 3 Cerean               [1 × 13]
 4 Chagrian             [1 × 13]
 5 Clawdite             [1 × 13]
 6 Droid                [6 × 13]
 7 Dug                  [1 × 13]
 8 Ewok                 [1 × 13]
 9 Geonosian            [1 × 13]
10 Gungan               [3 × 13]
# … with 28 more rows
# A tibble: 3 x 3
# Rowwise:  cyl
    cyl                data model 
  <dbl> <list<tbl_df[,10]>> <list>
1     4           [11 × 10] <lm>  
2     6            [7 × 10] <lm>  
3     8           [14 × 10] <lm>  
`summarise()` regrouping output by 'cyl' (override with `.groups` argument)
# A tibble: 3 x 2
# Groups:   cyl [3]
    cyl   rsq
  <dbl> <dbl>
1     4 0.509
2     6 0.465
3     8 0.423
`summarise()` regrouping output by 'cyl' (override with `.groups` argument)
`summarise()` regrouping output by 'cyl' (override with `.groups` argument)
# A tibble: 6 x 6
# Groups:   cyl [3]
    cyl term        estimate std.error statistic    p.value
  <dbl> <chr>          <dbl>     <dbl>     <dbl>      <dbl>
1     4 (Intercept)    39.6      4.35       9.10 0.00000777
2     4 wt             -5.65     1.85      -3.05 0.0137    
3     6 (Intercept)    28.4      4.18       6.79 0.00105   
4     6 wt             -2.78     1.33      -2.08 0.0918    
5     8 (Intercept)    23.9      3.01       7.94 0.00000405
6     8 wt             -2.19     0.739     -2.97 0.0118    
`summarise()` regrouping output by 'cyl' (override with `.groups` argument)
# A tibble: 32 x 11
# Groups:   cyl [3]
     cyl   mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     4  22.8 108      93  3.85  2.32  18.6     1     1     4     1
 2     4  24.4 147.     62  3.69  3.19  20       1     0     4     2
 3     4  22.8 141.     95  3.92  3.15  22.9     1     0     4     2
 4     4  32.4  78.7    66  4.08  2.2   19.5     1     1     4     1
 5     4  30.4  75.7    52  4.93  1.62  18.5     1     1     4     2
 6     4  33.9  71.1    65  4.22  1.84  19.9     1     1     4     1
 7     4  21.5 120.     97  3.7   2.46  20.0     1     0     3     1
 8     4  27.3  79      66  4.08  1.94  18.9     1     1     4     1
 9     4  26   120.     91  4.43  2.14  16.7     0     1     5     2
10     4  30.4  95.1   113  3.77  1.51  16.9     1     1     5     2
# … with 22 more rows

dplyr documentation built on June 19, 2021, 1:07 a.m.