expand: Expand data frame to include all possible combinations of...

Description

expand() generates all combinations of variables found in a dataset. It is paired with the nesting() and crossing() helpers. crossing() is a wrapper around expand_grid() that de-duplicates and sorts its inputs; nesting() is a helper that finds only the combinations already present in the data.
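The difference between the three helpers is easiest to see on bare vectors. The following is a minimal sketch, assuming tidyr is installed; the vectors x and y are made up for illustration.

```r
library(tidyr)

# Illustrative inputs with a duplicate ("b", 2) and unsorted values
x <- c("b", "a", "b")
y <- c(2, 1, 2)

# expand_grid() keeps duplicates and input order: 3 * 3 = 9 rows
grid <- expand_grid(x = x, y = y)

# crossing() de-duplicates and sorts each input first: 2 * 2 = 4 rows
crossed <- crossing(x = x, y = y)

# nesting() keeps only the pairs that occur together: (a, 1) and (b, 2)
nested <- nesting(x = x, y = y)
```

Inside expand(), bare column names behave like crossing() and columns wrapped in nesting() behave like the last call.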

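expand() is often useful in conjunction with joins: use it with right_join() to fill in missing rows, or with anti_join() to determine which combinations are missing.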

Usage

expand(data, ..., .name_repair = "check_unique")

crossing(..., .name_repair = "check_unique")

nesting(..., .name_repair = "check_unique")

Arguments

data

A data frame.

...

Specification of columns to expand. Columns can be atomic vectors or lists.

  • To find all unique combinations of x, y and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z).

  • To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z)).

  • You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for each present school-student combination for all possible dates.
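The three forms above can be compared side by side. This is a sketch on a hypothetical attendance table (the data and the name attendance are invented for illustration), assuming tidyr is installed.

```r
library(tidyr)

# Hypothetical records: student 1 attends school "A",
# students 2 and 3 attend school "B"
attendance <- tibble(
  school_id  = c("A", "B", "B", "A"),
  student_id = c(1, 2, 3, 1),
  date       = as.Date(c("2021-09-01", "2021-09-01", "2021-09-02", "2021-09-02"))
)

# Separate arguments: every school x student x date combination,
# including school-student pairs that never occur (2 * 3 * 2 = 12 rows)
all_combos <- attendance %>% expand(school_id, student_id, date)

# nesting() alone: only combinations present in the data (4 rows)
present <- attendance %>% expand(nesting(school_id, student_id, date))

# Mixed: each existing school-student pair crossed with every date
# (3 pairs * 2 dates = 6 rows)
per_student <- attendance %>% expand(nesting(school_id, student_id), date)
```

The mixed form avoids generating rows for students at schools they do not attend, which the fully crossed form would include.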

When used with factors, expand() uses the full set of levels, not just those that appear in the data. If you want to use only the values seen in the data, use forcats::fct_drop().
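As a small illustration of the factor behaviour, the sketch below uses a made-up tibble called sizes and assumes both tidyr and forcats are installed.

```r
library(tidyr)

# Illustrative data: levels "XS" and "L" are defined but never observed
sizes <- tibble(
  size = factor(c("S", "M", "S"), levels = c("XS", "S", "M", "L"))
)

# All defined levels are expanded, observed or not
all_levels <- sizes %>% expand(size)

# Dropping unused levels first restricts the result to observed values;
# naming the argument keeps the output column called `size`
seen_only <- sizes %>% expand(size = forcats::fct_drop(size))
```

Here all_levels has one row per defined level, while seen_only contains only the rows for S and M.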

When used with continuous variables, you may need to fill in values that do not appear in the data: to do so, use expressions like year = 2010:2020 or year = full_seq(year, 1).
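A short sketch of both expressions, using an invented sales table and assuming tidyr is installed. Naming the argument (year = ...) keeps the output column called year rather than naming it after the expression.

```r
library(tidyr)

# Illustrative data: 2011 is missing from the observed years
sales <- tibble(
  region = c("north", "south", "north"),
  year   = c(2010, 2012, 2013)
)

# Only observed years are crossed: 2 regions * 3 years = 6 rows
observed <- sales %>% expand(region, year)

# full_seq() fills the gaps between the minimum and maximum year:
# 2 regions * 4 years (2010 to 2013) = 8 rows
filled <- sales %>% expand(region, year = full_seq(year, 1))

# An explicit range ignores the observed values entirely:
# 2 regions * 11 years = 22 rows
fixed <- sales %>% expand(region, year = 2010:2020)
```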

.name_repair

Treatment of problematic column names:

  • "minimal": No name repair or checks, beyond basic existence.

  • "unique": Make sure names are unique and not empty.

  • "check_unique": (default value) No name repair, but check that names are unique.

  • "universal": Make the names unique and syntactic.

  • A function: Apply custom name repair (e.g., .name_repair = make.names for names in the style of base R).

  • A purrr-style anonymous function; see rlang::as_function().

This argument is passed on as repair to vctrs::vec_as_names(). See there for more details on these terms and the strategies used to enforce them.

See Also

complete() to expand list objects. expand_grid() to expand input vectors rather than a data frame.

Examples

library(tidyr)

fruits <- tibble(
  type   = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year   = c(2010, 2010, 2012, 2010, 2010, 2012),
  size   = factor(
    c("XS", "S",  "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

# All possible combinations ---------------------------------------
# Note that all defined, but not necessarily present, levels of the
# factor variable `size` are retained.
fruits %>% expand(type)
fruits %>% expand(type, size)
fruits %>% expand(type, size, year)

# Only combinations that already appear in the data ---------------
fruits %>% expand(nesting(type))
fruits %>% expand(nesting(type, size))
fruits %>% expand(nesting(type, size, year))

# Other uses -------------------------------------------------------
# Use with `full_seq()` to fill in values of continuous variables
fruits %>% expand(type, size, full_seq(year, 1))
fruits %>% expand(type, size, 2010:2012)

# Use `anti_join()` to determine which observations are missing
all <- fruits %>% expand(type, size, year)
all
all %>% dplyr::anti_join(fruits)

# Use with `right_join()` to fill in missing rows
fruits %>% dplyr::right_join(all)

Example output

# A tibble: 2 x 1
  type  
  <chr> 
1 apple 
2 orange
# A tibble: 8 x 2
  type   size 
  <chr>  <fct>
1 apple  XS   
2 apple  S    
3 apple  M    
4 apple  L    
5 orange XS   
6 orange S    
7 orange M    
8 orange L    
# A tibble: 16 x 3
   type   size   year
   <chr>  <fct> <dbl>
 1 apple  XS     2010
 2 apple  XS     2012
 3 apple  S      2010
 4 apple  S      2012
 5 apple  M      2010
 6 apple  M      2012
 7 apple  L      2010
 8 apple  L      2012
 9 orange XS     2010
10 orange XS     2012
11 orange S      2010
12 orange S      2012
13 orange M      2010
14 orange M      2012
15 orange L      2010
16 orange L      2012
# A tibble: 2 x 1
  type  
  <chr> 
1 apple 
2 orange
# A tibble: 4 x 2
  type   size 
  <chr>  <fct>
1 apple  XS   
2 apple  M    
3 orange S    
4 orange M    
# A tibble: 4 x 3
  type   size   year
  <chr>  <fct> <dbl>
1 apple  XS     2010
2 apple  M      2012
3 orange S      2010
4 orange M      2012
# A tibble: 24 x 3
   type  size  `full_seq(year, 1)`
   <chr> <fct>               <dbl>
 1 apple XS                   2010
 2 apple XS                   2011
 3 apple XS                   2012
 4 apple S                    2010
 5 apple S                    2011
 6 apple S                    2012
 7 apple M                    2010
 8 apple M                    2011
 9 apple M                    2012
10 apple L                    2010
# … with 14 more rows
# A tibble: 24 x 3
   type  size  `2010:2012`
   <chr> <fct>       <int>
 1 apple XS           2010
 2 apple XS           2011
 3 apple XS           2012
 4 apple S            2010
 5 apple S            2011
 6 apple S            2012
 7 apple M            2010
 8 apple M            2011
 9 apple M            2012
10 apple L            2010
# … with 14 more rows
# A tibble: 16 x 3
   type   size   year
   <chr>  <fct> <dbl>
 1 apple  XS     2010
 2 apple  XS     2012
 3 apple  S      2010
 4 apple  S      2012
 5 apple  M      2010
 6 apple  M      2012
 7 apple  L      2010
 8 apple  L      2012
 9 orange XS     2010
10 orange XS     2012
11 orange S      2010
12 orange S      2012
13 orange M      2010
14 orange M      2012
15 orange L      2010
16 orange L      2012
Joining, by = c("type", "size", "year")
# A tibble: 12 x 3
   type   size   year
   <chr>  <fct> <dbl>
 1 apple  XS     2012
 2 apple  S      2010
 3 apple  S      2012
 4 apple  M      2010
 5 apple  L      2010
 6 apple  L      2012
 7 orange XS     2010
 8 orange XS     2012
 9 orange S      2012
10 orange M      2010
11 orange L      2010
12 orange L      2012
Joining, by = c("type", "year", "size")
# A tibble: 18 x 4
   type    year size  weights
   <chr>  <dbl> <fct>   <dbl>
 1 apple   2010 XS       2.92
 2 orange  2010 S        3.30
 3 apple   2012 M        4.17
 4 orange  2010 S        3.39
 5 orange  2010 S        3.38
 6 orange  2012 M        5.60
 7 apple   2012 XS      NA   
 8 apple   2010 S       NA   
 9 apple   2012 S       NA   
10 apple   2010 M       NA   
11 apple   2010 L       NA   
12 apple   2012 L       NA   
13 orange  2010 XS      NA   
14 orange  2012 XS      NA   
15 orange  2012 S       NA   
16 orange  2010 M       NA   
17 orange  2010 L       NA   
18 orange  2012 L       NA   

tidyr documentation built on Sept. 27, 2021, 5:07 p.m.