complete: Complete a data frame with missing combinations of data
In tidyverse/tidyr: Tidy Messy Data

complete

R Documentation

Complete a data frame with missing combinations of data

Description

Turns implicit missing values into explicit missing values. This is a wrapper around expand(), dplyr::full_join() and replace_na() that's useful for completing missing combinations of data.

Usage

complete(data, ..., fill = list(), explicit = TRUE)

Arguments

`data`	A data frame.
`...`	<`data-masking`> Specification of columns to expand or complete. Columns can be atomic vectors or lists. To find all unique combinations of `x`, `y` and `z`, including those not present in the data, supply each variable as a separate argument: `expand(df, x, y, z)` or `complete(df, x, y, z)`. To find only the combinations that occur in the data, use `nesting`: `expand(df, nesting(x, y, z))`. You can combine the two forms. For example, `expand(df, nesting(school_id, student_id), date)` would produce a row for each present school-student combination for all possible dates. When used with factors, `expand()` and `complete()` use the full set of levels, not just those that appear in the data. If you want to use only the values seen in the data, use `forcats::fct_drop()`. When used with continuous variables, you may need to fill in values that do not appear in the data: to do so use expressions like `year = 2010:2020` or `year = full_seq(year,1)`.
`fill`	A named list that for each variable supplies a single value to use instead of `NA` for missing combinations.
`explicit`	Should both implicit (newly created) and explicit (pre-existing) missing values be filled by `fill`? By default, this is `TRUE`, but if set to `FALSE` this will limit the fill to only implicit missing values.

Grouped data frames

With grouped data frames created by dplyr::group_by(), complete() operates within each group. Because of this, you cannot complete a grouping column.

Examples

df <- tibble(
  group = c(1:2, 1, 2),
  item_id = c(1:2, 2, 3),
  item_name = c("a", "a", "b", "b"),
  value1 = c(1, NA, 3, 4),
  value2 = 4:7
)
df

# Combinations --------------------------------------------------------------
# Generate all possible combinations of `group`, `item_id`, and `item_name`
# (whether or not they appear in the data)
df %>% complete(group, item_id, item_name)

# Cross all possible `group` values with the unique pairs of
# `(item_id, item_name)` that already exist in the data
df %>% complete(group, nesting(item_id, item_name))

# Within each `group`, generate all possible combinations of
# `item_id` and `item_name` that occur in that group
df %>%
  dplyr::group_by(group) %>%
  complete(item_id, item_name)

# Supplying values for new rows ---------------------------------------------
# Use `fill` to replace NAs with some value. By default, affects both new
# (implicit) and pre-existing (explicit) missing values.
df %>%
  complete(
    group,
    nesting(item_id, item_name),
    fill = list(value1 = 0, value2 = 99)
  )

# Limit the fill to only the newly created (i.e. previously implicit)
# missing values with `explicit = FALSE`
df %>%
  complete(
    group,
    nesting(item_id, item_name),
    fill = list(value1 = 0, value2 = 99),
    explicit = FALSE
  )

tidyverse/tidyr documentation built on April 13, 2025, 11:51 a.m.