coalesce_dupes: Coalesce Information in Duplicate Rows

View source: R/utils.R

Description

coalesce_dupes() removes duplicate rows based on a set of variables, combines information from the duplicates into the retained row, and can optionally sort the data on those variables.

Usage

coalesce_dupes(data, ..., pre_sort = FALSE, post_sort = FALSE)

Arguments

data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

...

Variables to use for sorting and determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If no variables are given, the function simply calls distinct() with .keep_all = TRUE.

pre_sort

A logical indicating whether to sort using the input variables prior to coalescing. If no input variables are given, no sorting is performed, and this parameter is ignored.

post_sort

A logical indicating whether to sort using the input variables after coalescing. If no input variables are given, no sorting is performed, and this parameter is ignored.
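
The role of the input variables can be illustrated with plain dplyr calls. The following is a minimal sketch, not the coviData implementation: the records data and its column names are hypothetical, and the sorting step only approximates what pre_sort and post_sort are described as doing.

```r
library(dplyr)

# Hypothetical toy data: two rows share id = 1
records <- tibble(
  id    = c(2, 1, 1),
  name  = c("Bo", NA, "Al"),
  phone = c("555-0102", "555-0101", NA)
)

# With no variables, the documented fallback is distinct(.keep_all = TRUE),
# which only drops rows that are identical in every column
all_cols <- distinct(records, .keep_all = TRUE)

# With a variable supplied, distinct() keeps the first row for each id
first_rows <- distinct(records, id, .keep_all = TRUE)

# Sorting on the same variable (an approximation of the sort options)
sorted_rows <- arrange(first_rows, id)
```

In this sketch the row kept for id = 1 still has a missing name, because distinct() alone does not pull values from the dropped duplicate.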

Details

coalesce_dupes() can be thought of as an enhanced version of distinct(). Like distinct(), coalesce_dupes() removes duplicates from a dataset based on a provided set of variables. Unlike distinct(), it can also sort the data on those variables using arrange(); sorting is controlled by pre_sort and post_sort, both of which are FALSE in the usage shown above. It also tries to replace missing values in the retained row with the first non-missing values from its duplicate rows, using a modification of coalesce().
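
The coalescing behavior described above can be approximated with dplyr alone. This is a sketch under stated assumptions, not the package's actual code: the records data and the first_non_missing() helper are hypothetical, and a grouped summarise() stands in for the "modification of coalesce" mentioned in the text.

```r
library(dplyr)

# Hypothetical toy data: two rows share id = 1, each missing a different field
records <- tibble(
  id    = c(2, 1, 1),
  name  = c("Bo", NA, "Al"),
  phone = c("555-0102", "555-0101", NA)
)

# Hypothetical helper: first non-missing value in a vector (NA if all missing)
first_non_missing <- function(x) {
  x[!is.na(x)][1]
}

# One row per id, with each column filled from the first non-missing value
# found among that id's duplicate rows
coalesced <- records %>%
  arrange(id) %>%
  group_by(id) %>%
  summarise(across(everything(), first_non_missing), .groups = "drop")
```

Here the two id = 1 rows collapse into a single row that carries both the name from one duplicate and the phone number from the other, which is the kind of result the function is described as producing.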


jesse-smith/coviData documentation built on Jan. 14, 2023, 11:08 a.m.