duplicates: Inspect duplicate rows


duplicates    R Documentation

Inspect duplicate rows

Description

Inspect duplicate rows in a data frame by sets of columns.

Usage

duplicates(x, by_list, check_all = FALSE)

Arguments

x

A data frame or a data frame extension, such as a tibble::tibble() or a data.table::data.table().

by_list

A list in which each element is a character vector of column names from x. Each element specifies one combination of columns of x to check for duplicates.

check_all

If TRUE, also include the number of duplicates across all columns of x.

Details

To calculate the number of duplicates, we use data.table's optimized data.table::uniqueN() if x is a data.table; otherwise, we use dplyr::distinct().
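The backend choice above can be sketched as follows. This is a hypothetical helper, count_unique_sketch(), written for illustration only and not taken from the asmisc source. It counts the distinct combinations of the columns named in cols with data.table::uniqueN() for data.tables and dplyr::distinct() otherwise. The base-R fallback (unique()) is an added assumption so the sketch still runs when neither package is installed.

```r
# Hypothetical helper mirroring the backend choice described above;
# NOT the asmisc source. The base-R fallback is an added assumption.
count_unique_sketch <- function(x, cols) {
  if (inherits(x, "data.table") &&
      requireNamespace("data.table", quietly = TRUE)) {
    # data.table path: count unique combinations of the columns in `cols`
    data.table::uniqueN(x, by = cols)
  } else if (requireNamespace("dplyr", quietly = TRUE)) {
    # dplyr path: keep distinct rows of the selected columns, then count them
    nrow(dplyr::distinct(x[cols]))
  } else {
    # Base-R fallback (not mentioned in the documentation)
    nrow(unique(as.data.frame(x)[cols]))
  }
}

df <- data.frame(a = c(1, 1, 2), b = c("x", "y", "y"))
count_unique_sketch(df, "a")            # 2
count_unique_sketch(df, c("a", "b"))    # 3
```

All three branches should return the same count; the packages only change how efficiently it is computed.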

The results store the column combinations in a list column. These lists are typically short and generally print well. Because tibbles hide the elements of list columns when printing, we return a plain data.frame instead of a tibble.

Value

A data.frame with the columns by (a list column), N_unique and N_duplicated (integers). Each row corresponds to one element of by_list. If check_all is TRUE, an extra row is appended at the bottom, in which by is NULL.
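To make the shape of the return value concrete, here is a hypothetical base-R sketch, count_duplicates_sketch(), that builds a data.frame of the same form. It is not the asmisc implementation: it counts distinct rows with unique() rather than uniqueN() or distinct(). It also assumes that N_duplicated equals nrow(x) minus N_unique (the number of rows that repeat an earlier row), which the documentation does not state explicitly.

```r
# Hypothetical sketch of the documented return value; NOT the asmisc source.
# Assumption: N_duplicated = nrow(x) - N_unique.
count_duplicates_sketch <- function(x, by_list, check_all = FALSE) {
  x <- as.data.frame(x)  # plain data.frame so x[cols] selects columns
  # A NULL element stands for "all columns of x"
  if (check_all) by_list <- c(by_list, list(NULL))
  n_unique <- vapply(by_list, function(cols) {
    sub <- if (is.null(cols)) x else x[cols]
    as.integer(nrow(unique(sub)))
  }, integer(1))
  res <- data.frame(N_unique = n_unique,
                    N_duplicated = as.integer(nrow(x)) - n_unique)
  res$by <- by_list  # list column; the NULL entry marks the all-columns row
  res[c("by", "N_unique", "N_duplicated")]
}

df <- data.frame(a = c(1, 1, 2, 2), b = c("x", "x", "y", "z"))
res <- count_duplicates_sketch(df, list("a", c("a", "b")), check_all = TRUE)
res$N_unique      # 2 3 3
res$N_duplicated  # 2 1 1
```

Here the third row (by is NULL) counts duplicates over all columns, which for this two-column df matches the c("a", "b") row.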


svraka/asmisc documentation built on June 12, 2025, 12:04 p.m.