duplicates: Inspect duplicate rows
In svraka/asmisc: Miscellaneous Utility Functions

duplicates

R Documentation

Description

Inspect duplicate rows in a data frame by sets of columns.

Usage

duplicates(x, by_list, check_all = FALSE)

Arguments

`x`	A data frame or a data frame extension, such as a `tibble::tibble()` or a `data.table::data.table()`.
`by_list`	A list in which each element is a character vector of column names from `x`. Each element defines one combination of columns to check for duplicates.
`check_all`	If `TRUE`, also report the number of duplicates across all columns in `x`.
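
As an illustration of the arguments above, the following sketch builds a small data frame and a `by_list` with two column sets, then calls the function using the signature shown under Usage. The data frame `df` and its columns are made up for this example, and the call is guarded so the snippet only runs it if the asmisc package happens to be installed.

```r
# Toy data, invented for illustration only
df <- data.frame(
  id = c(1L, 1L, 2L, 3L),
  year = c(2020L, 2021L, 2020L, 2021L),
  value = c(10, 11, 12, 13)
)

# Each element is one set of columns to check for duplicates
by_list <- list("id", c("id", "year"))

# Only call the function when the package is available
if (requireNamespace("asmisc", quietly = TRUE)) {
  print(asmisc::duplicates(df, by_list = by_list, check_all = TRUE))
}
```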

Details

If `x` is a data.table, we use the optimized `data.table::uniqueN()`; otherwise, we use `dplyr::distinct()` to calculate the number of duplicates.
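
The counting idea can be sketched in base R. This is not the package's implementation: `count_duplicates()` is a hypothetical helper, it uses `unique()` in place of `data.table::uniqueN()` or `dplyr::distinct()`, and it assumes that the number of duplicates is the total row count minus the number of distinct column combinations.

```r
# Hypothetical helper, not part of asmisc.
# Assumption: duplicates = total rows - distinct combinations of `cols`.
count_duplicates <- function(x, cols) {
  n_unique <- nrow(unique(x[, cols, drop = FALSE]))
  c(N_unique = n_unique, N_duplicated = nrow(x) - n_unique)
}

# Toy data, invented for illustration only
df <- data.frame(
  id = c(1L, 1L, 2L, 3L),
  year = c(2020L, 2021L, 2020L, 2021L),
  value = c(10, 11, 12, 13)
)

count_duplicates(df, "id")             # id alone repeats once
count_duplicates(df, c("id", "year"))  # id and year together are unique
```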

We use a list column in the result to hold the column sets. These lists are typically short, so they generally print nicely. Because tibbles hide the elements of list columns when printed, we return a data.frame.

Value

A data.frame with columns `by` (list), `N_unique` and `N_duplicated` (integers), where each row corresponds to one element of `by_list`. If `check_all = TRUE`, we add one more row at the bottom, in which the value of `by` is `NULL`.
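
To make the shape of the return value concrete, the sketch below builds a data.frame by hand with the same three columns. It is a mock-up, not output produced by `duplicates()`: the counts are invented, and `I()` is used here only as one base R way to store a list column in a data.frame.

```r
# Hand-built mock of the documented layout; the counts are made up.
# The last row mimics check_all = TRUE, where `by` is NULL.
res <- data.frame(
  by = I(list("id", c("id", "year"), NULL)),
  N_unique = c(3L, 4L, 4L),
  N_duplicated = c(1L, 0L, 0L)
)

print(res)

# The list column keeps each column set as a character vector
res$by[[2]]
```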