duplicate_count_colpair: Count duplicate values by column

View source: R/duplicate-count-colpair.R

duplicate_count_colpair    R Documentation

Count duplicate values by column

Description

duplicate_count_colpair() takes a data frame and checks each pair of its columns for duplicate values. The results are returned as a tibble, ordered by the number of duplicates.

Usage

duplicate_count_colpair(data, ignore = NULL, show_rates = TRUE)

Arguments

data

Data frame.

ignore

Optionally, a vector of values that should not be checked for duplicates.

show_rates

Logical. If TRUE (the default), adds the columns rate_x and rate_y; see the Value section. Set show_rates to FALSE for better performance.

Value

A tibble (data frame) with these columns:

  • x and y: Each row contains a unique pair of data's columns, with the column names stored in the x and y output columns.

  • count: Number of "duplicates", i.e., values that are present in both x and y.

  • total_x, total_y, rate_x, and rate_y (added by default): total_x is the number of non-missing values in the column named under x, and rate_x is the proportion of x values that are duplicated in y, i.e., count / total_x. total_y and rate_y are defined likewise for y. The two rate_* columns will be equal unless NA values are present.
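The arithmetic behind these columns can be sketched in base R for a single pair of columns. This is only an illustration, not the package's implementation: `rate_sketch()` is a hypothetical helper, and it assumes that a "duplicate" means a non-missing value that is equal in the same row of both columns, which the text above does not spell out.

```r
# Hypothetical helper for illustration; not part of scrutiny.
rate_sketch <- function(x, y) {
  # Assumption: a duplicate is a non-missing value that is equal
  # in the same row of `x` and `y`.
  count <- sum(x == y, na.rm = TRUE)

  # Totals are the numbers of non-missing values in each column.
  total_x <- sum(!is.na(x))
  total_y <- sum(!is.na(y))

  c(
    count   = count,
    total_x = total_x,
    total_y = total_y,
    rate_x  = count / total_x,
    rate_y  = count / total_y
  )
}

x <- c(1, 2, 3, NA)
y <- c(1, 5, 3, 4)
rate_sketch(x, y)
```

Because x has one missing value here, total_x (3) and total_y (4) differ, so rate_x (2/3) and rate_y (1/2) differ as well. Without the NA, both totals and both rates would be the same.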

Summaries with audit()

There is an S3 method for audit(), so you can call audit() following duplicate_count_colpair(). It returns a tibble with summary statistics.

See Also

  • duplicate_count() for a frequency table.

  • duplicate_tally() to show instances of a value next to each instance.

  • janitor::get_dupes() to search for duplicate rows.

  • corrr::colpair_map(), a versatile tool for pairwise column analysis which the present function wraps.

Examples

# Basic usage:
mtcars %>%
  duplicate_count_colpair()

# Summaries with `audit()`:
mtcars %>%
  duplicate_count_colpair() %>%
  audit()
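
The ignore and show_rates arguments described above might be used as follows. This is a sketch that assumes the scrutiny package is installed; the choice of 0 and 1 as ignored values is arbitrary and only serves the illustration.

```r
library(scrutiny)

# Exclude the values 0 and 1 from the duplicate check
# (an arbitrary choice for this illustration), and skip
# the rate columns, which the documentation says is faster:
out <- duplicate_count_colpair(
  mtcars,
  ignore = c(0, 1),
  show_rates = FALSE
)

out
```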

scrutiny documentation built on Sept. 22, 2024, 9:06 a.m.