View source: R/duplicate-count-colpair.R
duplicate_count_colpair | R Documentation |
duplicate_count_colpair() takes a data frame and checks each combination of
columns for duplicates. Results are presented in a tibble, ordered by the
number of duplicates.
duplicate_count_colpair(data, ignore = NULL, show_rates = TRUE)
data | Data frame. |
ignore | Optionally, a vector of values that should not be checked for duplicates. |
show_rates | Logical. If TRUE (the default), the columns total_x, total_y, rate_x, and rate_y are added to the output. |
A tibble (data frame) with these columns:
x and y: Each row contains a unique combination of data's columns, stored in the x and y output columns.
count: Number of "duplicates", i.e., values that are present in both x and y.
total_x, total_y, rate_x, and rate_y (added by default): total_x is the number of non-missing values in the column named under x. Also, rate_x is the proportion of x values that are duplicated in y, i.e., count / total_x. Likewise with total_y and rate_y. The two rate_* columns will be equal unless NA values are present.
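The relationship between count, the totals, and the rates can be worked through by hand in base R. This is only a sketch of the arithmetic described above, not the package's implementation: the vectors x and y are made-up stand-ins for two columns, and count is set manually to reflect the one value (1) they share.

```r
# Made-up stand-ins for two data frame columns (not from any real dataset)
x <- c(1, 2, NA, 4)
y <- c(1, 5, 6, 7)

# Assumption: set by hand; 1 is the only value present in both vectors
count <- 1

# Totals are the numbers of non-missing values in each column
total_x <- sum(!is.na(x))
total_y <- sum(!is.na(y))

# Rates as described above: count divided by the respective total
rate_x <- count / total_x
rate_y <- count / total_y

print(c(rate_x = rate_x, rate_y = rate_y))
```

Because x contains one NA, total_x is 3 while total_y is 4, so rate_x (1/3) and rate_y (1/4) differ. Without the missing value, both totals and hence both rates would be the same.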
Summaries with audit()
There is an S3 method for audit(), so you can call audit() following
duplicate_count_colpair(). It returns a tibble with summary statistics.
duplicate_count() for a frequency table.
duplicate_tally() to show the number of instances of a value next to each instance.
janitor::get_dupes() to search for duplicate rows.
corrr::colpair_map(), a versatile tool for pairwise column analysis which the present function wraps.
library(scrutiny)

# Basic usage:
mtcars %>%
  duplicate_count_colpair()

# Summaries with `audit()`:
mtcars %>%
  duplicate_count_colpair() %>%
  audit()
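The two optional arguments can be tried out in a similar way. This is a hedged sketch that assumes the scrutiny package is installed and uses only the arguments documented above; the values passed to ignore are arbitrary picks for illustration, and the object names counts_only and counts_ignored are made up here.

```r
library(scrutiny)

# Leave out the total_* and rate_* columns
counts_only <- duplicate_count_colpair(mtcars, show_rates = FALSE)
names(counts_only)

# Do not check the values 0 and 1 for duplicates
# (arbitrary illustrative choice, not a recommendation)
counts_ignored <- duplicate_count_colpair(mtcars, ignore = c(0, 1))
counts_ignored
```

Comparing counts_ignored with the output of the basic call shows how much of the duplication between columns is driven by the ignored values.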