View source: R/duplicate-count.R
duplicate_count | R Documentation |
duplicate_count() returns a frequency table. When searching a data frame, it includes values from all columns for each frequency count.
This function is a blunt tool designed for initial data checking. It is less informative if many values have only a few characters each, since such short values are likely to repeat by coincidence.
For summary statistics, call audit() on the results.
duplicate_count(x, ignore = NULL, locations_type = c("character", "list"))
x |
Vector or data frame. |
ignore |
Optionally, a vector of values that should not be counted. |
locations_type |
String. One of "character" or "list". |
If x is a data frame or another named vector, a tibble with four columns. If x isn't named, only the first two columns appear:
value: All the values from x.
frequency: Absolute frequency of each value in x, in descending order.
locations: Names of all columns from x in which value appears.
locations_n: Number of columns named in locations.
The tibble has the scr_dup_count class, which is recognized by the audit() generic.
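To make the four columns concrete, here is a rough base R sketch of how such a table could be assembled for a data frame. It is not the package's implementation: count_values_sketch() is a hypothetical helper written only for illustration, it returns a plain data frame rather than a tibble with the scr_dup_count class, and details such as type handling, the ignore argument, and the ordering of ties may differ from duplicate_count().

```r
# Illustrative sketch only; NOT how duplicate_count() is implemented.
# `count_values_sketch` is a hypothetical name, not part of any package.
count_values_sketch <- function(df) {
  # Flatten all columns to character so values can be compared across columns
  values <- unlist(lapply(df, as.character), use.names = FALSE)
  # Track the column that each flattened value came from
  columns <- rep(names(df), each = nrow(df))

  # Absolute frequency of each distinct value, in descending order
  freq <- sort(table(values), decreasing = TRUE)

  # Names of the columns in which each value appears
  locations <- tapply(columns, values, unique, simplify = FALSE)
  locations <- locations[names(freq)]

  data.frame(
    value = names(freq),
    frequency = as.integer(freq),
    locations = vapply(locations, paste, character(1), collapse = ", "),
    locations_n = lengths(locations),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}
```

Calling head(count_values_sketch(iris)) shows the most frequent values together with the columns in which they occur, which mirrors the structure described above.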
audit()
There is an S3 method for the audit() generic, so you can call audit() following duplicate_count(). It returns a tibble with summary statistics for the two numeric columns, frequency and locations_n (or, if x isn't named, only for frequency).
duplicate_count_colpair() to check each combination of columns for duplicates.
duplicate_tally() to show instances of a value next to each instance.
janitor::get_dupes() to search for duplicate rows.
# Count duplicate values...
iris %>%
  duplicate_count()

# ...and compute summaries:
iris %>%
  duplicate_count() %>%
  audit()

# Any values can be ignored:
iris %>%
  duplicate_count(ignore = c("setosa", "versicolor", "virginica"))