View source: R/duplicate-count.R
| duplicate_count | R Documentation |
duplicate_count() returns a frequency table. When searching a
data frame, it includes values from all columns for each frequency count.
This function is a blunt tool designed for initial data checking. It is not very informative if many values have only a few characters each, since short values tend to repeat by coincidence.
For summary statistics, call audit() on the results.
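The pooling described above can be illustrated with a minimal base-R sketch. This is not the package's implementation; it only shows the idea of counting values across all columns at once. The data frame `df` and the object names `pooled`, `counts`, and `freq_table` are made up for illustration.

```r
# Base-R sketch of the idea only; not how duplicate_count() is implemented.
# `df` is a small hypothetical data frame.
df <- data.frame(
  a = c("1.25", "3.40", "1.25"),
  b = c("3.40", "7.80", "1.25"),
  stringsAsFactors = FALSE
)

# Pool the values of all columns into one character vector
pooled <- unlist(lapply(df, as.character), use.names = FALSE)

# Count each distinct value and sort by descending frequency
counts <- sort(table(pooled), decreasing = TRUE)

# Arrange the result as a two-column frequency table
freq_table <- data.frame(
  value = names(counts),
  frequency = as.integer(counts),
  stringsAsFactors = FALSE
)
freq_table
```

Here "1.25" is counted three times even though it appears in two different columns, which is the sense in which values from all columns enter each frequency count.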
duplicate_count(x, ignore = NULL, locations_type = c("character", "list"))
| x | Vector or data frame. |
| ignore | Optionally, a vector of values that should not be counted. |
| locations_type | String. One of "character" or "list". |
If x is a data frame or another named vector, a tibble with four
columns. If x isn't named, only the first two columns appear:
value: All the values from x.
frequency: Absolute frequency of each value in x, in descending order.
locations: Names of all columns from x in which value appears.
locations_n: Number of columns named in locations.
The tibble has the scr_dup_count class, which is recognized by the
audit() generic.
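To make the `locations` and `locations_n` columns more concrete, here is a hedged base-R sketch of how such columns could be derived for a data frame. It is not the package's code. The data frame `df`, the object names, and the ", " separator used to paste column names together are all assumptions made for illustration.

```r
# Base-R sketch only; names and the ", " separator are assumptions.
df <- data.frame(
  a = c("1.25", "3.40", "1.25"),
  b = c("3.40", "7.80", "1.25"),
  stringsAsFactors = FALSE
)

# Distinct values across all columns
values <- unique(unlist(lapply(df, as.character), use.names = FALSE))

# For each distinct value, collect the names of the columns that contain it
locations_list <- lapply(values, function(v) {
  names(df)[vapply(df, function(col) v %in% as.character(col), logical(1))]
})

locations_tbl <- data.frame(
  value = values,
  # Column names pasted into a single string per value
  locations = vapply(locations_list, paste, character(1), collapse = ", "),
  # Number of columns in which the value appears
  locations_n = lengths(locations_list),
  stringsAsFactors = FALSE
)
locations_tbl
```

The intermediate `locations_list` keeps one character vector of column names per value, while the pasted `locations` column is easier to read. This may mirror the distinction behind the two `locations_type` options, but that mapping is an assumption rather than something the text above states.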
There is an S3 method for the
audit() generic, so you can call audit() following
duplicate_count(). It returns a tibble with summary statistics for the
two numeric columns, frequency and locations_n (or, if x isn't named,
only for frequency).
duplicate_count_colpair() to check each combination of columns for
duplicates.
duplicate_tally() to show instances of a value next to each instance.
janitor::get_dupes() to search for duplicate rows.
# Assumes the package that provides duplicate_count(), audit(),
# and the %>% pipe is attached.
# Count duplicate values...
iris %>%
duplicate_count()
# ...and compute summaries:
iris %>%
duplicate_count() %>%
audit()
# Any values can be ignored:
iris %>%
duplicate_count(ignore = c("setosa", "versicolor", "virginica"))