duplicate_count: Count duplicate values

View source: R/duplicate-count.R

duplicate_count    R Documentation

Count duplicate values

Description

duplicate_count() returns a frequency table of the values in x. When x is a data frame, each frequency count covers values from all columns.

This function is a blunt tool designed for initial data checking. It is less informative if many values have only a few characters each.

For summary statistics, call audit() on the results.
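The counting idea described above can be sketched in base R. This is only an illustration, not the scrutiny implementation: count_values_sketch() is a hypothetical helper, and converting all values to character before counting is an assumption made here for simplicity.

```r
# Base-R sketch of a cross-column frequency table.
# NOT the scrutiny implementation; `count_values_sketch` is a hypothetical name.
count_values_sketch <- function(df) {
  # Flatten all columns into one character vector (assumption: compare as strings)
  values <- unlist(lapply(df, as.character), use.names = FALSE)
  # Record which column each flattened value came from
  columns <- rep(names(df), each = nrow(df))
  # Absolute frequency of each distinct value, most frequent first
  freq <- sort(table(values), decreasing = TRUE)
  # For each distinct value, the columns in which it appears
  locs <- tapply(columns, values, function(x) paste(unique(x), collapse = ", "))
  data.frame(
    value = names(freq),
    frequency = as.integer(freq),
    locations = as.vector(locs[names(freq)]),
    stringsAsFactors = FALSE
  )
}

toy <- data.frame(a = c(1, 2, 2), b = c(2, 3, 4))
count_values_sketch(toy)
```

In this toy data frame, the value 2 occurs three times across both columns, so it heads the table.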

Usage

duplicate_count(x, ignore = NULL, locations_type = c("character", "list"))

Arguments

x

Vector or data frame.

ignore

Optionally, a vector of values that should not be counted.

locations_type

String. One of "character" or "list". With "list", each locations value is a vector of column names, which is better for further programming. By default ("character"), the column names are pasted into a string, which is more readable.
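A short sketch of how the two locations_type settings might be compared, assuming the scrutiny package is installed. The filtering step at the end is only one possible use of the list format, and in_species is a name introduced here for illustration.

```r
library(scrutiny)

# Default: column names pasted into one readable string per value
dup_chr <- duplicate_count(iris)

# List format: one vector of column names per value
dup_lst <- duplicate_count(iris, locations_type = "list")

is.character(dup_chr$locations)
is.list(dup_lst$locations)

# Example of further programming with the list format:
# keep only the values that appear in the Species column
in_species <- vapply(
  dup_lst$locations,
  function(cols) "Species" %in% cols,
  logical(1)
)
dup_lst[in_species, ]
```

With the list format, membership tests like the one above work on column names directly, without splitting a pasted string.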

Value

If x is a data frame or another named vector, a tibble with four columns. If x isn't named, only the first two columns appear:

  • value: All the values from x.

  • frequency: Absolute frequency of each value in x, in descending order.

  • locations: Names of all columns from x in which value appears.

  • locations_n: Number of columns named in locations.

The tibble has the scr_dup_count class, which is recognized by the audit() generic.
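To see the difference between named and unnamed input, the column names and class of the output can be inspected. This is a minimal sketch that assumes scrutiny is installed; x_unnamed is an arbitrary example vector, not data from the package.

```r
library(scrutiny)

# Unnamed vector: per the description above, only `value` and `frequency`
x_unnamed <- c(1, 2, 2, 3)
res_vec <- duplicate_count(x_unnamed)
names(res_vec)

# Data frame: `locations` and `locations_n` are expected as well
res_df <- duplicate_count(iris)
names(res_df)

# Both results should carry the class that audit() dispatches on
inherits(res_vec, "scr_dup_count")
inherits(res_df, "scr_dup_count")
```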

Summaries with audit()

There is an S3 method for the audit() generic, so you can call audit() following duplicate_count(). It returns a tibble with summary statistics for the two numeric columns, frequency and locations_n (or, if x isn't named, only for frequency).

See Also

  • duplicate_count_colpair() to check each combination of columns for duplicates.

  • duplicate_tally() to show the number of instances of a value next to each instance.

  • janitor::get_dupes() to search for duplicate rows.

Examples

# Attach the package first:
library(scrutiny)

# Count duplicate values...
iris %>%
  duplicate_count()

# ...and compute summaries:
iris %>%
  duplicate_count() %>%
  audit()

# Any values can be ignored:
iris %>%
  duplicate_count(ignore = c("setosa", "versicolor", "virginica"))

scrutiny documentation built on Sept. 22, 2024, 9:06 a.m.