count_duplicates: Count the number of duplicate rows in a data frame

View source: R/count_duplicates.R

count_duplicates R Documentation

Count the number of duplicate rows in a data frame

Description

Given a data frame, this will return a data frame of the duplicate rows, with a column for the number of times each one appears in the data.

Very similar to, but less preferred than, the get_dupes function in Sam Firke's janitor package. I borrowed some code from that function to handle the cases where variables are specified and where they are not (variables are passed as arguments to ...).

Usage

count_duplicates(data, ...)

Arguments

data

A data frame or tibble

...

Unquoted variable names to search for duplicates.

Value

Returns a data.frame (actually a tbl_df) with the full records where the specified variables have duplicated values, as well as a variable dupe_count showing the number of rows sharing that combination of duplicated values.
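
As a rough illustration of the return value described above, here is a minimal dplyr sketch. It is not the lamisc source code: the helper name sketch_count_duplicates is hypothetical, and checking every column when no variables are supplied is an assumption modelled on the description of the ... argument.

```r
library(dplyr)

# Hypothetical helper for illustration only; not the lamisc implementation.
sketch_count_duplicates <- function(data, ...) {
  # Capture the unquoted variable names passed through ...
  vars <- rlang::enquos(...)

  # Assumption: with no variables given, compare rows on every column.
  if (length(vars) == 0) {
    vars <- rlang::syms(names(data))
  }

  data %>%
    as_tibble() %>%
    # Count how many rows share each combination of the chosen variables
    add_count(!!!vars, name = "dupe_count") %>%
    # Keep only the combinations that occur more than once
    filter(dupe_count > 1) %>%
    # Put rows with the same combination next to each other
    arrange(!!!vars)
}

DF <- data.frame(replicate(sequence(1:3), n = 4))
sketch_count_duplicates(DF)
sketch_count_duplicates(DF, X2, X3)
```

In this sketch the full records are kept and dupe_count is appended as the last column. The real function may order columns or rows differently.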

References

https://stackoverflow.com/questions/18201074/find-how-many-times-duplicated-rows-repeat-in-r-data-frame

https://cran.r-project.org/web/packages/janitor/janitor.pdf

https://github.com/sfirke/janitor/blob/master/R/get_dupes.R

Examples

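library(dplyr)

# Build a 6 x 4 data frame (columns X1 to X4) that contains repeated rows
(DF <- data.frame(replicate(sequence(1:3), n = 4)))
# No variables specified
count_duplicates(DF)
# Check for duplicates in X2 and X3 only
count_duplicates(DF, X2, X3)
# Pipeable also
DF %>%
  count_duplicates()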

emilelatour/lamisc documentation built on Jan. 18, 2024, 4:55 a.m.