tag_duplicates: Tag Duplicates

View source: R/tag_duplicates.R


Usage

tag_duplicates(..., .add_tags = TRUE)



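Arguments

...: Columns used to identify duplicates, given as bare column names or selection helpers such as everything().

.add_tags: Logical; if TRUE (the default), return the three indicator columns .n_, .N_, and .dup_.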


Details

This function identifies and tags duplicate observations based on the specified variables.

It mimics Stata's duplicates command in R: for each combination of the specified variables, it counts the matching observations and tags those that are duplicates. Counting and grouping are handled by the n_ and N_ functions.
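As a rough illustration of this counting, the base-R sketch below reproduces the three columns with ave() instead of mStats' n_ and N_. The helper tag_dups_sketch is hypothetical and not part of mStats. It also assumes that .dup_ is TRUE whenever a group contains more than one observation and that the running counter follows row order.

```r
# Hypothetical helper: a base-R sketch, NOT the mStats implementation.
# Assumptions: .dup_ is TRUE whenever a group has more than one row,
# and the running counter follows the original row order.
tag_dups_sketch <- function(df, vars) {
  # One grouping key per distinct combination of the chosen columns
  key <- interaction(df[vars], drop = TRUE)
  ones <- rep(1L, nrow(df))
  # Running counter within each group (like Stata's _n under by:)
  n_ <- ave(ones, key, FUN = seq_along)
  # Number of observations in each group (like Stata's _N under by:)
  N_ <- ave(ones, key, FUN = length)
  data.frame(.n_ = n_, .N_ = N_, .dup_ = N_ > 1L)
}

data <- data.frame(x = c(1, 1, 2, 2, 3, 4, 4, 5), y = letters[1:8])
tag_dups_sketch(data, "x")
tag_dups_sketch(data, c("x", "y"))
```

Grouping by x alone flags rows 1 to 4, 6, and 7, while grouping by x and y flags nothing because every value of y is distinct.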


Value

A tibble with three columns: .n_, .N_, and .dup_.

  • .n_ is a running counter within each group defined by the specified variables: 1 for the first observation in the group, 2 for the second, and so on.

  • .N_ is the total number of observations in each group.

  • .dup_ is a logical column that is TRUE if the observation is a duplicate and FALSE otherwise.
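Assuming that .n_ counts in row order and that .dup_ is TRUE exactly when .N_ > 1 (neither detail is stated above), the columns line up with base R's duplicated(). The sketch below uses the custom dataset from the examples; the names later_copy, any_copy, and first_only are illustrative only.

```r
# Same data as the custom dataset example
df <- data.frame(x = c(1, 1, 2, 2, 3, 4, 4, 5), y = letters[1:8])
key <- df["x"]  # columns that define a duplicate group

# 2nd, 3rd, ... copies in row order (assumed to be the rows where .n_ > 1)
later_copy <- duplicated(key)

# Every row whose group occurs more than once (assumed meaning of .dup_)
any_copy <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Keep only the first occurrence of each group (rows where .n_ == 1)
first_only <- df[!later_copy, ]
print(first_only$x)
# [1] 1 2 3 4 5
```

Keeping rows with .n_ == 1 thus resembles Stata's duplicates drop, which retains the first occurrence of each group.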

See Also

Other Data Management: append(), codebook(), count_functions, cut()



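Examples

# Load the packages used below (mutate() and %>% come from dplyr)
library(mStats)
library(dplyr)

# Example with a custom dataset
data <- data.frame(
  x = c(1, 1, 2, 2, 3, 4, 4, 5),
  y = letters[1:8]
)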

# Identify and tag duplicates based on the "x" variable
data %>% mutate(tag_duplicates(x))

# Identify and tag duplicates based on multiple variables
data %>% mutate(tag_duplicates(x, y))

# Identify and tag duplicates based on all variables
data %>% mutate(tag_duplicates(everything()))

## Not run: 
## Stata example
dupxmpl <- haven::read_dta("https://www.stata-press.com/data/r18/dupxmpl.dta")
dupxmpl |> mutate(tag_duplicates(everything()))

## End(Not run)

myominnoo/mStats documentation built on Nov. 29, 2023, 2:36 a.m.