View source: R/tag_duplicates.R
tag_duplicates | R Documentation
tag_duplicates(..., .add_tags = TRUE)
... | Columns to use for identifying duplicates.
.add_tags | Logical; if TRUE (the default), return the three indicator columns .n_, .N_, and .dup_ described below.
This function identifies and tags duplicate observations based on the specified variables, mimicking Stata's duplicates command in R.
It calculates the number of duplicates and reports them for the specified variables, using the n_ and N_ functions to count observations within each group.
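For readers coming from Stata: duplicates report tabulates, for each number of copies, how many observations have that many copies and how many of them are surplus (the copies beyond the first). The base R sketch below reproduces that table so the counts can be checked by hand. It is a hypothetical illustration, not the package's implementation: the helper name duplicates_report() and its copies/observations/surplus columns are assumptions modelled on Stata's output, and the report printed by tag_duplicates() may look different.

```r
# Hypothetical helper (not part of the package): a Stata-style
# "duplicates report" table built with base R only.
duplicates_report <- function(df, vars = names(df)) {
  # One factor level per distinct combination of the chosen columns.
  key <- interaction(df[vars], drop = TRUE)
  # Group size for every row (what the .N_ column holds).
  copies_per_row <- ave(seq_len(nrow(df)), key, FUN = length)
  # Count rows by group size: 1 copy, 2 copies, ...
  counts <- table(copies_per_row)
  copies <- as.integer(names(counts))
  observations <- as.integer(counts)
  data.frame(
    copies = copies,
    observations = observations,
    # Surplus = rows beyond the first copy of each group
    # (observations %/% copies is the number of groups).
    surplus = observations - observations %/% copies
  )
}

df <- data.frame(x = c(1, 1, 2, 2, 3, 4, 4, 5), y = letters[1:8])
report <- duplicates_report(df, "x")
report
# copies 1: 2 observations, surplus 0; copies 2: 6 observations, surplus 3
```

The surplus column is the number of rows that would be removed if only the first copy of each group were kept.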
A tibble with three columns: .n_, .N_, and .dup_.
.n_ | The running counter within each group of variables, i.e. the position of the current observation within its group (1, 2, ...).
.N_ | The total number of observations within each group of variables.
.dup_ | A logical column indicating whether the observation is a duplicate (TRUE) or not (FALSE).
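To make the three columns concrete, here is a minimal base R sketch that derives them with ave(). It is a hypothetical reimplementation for illustration, not the package's code: the helper name tag_columns() is invented, it returns a plain data frame rather than a tibble, and it assumes .dup_ is TRUE for every member of a group with more than one observation (.N_ > 1), the way Stata's duplicates tag marks all copies. The package may instead flag only the later copies (.n_ > 1).

```r
# Hypothetical sketch (not the package's code) of the .n_, .N_ and .dup_ columns.
tag_columns <- function(df, vars = names(df)) {
  # One factor level per distinct combination of the chosen columns.
  key <- interaction(df[vars], drop = TRUE)
  idx <- seq_len(nrow(df))
  n_ <- ave(idx, key, FUN = seq_along)  # running counter, in row order
  N_ <- ave(idx, key, FUN = length)     # total rows in the group
  # Assumption: every member of a repeated group counts as a duplicate.
  data.frame(.n_ = n_, .N_ = N_, .dup_ = N_ > 1)
}

df <- data.frame(x = c(1, 1, 2, 2, 3, 4, 4, 5), y = letters[1:8])
tags <- tag_columns(df, "x")
cbind(df, tags)
# .n_: 1 2 1 2 1 1 2 1
# .N_: 2 2 2 2 1 2 2 1
```

Because y differs in every row, grouping on all columns (tag_columns(df)) gives .N_ = 1 everywhere, which is why adding a distinguishing variable removes the duplicates.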
Other Data Management: append(), codebook(), count_functions, cut()
library(dplyr)
# Example with a custom dataset
data <- data.frame(
x = c(1, 1, 2, 2, 3, 4, 4, 5),
y = letters[1:8]
)
# Identify and tag duplicates based on the "x" variable
data %>% mutate(tag_duplicates(x))
# Identify and tag duplicates based on multiple variables
data %>% mutate(tag_duplicates(x, y))
# Identify and tag duplicates based on all variables
data %>% mutate(tag_duplicates(everything()))
## Not run:
## Stata example
dupxmpl <- haven::read_dta("https://www.stata-press.com/data/r18/dupxmpl.dta")
dupxmpl |> mutate(tag_duplicates(everything()))
## End(Not run)
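The running counter also connects the tags to base R. Assuming .n_ counts rows in their current order, the rows with .n_ > 1 are exactly those that duplicated() flags, and the rows with .N_ > 1 are those flagged when scanning from either end. The hypothetical cross-check below uses only base R on the example data; keeping the rows where duplicated() is FALSE (equivalently .n_ == 1) retains the first copy of each combination, which is what Stata's duplicates drop does.

```r
# Hypothetical base R cross-check (no package functions used).
df <- data.frame(x = c(1, 1, 2, 2, 3, 4, 4, 5), y = letters[1:8])

# Rows that repeat an earlier value of x: where .n_ > 1.
later_copy <- duplicated(df["x"])
# Rows belonging to any repeated value of x: where .N_ > 1.
any_copy <- later_copy | duplicated(df["x"], fromLast = TRUE)
# Keep only the first copy of each value, like Stata's duplicates drop.
deduped <- df[!later_copy, ]
deduped
```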