tag_duplicates: Tag Duplicates
In myominnoo/mStats: Medical Statistics & Epidemiological Analysis

View source: R/tag_duplicates.R

tag_duplicates

R Documentation

Tag Duplicates

Description

Lifecycle: stable.

Usage

tag_duplicates(..., .add_tags = TRUE)

Arguments

`...`	Columns to use for identifying duplicates.
`.add_tags`	Logical. If `TRUE` (the default), the three indicator columns `.n_`, `.N_`, and `.dup_` are returned.

Details

This function identifies and tags duplicate observations based on the specified variables.

It mimics Stata's duplicates command: it counts the duplicates and reports them for the specified variables. Internally, it uses the n_ and N_ functions to number the observations within each group and to count the observations per group.
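For readers unfamiliar with Stata, the kind of summary described above can be approximated in base R. This is only a sketch of the idea, not the package's implementation: the column names `copies`, `observations`, and `surplus` are borrowed from Stata's report layout and are assumptions here, as is the toy data frame.

```r
# Toy data: x has three values that occur twice and two that occur once
data <- data.frame(
  x = c(1, 1, 2, 2, 3, 4, 4, 5),
  y = letters[1:8]
)

# Size of the group each row belongs to (hypothetical name: copies)
copies <- ave(seq_along(data$x), data$x, FUN = length)

# Number of observations for each group size
observations <- table(copies)

report <- data.frame(
  copies = as.integer(names(observations)),
  observations = as.integer(observations)
)

# Surplus: observations beyond the first one in each group
report$surplus <- report$observations - report$observations / report$copies

report
```

Each row of `report` describes one group size: how many observations belong to groups of that size, and how many of them would be dropped if only one observation per group were kept.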

Value

A tibble with three columns: .n_, .N_, and .dup_.

.n_ is the running counter within each group defined by the specified variables, i.e. the position of the current observation within its group.
.N_ is the total number of observations within each group defined by the specified variables.
.dup_ is a logical column indicating whether the observation is a duplicate (TRUE) or not (FALSE).
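The three columns can be illustrated with plain dplyr verbs. The sketch below is not the package's source code. In particular, the rule used for `.dup_` (every observation after the first in its group is flagged) is an assumption made for illustration; the documentation above does not state whether the first occurrence is also flagged.

```r
library(dplyr)

data <- data.frame(
  x = c(1, 1, 2, 2, 3, 4, 4, 5),
  y = letters[1:8]
)

tagged <- data %>%
  group_by(x) %>%
  mutate(
    .n_ = row_number(),  # running counter within each group of x
    .N_ = n(),           # total number of observations in the group
    .dup_ = .n_ > 1      # ASSUMPTION: only repeats after the first are flagged
  ) %>%
  ungroup()

tagged
```

Under this assumption, filtering on `!.dup_` would keep exactly one observation per distinct value of `x`.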

Examples


library(dplyr)

# Example with a custom dataset
data <- data.frame(
  x = c(1, 1, 2, 2, 3, 4, 4, 5),
  y = letters[1:8]
)

# Identify and tag duplicates based on the "x" variable
data %>% mutate(tag_duplicates(x))

# Identify and tag duplicates based on multiple variables
data %>% mutate(tag_duplicates(x, y))

# Identify and tag duplicates based on all variables
data %>% mutate(tag_duplicates(everything()))

## Not run: 
## Stata example (requires the haven package and internet access)
dupxmpl <- haven::read_dta("https://www.stata-press.com/data/r18/dupxmpl.dta")
dupxmpl |> mutate(tag_duplicates(everything()))

## End(Not run)

myominnoo/mStats documentation built on Nov. 29, 2023, 2:36 a.m.