count_duplicates: Count duplicates for a given (single or combination) of...

View source: R/dailyLogicChecks.R

Description

Counts the total number of duplicates for a given column or combination of columns. "Combination" refers to the case in which more than one column is needed to uniquely identify each row of the dataset. In dplyr terminology, more than one column is used to group the data (using the group_by family of functions).
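
To make the single-versus-combination distinction concrete, here is a minimal base R sketch on made-up data. The data frame `toy` and its columns `ID` and `Name` are hypothetical, and the sketch does not call count_duplicates; it only illustrates that one column can contain repeats while the pair of columns identifies every row uniquely.

```r
# Hypothetical toy data: ID repeats, but each (ID, Name) pair occurs once.
toy <- data.frame(
  ID   = c(1, 1, 2, 3),
  Name = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Rows whose ID value has already appeared earlier in the data.
n_dup_id <- sum(duplicated(toy$ID))

# Rows whose (ID, Name) combination has already appeared earlier.
n_dup_pair <- sum(duplicated(toy[, c("ID", "Name")]))

print(n_dup_id)
print(n_dup_pair)
```

Here `ID` alone is not a unique identifier, whereas `ID` and `Name` taken together are.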

Usage

count_duplicates(df, uniq_identifier_col)

Arguments

df

A dataset (tibble or data.frame) from which the columns named in uniq_identifier_col are chosen.

uniq_identifier_col

A character vector of column name(s) that uniquely identify the dataset. tidyselect helpers can also be used here to select columns. See the examples below.

Value

A tibble containing the variable(s) in uniq_identifier_col together with their corresponding duplicate count and percentage (of total duplicates in the data). If uniq_identifier_col holds more than one variable, the tibble shows the duplicate count (and percentage) for those variables taken jointly.
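
The shape of such a result can be approximated with plain dplyr. The sketch below is not the package's implementation: the data frame `toy`, the output column names `dup_count` and `dup_pct`, and the choice to define a key's duplicate count as the number of rows sharing that key are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical toy data; not part of econR.
toy <- data.frame(
  ID   = c(1, 1, 2, 3, 3, 3),
  Name = c("a", "a", "b", "c", "c", "d"),
  stringsAsFactors = FALSE
)

# Assumption: a key's "duplicate count" is the number of rows sharing that key,
# reported only for keys that occur more than once.
dup_summary <- toy %>%
  group_by(ID, Name) %>%
  summarise(dup_count = n(), .groups = "drop") %>%
  filter(dup_count > 1) %>%
  # Share of each key among all duplicated rows, in percent.
  mutate(dup_pct = 100 * dup_count / sum(dup_count))

print(dup_summary)
```

Grouping by both `ID` and `Name` gives one row per duplicated combination, which mirrors the joint-variable case described above.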

Examples

count_duplicates(df = dataObj, uniq_identifier_col = c("ID"))

count_duplicates(df = dataObj, uniq_identifier_col = c("ID", "Name"))

# tidyselect used to select columns.
count_duplicates(df = dataObj, uniq_identifier_col = tidyselect::contains("abc"))

AarshBatra/econR documentation built on Dec. 17, 2021, 6:45 a.m.