View source: R/dailyLogicChecks.R
Description
Counts the total number of duplicates for a given column or combination of
columns. "Combination" refers to the case in which more than one column is
needed to uniquely identify the rows of the dataset. In dplyr terminology,
more than one column is used as the grouping variable (via the group_by
family of functions).
Usage
count_duplicates(df, uniq_identifier_col)
Arguments
df | dataset(
uniq_identifier_col | a character vector of column name(s) that uniquely identifies the dataset. In here,
Value
A tibble containing the variable(s) named in uniq_identifier_col and their
corresponding duplicate count and percentage (of total duplicates in the data).
If uniq_identifier_col names more than one variable, the tibble shows the
duplicate count (and percentage) for each joint combination of those variables.
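The grouping logic described above can be sketched with dplyr as follows. This is only an illustration under stated assumptions, not the package's implementation: the function name count_duplicates_sketch and the output columns duplicate_count and duplicate_pct are hypothetical, and a "duplicate" is assumed to be any row whose identifier combination occurs more than once.

```r
library(dplyr)

# Hypothetical stand-in for count_duplicates(); NOT the package's own code.
# Assumption: every row whose key occurs more than once counts as a duplicate.
count_duplicates_sketch <- function(df, uniq_identifier_col) {
  df %>%
    # group by one or several identifier columns given as a character vector
    group_by(across(all_of(uniq_identifier_col))) %>%
    # number of rows sharing each key combination
    summarise(duplicate_count = n(), .groups = "drop") %>%
    # keep only key combinations that appear more than once
    filter(duplicate_count > 1) %>%
    # share of each combination among all duplicated rows
    mutate(duplicate_pct = 100 * duplicate_count / sum(duplicate_count))
}

# Small made-up dataset for illustration
dataObj <- tibble(
  ID   = c(1, 1, 2, 3, 3, 3),
  Name = c("a", "a", "b", "c", "c", "d")
)

res_id   <- count_duplicates_sketch(dataObj, "ID")
res_both <- count_duplicates_sketch(dataObj, c("ID", "Name"))
```

With a single column the counts are per ID; with both columns they are per joint (ID, Name) combination, so a row such as (3, "d") no longer counts as a duplicate even though its ID is repeated.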
Examples
count_duplicates(df = dataObj, uniq_identifier_col = c("ID"))
count_duplicates(df = dataObj, uniq_identifier_col = c("ID", "Name"))
# tidyselection used to select columns
count_duplicates(df = dataObj, uniq_identifier_col = tidyselect::contains("abc"))