epi_clean_compare_dup_rows: Compare two rows which may be duplicated

View source: R/epi_clean_compare_dup_rows.R

epi_clean_compare_dup_rows    R Documentation

Compare two rows which may be duplicated

Description

Compare two rows that may contain duplicated information. epi_clean_compare_dup_rows() uses compare::compare() to compare the possibly duplicated rows. Because compare::compare() can allow all transformations (sorting, coercion, etc.), the comparison can be loose. The function is intended to make manual inspection easier, but compare::compare() can miss differences, so check its output with care.
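The looseness described above can be illustrated without the compare package. The sketch below uses hypothetical toy rows and base R only. It contrasts a strict comparison (identical()) with a loose one that coerces values to character and trims whitespace first. This is only an analogy for the kind of transformations compare::compare() may apply, not a reproduction of it.

```r
# Hypothetical toy rows: the same record, but one value has a trailing space
# and another is stored as character instead of numeric.
row_a <- list(id = "2", name = "Ana",  age = 34)
row_b <- list(id = "2", name = "Ana ", age = "34")

# Strict check: a field differs unless the values are identical, type included.
strict_diff <- names(row_a)[!mapply(identical, row_a, row_b)]

# Loose check (illustrative only): coerce to character and trim whitespace first.
loose_equal <- function(x, y) {
  identical(trimws(as.character(x)), trimws(as.character(y)))
}
loose_diff <- names(row_a)[!mapply(loose_equal, row_a, row_b)]

strict_diff  # "name" "age"
loose_diff   # character(0): the loose check hides both differences
```

Both fields that differ under the strict check look identical once transformations are allowed, which is why a loose comparison's output should be confirmed by manual inspection.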

Usage

epi_clean_compare_dup_rows(
  df_dups = NULL,
  val_id = "1",
  col_id = "",
  sub_index_1 = 1,
  sub_index_2 = 2,
  allowAll = TRUE,
  ...
)

Arguments

df_dups

a data frame with duplicated entries to compare

val_id

a value thought to be duplicated (e.g. a repeated row ID), passed as a string. It is searched for with grep() using fixed = TRUE, so it is matched as a literal string rather than as a regular expression.
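With fixed = TRUE the pattern is taken literally, but grep() still matches substrings. The base R sketch below uses a hypothetical vector of IDs. It shows that a val_id of "2" also matches "12" and "20", and it shows an exact-match alternative with ==. Whether the substring behaviour matters depends on how IDs are formatted in your data.

```r
ids <- c("2", "12", "20", "2", "3")

# fixed = TRUE: literal pattern, but substring matches are still returned.
grep("2", ids, fixed = TRUE)  # 1 2 3 4

# Exact matching, if only whole values should count as duplicates.
which(ids == "2")             # 1 4
```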

col_id

a string giving the name of the ID column in which to search for val_id

sub_index_1

index, among the rows matching val_id, of the first row to compare (default = 1)

sub_index_2

index, among the rows matching val_id, of the second row to compare (default = 2)

allowAll

passed to compare::compare(); if TRUE (the default here), all transformations are allowed

...

further arguments passed to compare::compare()

Value

a list with the differing columns ('differing_cols'), their names ('col_names') and the indices of the duplicated rows ('duplicate_indices')
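The shape of the returned list can be illustrated with a simplified base R sketch. compare_dup_rows_sketch() is a hypothetical helper, not the episcout implementation. It compares columns strictly with identical() instead of calling compare::compare(). It also assumes that sub_index_1 and sub_index_2 index into the rows matching val_id and that 'differing_cols' holds column positions. The toy data frame is made up for illustration.

```r
# Hypothetical base-R sketch; not the episcout implementation.
compare_dup_rows_sketch <- function(df_dups, val_id, col_id,
                                    sub_index_1 = 1, sub_index_2 = 2) {
  # Rows whose ID column contains val_id (literal substring match).
  duplicate_indices <- grep(val_id, df_dups[[col_id]], fixed = TRUE)
  if (length(duplicate_indices) < max(sub_index_1, sub_index_2)) {
    stop("Not enough rows matching val_id to compare.")
  }
  row_1 <- df_dups[duplicate_indices[sub_index_1], , drop = FALSE]
  row_2 <- df_dups[duplicate_indices[sub_index_2], , drop = FALSE]
  # Strict column-by-column comparison of the two selected rows.
  same <- vapply(names(df_dups),
                 function(col) identical(row_1[[col]], row_2[[col]]),
                 logical(1))
  differing_cols <- which(!same)
  list(differing_cols = differing_cols,
       col_names = names(df_dups)[differing_cols],
       duplicate_indices = duplicate_indices)
}

# Hypothetical toy data: ID "2" appears twice with different ages.
check_dups <- data.frame(var_id = c("1", "2", "2", "3"),
                         sex = c("F", "M", "M", "F"),
                         age = c(30, 41, 42, 55),
                         stringsAsFactors = FALSE)
comp <- compare_dup_rows_sketch(check_dups, "2", "var_id")
comp$col_names          # "age"
comp$duplicate_indices  # 2 3
```

As in the Examples section, the result can be used to subset the data, e.g. check_dups[comp$duplicate_indices, comp$differing_cols], to focus manual inspection on the fields that differ.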

Author(s)

Antonio Berlanga-Taylor <https://github.com/AntonioJBT/episcout>

See Also

compare, grepl

Examples


## Not run: 
# Data frame object with rows thought to have duplicated entries:
check_dups
# Specify the row ID to grep where duplicate values are expected:
val_id <- '2'
comp <- epi_clean_compare_dup_rows(df_dups = check_dups, val_id = val_id,
                                   col_id = 'var_id',
                                   sub_index_1 = 1, sub_index_2 = 2)
comp
View(t(check_dups[comp$duplicate_indices, ]))
View(t(check_dups[comp$duplicate_indices, comp$differing_cols]))
## End(Not run)


AntonioJBT/episcout documentation built on Dec. 1, 2024, 4:07 a.m.