verify_ids: Verify record consistency across databases
In nyuglobalties/anara: Highlight Problems In Survey Data

View source: R/verify.R

Description

Compares demographic information across datasets to determine whether the entity identified by a given ID is the same entity in every dataset.

Usage

verify_ids(
  dat_list,
  id_col,
  unique_id_col,
  file = NULL,
  database_col = "database",
  variables = NULL,
  tolerances = NULL,
  extra_metrics = NULL,
  extra_cols = NULL,
  verbose = TRUE,
  ...
)

Arguments

`dat_list`	A named list of `data.frame`s.
`id_col`	The name of the ID, or primary key, column. For consistency, it should be the same across datasets.
`unique_id_col`	The name of the row ID, or surrogate key, column. For consistency, it should be the same across datasets.
`file`	If not `NULL`, a path to where the output spreadsheet will be saved.
`database_col`	The name of the column that stores the `dat_list` names.
`variables`	A character vector naming the integer or character columns to compare across datasets.
`tolerances`	If not `NULL`, a `list` of parameters to be used as tolerances. The list names must be variable names provided to `variables`, and the type of tolerance depends on the variable: if the variable is an integer, the tolerance is the maximum difference allowed; if the variable is a character, the tolerance is the maximum dissimilarity allowed, measured between 0 and 1.
`extra_metrics`	A `metrics()` call that contains a collection of `metric()` calls
`extra_cols`	A character vector of columns to be included in the output verification spreadsheet, mainly for reference and support during manual inspection
`verbose`	Enables logging
`...`	Extra parameters passed to `anara::fix_format`
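
The two tolerance rules described for `tolerances` can be sketched in base R. This is only an illustration, not anara's implementation: the helper names `within_int_tolerance()` and `within_chr_tolerance()` are hypothetical, and the character dissimilarity is assumed here to be the Levenshtein distance from `utils::adist()` divided by the length of the longer string, since this page does not say which measure the package uses.

```r
# Hypothetical helper: integer values agree when their absolute
# difference does not exceed the tolerance.
within_int_tolerance <- function(a, b, tol = 0L) {
  abs(a - b) <= tol
}

# Hypothetical helper: character values agree when a normalized edit
# distance (an assumed measure, scaled to 0..1) does not exceed the tolerance.
within_chr_tolerance <- function(a, b, tol = 0) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(TRUE)  # two empty strings are identical
  dissimilarity <- drop(utils::adist(a, b)) / longest
  dissimilarity <= tol
}

within_int_tolerance(3L, 4L, tol = 0L)                          # FALSE
within_int_tolerance(3L, 4L, tol = 1L)                          # TRUE
within_chr_tolerance("Ms. Alvarez", "Ms. Alvares", tol = 0.05)  # FALSE (1/11 > 0.05)
within_chr_tolerance("Ms. Alvarez", "Ms. Alvares", tol = 0.10)  # TRUE
```

Under this assumed measure, a tolerance of 0 demands an exact match, and a single-character typo in an 11-character name is accepted only once the tolerance reaches roughly 0.09.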

Value

A `data.frame` in the fix format (see `anara::fix_format`).

Examples

if (FALSE) {
  # Not run: dat_1 and dat_2 stand in for data.frames loaded beforehand
  anara::verify_ids(
    list(
      database1 = dat_1,
      database2 = dat_2
    ),
    id_col = "participant_id",
    unique_id_col = "unique_id",
    variables = c("female", "grade", "teacher_name", "form"),
    tolerances = list(
      form = 0,
      teacher_name = 0.05
    ),
    extra_cols = c(
      "start", "end",
      "incdnt_01", "incdnt_01_o", "incdnt_02", "incdnt_02_o"
    ),
    file = file.path("path", "to", "issues.csv")
  )
}
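
The example above assumes that `dat_1` and `dat_2` already exist. A minimal sketch of inputs with the expected shape, using only base R, might look like the following. All values are invented for illustration, and the call to `verify_ids()` itself is left out because its output is not shown on this page.

```r
# Invented toy data: two databases that share participant IDs.
dat_1 <- data.frame(
  unique_id = c("a1", "a2"),
  participant_id = c(101L, 102L),
  female = c(1L, 0L),
  grade = c(3L, 4L),
  teacher_name = c("Ms. Alvarez", "Mr. Chen"),
  stringsAsFactors = FALSE
)
dat_2 <- data.frame(
  unique_id = c("b1", "b2"),
  participant_id = c(101L, 102L),
  female = c(1L, 1L),
  grade = c(3L, 4L),
  teacher_name = c("Ms. Alvares", "Mr. Chen"),
  stringsAsFactors = FALSE
)

# verify_ids() takes the data.frames as a named list; the list names are
# what the `database_col` column stores.
dat_list <- list(database1 = dat_1, database2 = dat_2)

# Sanity check before calling verify_ids(): the ID and row-ID columns
# exist under the same name in every dataset.
required <- c("participant_id", "unique_id")
has_keys <- vapply(dat_list, function(d) all(required %in% names(d)), logical(1))
stopifnot(all(has_keys))
```

Here participant 102 differs in `female` between the two databases and participant 101 has a one-letter difference in `teacher_name`, which are the kinds of discrepancies the function is described as comparing.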

nyuglobalties/anara documentation built on July 17, 2024, 4:05 p.m.