compare_two_df: Compare two data frames.


View source: R/acquire_error.R

Description

compare_two_df compares the variables listed in vars between two data frames, matching rows by uniqueId.

Usage

compare_two_df(df1, df2, vars, uniqueId)

Arguments

df1

Data frame 1.

df2

Data frame 2.

vars

A list of character vectors naming the variables to be compared. In each vector, the first variable name belongs to df1 and the second variable name belongs to df2.

uniqueId

A string naming the unique ID column that is used to match rows of df2 with rows of df1.
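
To illustrate how the arguments fit together, here is a minimal sketch in base R. The toy data frames and the helper check_vars are hypothetical and not part of sdglinkage; the helper only checks that each pair in vars names one column of df1 and one column of df2, and that the uniqueId column exists in both.

```r
# Toy data frames; the column names are illustrative only.
df1 <- data.frame(id = 1:3, firstname = c("Ann", "Bob", "Cat"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = 1:3, firstname_variant = c("Anne", "Bob", "Kat"),
                  stringsAsFactors = FALSE)

# One character vector per comparison: c(<column in df1>, <column in df2>).
vars <- list(c("firstname", "firstname_variant"))
uniqueId <- "id"

# Hypothetical pre-check (an assumption, not an sdglinkage function).
check_vars <- function(df1, df2, vars, uniqueId) {
  first  <- vapply(vars, `[`, character(1), 1)  # names expected in df1
  second <- vapply(vars, `[`, character(1), 2)  # names expected in df2
  all(first %in% names(df1)) && all(second %in% names(df2)) &&
    uniqueId %in% names(df1) && uniqueId %in% names(df2)
}

ok <- check_vars(df1, df2, vars, uniqueId)
```

Running a check like this before the comparison can make a misspelled column name easier to spot.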

Value

It returns a data frame with 7 variables:

  1. var.x: the name of the first variable in each vector of vars;

  2. var.y: the name of the second variable in each vector of vars;

  3. uniqueId: the value of the unique ID column named by uniqueId;

  4. values.x: the value of the first variable in each vector of vars;

  5. values.y: the value of the second variable in each vector of vars;

  6. row.x: the row of values.x in df1;

  7. row.y: the row of values.y in df2.
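
The shape of this return value can be sketched in base R. The function sketch_compare below is a hypothetical stand-in, not the sdglinkage implementation. It assumes that only mismatching pairs are reported (as the name diffs.table in the example suggests), that NA against a value counts as a mismatch, and that NA against NA counts as a match.

```r
# Hypothetical sketch; the real compare_two_df may behave differently.
sketch_compare <- function(df1, df2, vars, uniqueId) {
  # Record the original row positions before merging.
  df1$row.x <- seq_len(nrow(df1))
  df2$row.y <- seq_len(nrow(df2))
  out <- lapply(vars, function(pair) {
    left  <- df1[, c(uniqueId, pair[1], "row.x")]
    right <- df2[, c(uniqueId, pair[2], "row.y")]
    names(left)[2]  <- "values.x"
    names(right)[2] <- "values.y"
    # Match rows of df2 to rows of df1 on the unique ID column.
    merged <- merge(left, right, by = uniqueId)
    x <- as.character(merged$values.x)
    y <- as.character(merged$values.y)
    # Assumption: NA vs. NA is a match, NA vs. a value is a mismatch.
    same <- (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
    merged <- merged[!same, , drop = FALSE]
    data.frame(var.x = rep(pair[1], nrow(merged)),
               var.y = rep(pair[2], nrow(merged)),
               uniqueId = merged[[uniqueId]],
               values.x = merged$values.x,
               values.y = merged$values.y,
               row.x = merged$row.x,
               row.y = merged$row.y,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

# Toy data: the rows of b are in a different order from those of a.
a <- data.frame(id = c("p1", "p2", "p3"), name = c("Ann", "Bob", "Cat"),
                stringsAsFactors = FALSE)
b <- data.frame(id = c("p3", "p1", "p2"), name_variant = c("Kat", "Ann", NA),
                stringsAsFactors = FALSE)
diffs <- sketch_compare(a, b, list(c("name", "name_variant")), "id")
print(diffs)
```

Because row.x and row.y are recorded before the merge, they point back to the original positions in each data frame even when the two inputs are ordered differently.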

Examples

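library(sdglinkage)

# Start with 100 rows and two empty columns that will hold the name variants.
df <- data.frame(firstname_variant = character(100), lastname_variant = character(100))
# Add a generated ID, first name and last name to every row.
df <- add_variable(df, "nhsid")
df <- add_variable(df, "firstname", country = "uk", gender_dependency = FALSE,
                   age_dependency = FALSE)
df <- add_variable(df, "lastname", country = "uk", gender_dependency = FALSE,
                   age_dependency = FALSE)
# Before R 4.0, data.frame() stores strings as factors by default; coerce to character.
df$firstname_variant <- as.character(df$firstname_variant)
df$lastname_variant <- as.character(df$lastname_variant)
# Keep the first entry of the comma-separated variants returned for each name.
for (i in seq_len(nrow(df))) {
  df$firstname_variant[i] <- strsplit(get_transformation_name_variant(df$firstname[i]), ',')[[1]][1]
  df$lastname_variant[i] <- strsplit(get_transformation_name_variant(df$lastname[i]), ',')[[1]][1]
}
# Split into two data frames that share the nhsid column.
df1 <- df[c('nhsid', 'firstname', 'lastname')]
df2 <- df[c('nhsid', 'firstname_variant', 'lastname_variant')]
# Set three first-name variants to missing.
df2[1:3, 'firstname_variant'] <- NA
# Each pair is c(<column in df1>, <column in df2>).
vars <- list(c('firstname', 'firstname_variant'), c('lastname', 'lastname_variant'))
diffs.table <- compare_two_df(df1, df2, vars, 'nhsid')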

sdglinkage documentation built on April 27, 2020, 5:09 p.m.