diffdataframe: Performs a diff between two dataframes


View source: R/general.R

Description

For two dataframes df1 and df2 with a unique identifier (key) and the same columns, performs a diff: it returns (invisibly) a dataframe with the rows present in both dataframes that show some difference and, optionally, writes an xlsx file (using the openxlsx package) with three sheets. The first sheet contains the rows with the differences highlighted, the second sheet contains the rows of df1 that have no match in df2, and the third sheet contains the rows of df2 that have no match in df1. A difference is flagged when values do not match exactly or, for the numeric columns listed in numcol, when the relative error is greater than umbral.
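The three-way split described above can be sketched in base R. This is only an illustration of the idea, not the code in R/general.R: the helper name sketch_diff and its list output are hypothetical, it compares values as strings, and it assumes no missing values.

```r
# Illustrative sketch only: NOT the opadar implementation.
# sketch_diff is a hypothetical helper name.
sketch_diff <- function(df1, df2, key) {
  # Build one string id per row from the key columns
  id1 <- do.call(paste, c(df1[key], sep = "\r"))
  id2 <- do.call(paste, c(df2[key], sep = "\r"))

  # Rows whose key appears in only one of the dataframes
  only_df1 <- df1[!(id1 %in% id2), , drop = FALSE]
  only_df2 <- df2[!(id2 %in% id1), , drop = FALSE]

  # Align the matched rows of df2 with those of df1 (same row and column order)
  common <- intersect(id1, id2)
  m1 <- df1[match(common, id1), , drop = FALSE]
  m2 <- df2[match(common, id2), names(df1), drop = FALSE]

  # A matched row differs if any non-key column is not an exact match
  differs <- rep(FALSE, length(common))
  for (col in setdiff(names(df1), key)) {
    differs <- differs | (as.character(m1[[col]]) != as.character(m2[[col]]))
  }

  list(different = rbind(m1[differs, , drop = FALSE],
                         m2[differs, , drop = FALSE]),
       only_df1 = only_df1,
       only_df2 = only_df2)
}

df1 <- data.frame(x = c("id 1", "id 1", "id 2"),
                  year = c("2001", "2002", "2002"),
                  y = c("a", "b", "c"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c("id 1", "id 2"),
                  year = c("2002", "2002"),
                  y = c("a", "c"),
                  stringsAsFactors = FALSE)
res <- sketch_diff(df1, df2, key = c("x", "year"))
```

Here res$different holds both versions of the row keyed by ("id 1", "2002"), res$only_df1 holds the 2001 row, and res$only_df2 is empty.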

Usage

diffdataframe(df1, df2, key, file = NULL, numcol = NULL, umbral = 0.05)

Arguments

df1, df2

Dataframes. They should have the same columns, even if they appear in a different order (rbind(df1, df2) should make sense).

key

String vector. A vector of column names that serve as a unique identifier.

file

String. The name of an xlsx file where the differences between df1 and df2 are written.

numcol

String vector. A vector of names of numeric columns for which the relative error should be checked. If the relative difference (with respect to the value in the first dataframe) is greater than umbral, it is highlighted. Default NULL.

umbral

Numeric. A threshold for the relative difference in the numcol columns. It should be a number between 0 and 1. The default is 0.05 (5%).
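As a rough illustration of how numcol and umbral interact, the check can be read as a relative difference measured against the value in the first dataframe. The helper exceeds_umbral below is hypothetical and only mirrors the wording of the argument descriptions; the package may handle edge cases (such as a zero in df1) differently.

```r
# Hypothetical helper, for illustration only (not part of opadar).
exceeds_umbral <- function(v1, v2, umbral = 0.05) {
  # Relative difference with respect to the value in the first dataframe.
  # Note: a zero in v1 gives Inf or NaN; this sketch does not handle that case.
  rel_diff <- abs(v2 - v1) / abs(v1)
  rel_diff > umbral
}

z1 <- c(100, 100, 100)
z2 <- c(98, 106, 100)
flags <- exceeds_umbral(z1, z2)
print(flags)
```

With these values only the second pair (100 versus 106, a 6% change) is above the default 5% threshold.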

Value

A dataframe with the rows that differ between df1 and df2, returned invisibly. To access it, assign the output of diffdataframe to a variable.
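The effect of an invisible return value can be seen with a small stand-in function. quiet_result below is hypothetical and only imitates the return style described in the Description; it does not call diffdataframe.

```r
# Hypothetical stand-in for a function that returns its result invisibly.
quiet_result <- function() {
  invisible(data.frame(id = 1:2))
}

quiet_result()                 # nothing is printed at the console
res <- quiet_result()          # assignment captures the dataframe
print(res)                     # the captured value can be used as usual
vis <- withVisible(quiet_result())$visible  # FALSE for an invisible result
```

Wrapping the call in parentheses, as in (quiet_result()), also forces the value to print without assigning it.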

Examples

library("dplyr")
library("opadar")  # provides diffdataframe
## consider two dataframes (tibble() replaces the deprecated data_frame())
df1 <- tibble(x = paste("id", c(1, 1, 2)),
              year = c("2001", "2002", "2002"),
              y = letters[1:3])
df2 <- tibble(x = paste("id", 1:2),
              year = rep("2002", 2),
              y = c("a", "c"))
## write the differences to an xlsx file; the dataframe is returned invisibly and not kept:
diffdataframe(df1, df2, key = c("x", "year"), file = "diff-df1-vs-df2.xlsx")
## assign the resulting dataframe to an object for further use:
diff_df1_df2 <- diffdataframe(df1, df2, key = c("x", "year"))
## add a numeric column and check its relative error against umbral (default 0.05):
df1$z <- rep(100, 3)
df2$z <- c(95, 106, 100)
diffdataframe(df1, df2, key = c("x", "year"), numcol = "z", file = "diff-df1-vs-df2.xlsx")

mkesslerct/opadar documentation built on May 23, 2019, 2:01 a.m.