rCompare: Compare two data frames


View source: R/rc_rCompare.R

Description

Compare two data frames (or objects coercible to data frames) and produce a dataCompareR object containing details of the matching and mismatching elements of the data. See vignette("dataCompareR") for more details.
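As a rough, base-R illustration of what "matching and mismatching elements" means when rows are matched by position, the sketch below lists the cells that differ between two data frames. The helper `list_mismatches()` is hypothetical and is not part of dataCompareR. It only mimics the `rowNo`/`valueA`/`valueB`/`variable` layout seen in the example output; the package's own logic and return value (a dataCompareR object) differ.

```r
# Hypothetical helper (not part of dataCompareR): list the cells that differ
# between two data frames, matching rows by position.
list_mismatches <- function(dfA, dfB) {
  dfA <- as.data.frame(dfA)
  dfB <- as.data.frame(dfB)
  n <- min(nrow(dfA), nrow(dfB))            # surplus rows are dropped
  pieces <- list()
  for (col in intersect(names(dfA), names(dfB))) {
    a <- dfA[[col]][seq_len(n)]
    b <- dfB[[col]][seq_len(n)]
    if (is.factor(a)) a <- as.character(a)  # compare factors by label
    if (is.factor(b)) b <- as.character(b)
    # A cell differs if the values differ or only one side is NA.
    differs <- (a != b) | (is.na(a) != is.na(b))
    idx <- which(differs)                   # which() skips NA results
    if (length(idx) > 0) {
      pieces[[col]] <- data.frame(rowNo = idx,
                                  valueA = as.character(a[idx]),
                                  valueB = as.character(b[idx]),
                                  variable = col,
                                  stringsAsFactors = FALSE)
    }
  }
  if (length(pieces) == 0) return(NULL)     # no differences found
  do.call(rbind, unname(pieces))
}

# Same setup as the iris example further down this page.
iris2 <- iris[1:130, ]
iris2[1, 1] <- 5.2
iris2[2, 1] <- 5.2
res <- list_mismatches(iris, iris2)
print(res)
```

Comparing values as-is (rather than as text) keeps tiny floating-point differences visible; this is the situation the `roundDigits` argument is meant to address.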

Usage

rCompare(
  dfA,
  dfB,
  keys = NA,
  roundDigits = NA,
  mismatches = NA,
  trimChars = FALSE
)

Arguments

dfA

data frame. The first data object. dataCompareR will attempt to coerce all data objects to data frames.

dfB

data frame. The second data object. dataCompareR will attempt to coerce all data objects to data frames.

keys

String. Name of the identifier column(s) used to match rows of dfA and dfB. NA if there is no identifier (row order is used instead), a character string for a single column name, or a character vector of column names to match on multiple columns.

roundDigits

Integer. If NA, numerics are not rounded before comparison. If specified, numerics are rounded to the specified number of decimal places using round.

mismatches

Integer. The maximum number of mismatches to assess, after which dataCompareR will stop (without producing a dataCompareR object). Designed to improve performance for large data sets.

trimChars

Logical. If TRUE, strings and factors have whitespace trimmed before comparison.
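To make the effect of each argument concrete, here is a minimal base-R sketch under stated assumptions. The function `sketch_compare()` is hypothetical and is not dataCompareR's implementation: it returns a plain mismatch count instead of a dataCompareR object. It assumes that `keys` behaves like an inner join (done here with `merge()`), that `roundDigits` applies `round()` to numeric columns, that `trimChars` applies `trimws()` to character and factor columns, and that exceeding `mismatches` is signalled with an error.

```r
# Hypothetical helper: a base-R sketch of how rCompare's arguments could
# shape a comparison. It is NOT dataCompareR's implementation.
sketch_compare <- function(dfA, dfB, keys = NA, roundDigits = NA,
                           mismatches = NA, trimChars = FALSE) {
  prep <- function(df) {
    df <- as.data.frame(df)
    for (col in names(df)) {
      x <- df[[col]]
      # Factors are compared by label, not by integer code.
      if (is.factor(x)) x <- as.character(x)
      # trimChars: strip leading/trailing whitespace from strings.
      if (trimChars && is.character(x)) x <- trimws(x)
      # roundDigits: round numerics to a fixed number of decimal places.
      if (!is.na(roundDigits) && is.numeric(x)) x <- round(x, roundDigits)
      df[[col]] <- x
    }
    df
  }
  dfA <- prep(dfA)
  dfB <- prep(dfB)

  value_cols <- setdiff(intersect(names(dfA), names(dfB)), keys)
  if (all(is.na(keys))) {
    # keys = NA: match rows by position; surplus rows are dropped.
    n <- min(nrow(dfA), nrow(dfB))
    A <- dfA[seq_len(n), value_cols, drop = FALSE]
    B <- dfB[seq_len(n), value_cols, drop = FALSE]
  } else {
    # keys given: inner join on the key column(s), so row order is irrelevant.
    m <- merge(dfA[, c(keys, value_cols), drop = FALSE],
               dfB[, c(keys, value_cols), drop = FALSE],
               by = keys, suffixes = c(".A", ".B"))
    A <- m[, paste0(value_cols, ".A"), drop = FALSE]
    B <- m[, paste0(value_cols, ".B"), drop = FALSE]
  }

  total <- 0
  for (i in seq_along(value_cols)) {
    a <- A[[i]]
    b <- B[[i]]
    # A cell differs if the values differ or only one side is NA.
    differs <- (a != b) | (is.na(a) != is.na(b))
    total <- total + sum(differs, na.rm = TRUE)
    # mismatches: give up once the running count exceeds the cap.
    if (!is.na(mismatches) && total > mismatches) {
      stop("more than ", mismatches, " mismatches; stopping without a result")
    }
  }
  total
}

iris2 <- iris[1:130, ]
iris2[1, 1] <- 5.2
iris2[2, 1] <- 5.2
pressure2 <- pressure
pressure2[2, 2] <- pressure2[2, 2] + 0.01

sketch_compare(iris, iris2)                   # 2: rows 1 and 2 of Sepal.Length
sketch_compare(iris, iris2, roundDigits = 0)  # 0: 5.1, 4.9 and 5.2 all round to 5
sketch_compare(pressure, pressure2, keys = "temperature")  # 1
```

Rounding trades sensitivity for robustness: with `roundDigits = 1`, the 0.01 change made to `pressure2` above would no longer be reported.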

Value

A dataCompareR object: an S3 object containing details of the comparison between the two data objects. It can be used with summary, print, saveReport and generateMismatchData.

See Also

Other dataCompareR.functions: generateMismatchData(), print.dataCompareRobject(), saveReport(), summary.dataCompareRobject()

Examples

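library(dataCompareR)

# iris2: the first 130 rows of iris, with two Sepal.Length values changed
iris2 <- iris[1:130, ]
iris2[1, 1] <- 5.2
iris2[2, 1] <- 5.2
rCompare(iris, iris2, keys = NA)
compDetails <- rCompare(iris, iris2, keys = NA, trimChars = TRUE)
print(compDetails)
summary(compDetails)

# pressure2: pressure with one value changed, compared with itself
# using temperature as the key
pressure2 <- pressure
pressure2[2, 2] <- pressure2[2, 2] + 0.01
rCompare(pressure2, pressure2, keys = 'temperature')
rCompare(pressure2, pressure2, keys = 'temperature', mismatches = 10)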

Example output

Running rCompare...
All columns were compared, 20 row(s) were dropped from comparison
There are  1 mismatched variables:
First and last 5 observations for the  1 mismatched variables
  rowNo valueA valueB     variable  typeA  typeB diffAB
1     1    5.1    5.2 SEPAL.LENGTH double double   -0.1
2     2    4.9    5.2 SEPAL.LENGTH double double   -0.3
Warning messages:
1: `select_()` is deprecated as of dplyr 0.7.0.
Please use `select()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
2: `funs()` is deprecated as of dplyr 0.8.0.
Please use a list of either functions or lambdas: 

  # Simple named list: 
  list(mean = mean, median = median)

  # Auto named with `tibble::lst()`: 
  tibble::lst(mean, median)

  # Using lambdas
  list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Running rCompare...
All columns were compared, 20 row(s) were dropped from comparison
There are  1 mismatched variables:
First and last 5 observations for the  1 mismatched variables
  rowNo valueA valueB     variable  typeA  typeB diffAB
1     1    5.1    5.2 SEPAL.LENGTH double double   -0.1
2     2    4.9    5.2 SEPAL.LENGTH double double   -0.3
dataCompareR is generating the summary...

Data Comparison
===============

Date comparison run: 2021-03-29 12:53:42  
Comparison run on R version 4.0.3 (2020-10-10)  
With dataCompareR version 0.1.3  


Meta Summary
============


|Dataset Name |Number of Rows |Number of Columns |
|:------------|:--------------|:-----------------|
|iris         |150            |5                 |
|iris2        |130            |5                 |


Variable Summary
================

Number of columns in common: 5  
Number of columns only in iris: 0  
Number of columns only in iris2: 0  
Number of columns with a type mismatch: 0  
No match key used, comparison is by row



Row Summary
===========

Total number of rows read from iris: 150  
Total number of rows read from iris2: 130    
Number of rows in common: 130  
Number of rows dropped from iris: 20  
Number of rows dropped from  iris2: 0  


Data Values Comparison Summary
==============================

Number of columns compared with ALL rows equal: 4  
Number of columns compared with SOME rows unequal: 1  
Number of columns with missing value differences: 0  

Columns with all rows equal : PETAL.LENGTH, PETAL.WIDTH, SEPAL.WIDTH, SPECIES

Summary of columns with some rows unequal: 



|Column       |Type (in iris) |Type (in iris2) | # differences|Max difference | # NAs|
|:------------|:--------------|:---------------|-------------:|:--------------|-----:|
|SEPAL.LENGTH |double         |double          |             2|0.3            |     0|



Unequal column details
======================



#### Column -  SEPAL.LENGTH



|   | SEPAL.LENGTH (iris)| SEPAL.LENGTH (iris2)|Type (iris) |Type (iris2) | Difference|
|:--|-------------------:|--------------------:|:-----------|:------------|----------:|
|1  |                 5.1|                  5.2|double      |double       |       -0.1|
|2  |                 4.9|                  5.2|double      |double       |       -0.3|


Running rCompare...
All columns were compared, all rows were compared 
All compared variables match 
 Number of rows compared: 19 
 Number of columns compared: 2
Warning message:
`arrange_()` is deprecated as of dplyr 0.7.0.
Please use `arrange()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Running rCompare...
All columns were compared, all rows were compared 
All compared variables match 
 Number of rows compared: 19 
 Number of columns compared: 2

dataCompareR documentation built on Nov. 23, 2021, 9:06 a.m.