---
title: "dataCompareR vignette"
author: "Rob Noble-Eddy"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{dataCompareR vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
This vignette offers end-to-end examples of using dataCompareR and, for those who want to know, details of how the package performs the comparison.
For the purposes of this vignette, we'll intentionally modify a copy of iris and compare it against the original.
library(dataCompareR)

# We'll use iris for our comparison
head(iris)

# Make a copy of iris
iris2 <- iris

# And change it, first by subsetting just the first 140 rows
iris2 <- iris2[1:140,]

# then removing the Petal.Width column
iris2$Petal.Width <- NULL

# And then changing some values
iris2[1:10,1] <- iris2[1:10,1] + 1
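Before running the comparison, it can help to confirm what was actually changed. The following is a minimal sketch in base R only (it does not use dataCompareR) that rebuilds the modified copy and checks the three modifications by hand. The variable names rowsDropped, colsRemoved and valuesChanged are illustrative assumptions, not part of the package.

```r
# Rebuild the modified copy exactly as above (base R only)
iris2 <- iris[1:140, ]
iris2$Petal.Width <- NULL
iris2[1:10, 1] <- iris2[1:10, 1] + 1

# 1. How many rows were dropped?
rowsDropped <- nrow(iris) - nrow(iris2)

# 2. Which columns were removed?
colsRemoved <- setdiff(names(iris), names(iris2))

# 3. How many values in the first column (Sepal.Length) differ,
#    comparing only the rows present in both data frames?
valuesChanged <- sum(iris$Sepal.Length[1:140] != iris2$Sepal.Length)

rowsDropped
colsRemoved
valuesChanged
```

These are the kinds of differences we would expect a comparison of iris and iris2 to surface.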
And then run a comparison using the rCompare function:

# run the comparison
compIris <- rCompare(iris, iris2)
rCompare returns an S3 object which you can use with summary and print. Using summary is a good way to check the results:

# Check the results
summary(compIris)
Or you can save a copy of the report using saveReport:

# Write the summary to a file
saveReport(compIris, reportName = 'compIris')
In the first example, we compared our data based on its row order. What if we want to match our data on a key? We'll produce another test data set based on the pressure dataset:

# We'll use the pressure dataset for comparison
head(pressure)

# Make a copy of pressure
pressure2 <- pressure

# And change it, first by randomising the row order
pressure2 <- pressure2[sample(nrow(pressure2)),]

# then changing just one element, so for temperature of
pressure2[5,1]

# We modify pressure to be twice as large
pressure2[5,2] <- pressure2[5,2] * 2
Run the comparison with rCompare, specifying that we want to match on temperature:

# run the comparison
compPressure <- rCompare(pressure, pressure2, keys = 'temperature')
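To see why matching on a key matters, here is a minimal sketch in base R. It is not how dataCompareR is implemented: merge is swapped in purely to illustrate pairing rows by temperature rather than by position. The set.seed call and the names changedTemp, byPosition, joined and byKey are assumptions made for this illustration.

```r
# Shuffle the rows and double one pressure value, as in the example above
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
pressure2 <- pressure[sample(nrow(pressure)), ]
changedTemp <- pressure2[5, "temperature"]
pressure2[5, "pressure"] <- pressure2[5, "pressure"] * 2

# Comparing by row position: rows that merely moved also count as differences
byPosition <- sum(pressure$pressure != pressure2$pressure)

# Comparing by key: pair rows on temperature, then compare the paired values
joined <- merge(pressure, pressure2, by = "temperature",
                suffixes = c(".A", ".B"))
byKey <- joined[joined$pressure.A != joined$pressure.B, ]

byPosition
byKey
```

Pairing on the key isolates the single row whose value really changed, whereas a positional comparison cannot tell a moved row from a modified one.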
And this time, we'll choose to get a shorter summary using print
# Check the results - use print for a quick summary
print(compPressure)
We can also extract the mismatching data to explore further using generateMismatchData, which generates a list containing two data frames, each having the missing rows from the comparison.

library(dplyr)

# use generateMismatchData to pull out the mismatching rows from each table
mismatches <- generateMismatchData(compPressure, pressure, pressure2)
mismatches
It is possible to use functions that are not exposed to the end user through the three-colon syntax, for example dataCompareR:::functionName. Please take care when using them: some of the checks are done up front, so these internal functions may make assumptions about their input.
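If you want to see which functions fall into this category, one approach is to compare a package's namespace with its exports. The sketch below uses only base R; listInternals is a hypothetical helper written for this illustration and is not part of dataCompareR.

```r
# Hypothetical helper (not part of dataCompareR): list the objects in a
# package namespace that are not exported
listInternals <- function(pkg) {
  ns <- getNamespace(pkg)
  setdiff(ls(ns, all.names = TRUE), getNamespaceExports(pkg))
}

# Only runs if dataCompareR is installed
if (requireNamespace("dataCompareR", quietly = TRUE)) {
  head(listInternals("dataCompareR"))
}
```

Anything returned by this helper is reachable only through the three-colon syntax and carries the caveats described above.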
The aspects of the dataCompareR::rCompare function that matter to the end user are:-
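* Inputs are coerced to data frames using as.data.frame. If you need more advanced coercion, please do this before calling dataCompareR.
* Missing values get special treatment, specifically NA and NaN, which are handled in the following way:
    * If both values are NA, match is TRUE
    * If both values are NaN, match is TRUE
    * If one value is NA and the other NaN, match is FALSE
    * If one value is NA, and the other is a valid value, match is FALSE
    * If one value is NaN, and the other is a valid value, match is FALSE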
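The NA and NaN rules above can be sketched as a small element-wise function. This is an illustration for numeric vectors written in base R, not dataCompareR's internal code, and valuesMatch is a hypothetical name. Note that is.na() also returns TRUE for NaN, so the sketch separates the two cases explicitly.

```r
# Hypothetical helper (not dataCompareR's internal code): element-wise match
# for numeric vectors, following the NA / NaN rules listed above
valuesMatch <- function(a, b) {
  aNaN <- is.nan(a)
  bNaN <- is.nan(b)
  # is.na() is also TRUE for NaN, so exclude NaN to isolate "true" NA
  aNA <- is.na(a) & !aNaN
  bNA <- is.na(b) & !bNaN

  ifelse(aNA | bNA, aNA & bNA,        # NA matches only NA
    ifelse(aNaN | bNaN, aNaN & bNaN,  # NaN matches only NaN
      a == b))                        # otherwise ordinary equality
}

# One pair per rule, plus an ordinary matching pair
a <- c(NA, NaN, NA,  NA, NaN, 1)
b <- c(NA, NaN, NaN, 1,  1,   1)
valuesMatch(a, b)
# TRUE TRUE FALSE FALSE FALSE TRUE
```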