cells | R Documentation |
Cell counts and differences for a series of datasets
cells(..., .list = NULL, compare = c("to_first", "sequential"))
... |
For |
.list |
A |
compare |
How to compare the datasets. |
An object of class cellComparison, which is really an array with a few extra attributes. It counts the total number of cells, the number of missing values, the number of altered values and the changes therein, as compared to the reference defined in compare.
When comparing the contents of two data sets, the total number of cells in the current data set can be partitioned as in the following figure.
This function computes the partition for two or more datasets, comparing the current set to the first (default) or to the previous (by setting compare='sequential').
This function assumes that the datasets have the same dimensions and that both rows and columns are ordered similarly.
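The partition described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper partition_cells() and the category labels it returns are hypothetical names chosen for this illustration, and the sketch assumes two data frames with identical dimensions, identical row and column order, and atomic (non-factor) columns.

```r
# Hypothetical helper (not part of the package): partition the cells of
# `cur` relative to the reference data frame `ref`.
partition_cells <- function(ref, cur) {
  # Same assumption as the documented function: equal dimensions and ordering.
  stopifnot(identical(dim(ref), dim(cur)))

  ref_na <- is.na(ref)  # logical matrix: missing in the reference
  cur_na <- is.na(cur)  # logical matrix: missing in the current set

  # Column-wise comparison of cells that are available in both sets.
  differs <- unlist(
    Map(function(a, b) !is.na(a) & !is.na(b) & a != b, ref, cur),
    use.names = FALSE
  )

  still_available <- sum(!ref_na & !cur_na)
  adapted <- sum(differs)

  c(
    cells           = length(cur_na),
    available       = sum(!cur_na),
    still_available = still_available,
    unadapted       = still_available - adapted,
    adapted         = adapted,
    imputed         = sum(ref_na & !cur_na),
    missing         = sum(cur_na),
    still_missing   = sum(ref_na & cur_na),
    removed         = sum(!ref_na & cur_na)
  )
}

# Small made-up data set with one missing value per column.
raw <- data.frame(x = c(1, NA, 3), y = c(-4, 5, NA))
imputed <- raw
imputed$x[is.na(imputed$x)] <- 2
flipped <- imputed
flipped$y <- abs(flipped$y)

# Compare every later step to the first data set.
to_first <- sapply(list(imputed = imputed, flipped = flipped),
                   partition_cells, ref = raw)

# Compare the last step to the step just before it.
sequential <- partition_cells(ref = imputed, cur = flipped)

to_first
sequential
```

In this sketch the available cells always split into unadapted, adapted and imputed cells, and the missing cells split into still-missing and removed cells, which is the partition the text refers to. Comparing to the first data set accumulates all changes since the raw data, while the sequential comparison only shows what the most recent step changed.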
The figure is reproduced from MPJ van der Loo and E. De Jonge (2018) Statistical Data Cleaning with applications in R (John Wiley & Sons).
Other comparing: as.data.frame,cellComparison-method, as.data.frame,validatorComparison-method, barplot,cellComparison-method, barplot,validatorComparison-method, compare(), match_cells(), plot,cellComparison-method, plot,validatorComparison-method
# cells() and the retailers data come from the validate package
library(validate)
data(retailers)
# start with raw data
step0 <- retailers
# impute turnovers
step1 <- step0
step1$turnover[is.na(step1$turnover)] <- mean(step1$turnover,na.rm=TRUE)
# flip sign of negative revenues
step2 <- step1
step2$other.rev <- abs(step2$other.rev)
# create an overview of differences, comparing to the previous step
cells(raw = step0, imputed = step1, flipped = step2, compare="sequential")
# create an overview of differences compared to raw data
out <- cells(raw = step0, imputed = step1, flipped = step2)
out
# Graphical overview of the changes
plot(out)
barplot(out)
# transform data to data.frame (easy for use with ggplot)
as.data.frame(out)