compare.synds | R Documentation |
Compare a synthesised data set with the original (observed) data set
using percent frequency tables and histograms. When more than one
synthetic data set has been generated (object$m > 1), pooled synthetic
data are used for comparison by default.
This function can also be used with synthetic data NOT created by
syn(), but then an additional parameter, cont.na, might need to be provided.
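The idea of comparing percent frequency tables for pooled synthetic data can be sketched in base R. This is only an illustration, not the package's implementation: the data are made up, the helper pct() is hypothetical, and pooling by stacking the rows of the synthetic copies is an assumption.

```r
# Hypothetical observed data and m = 2 synthetic copies (made-up values,
# not produced by syn()).
obs  <- data.frame(sex = factor(c("F", "F", "M", "M", "M"), levels = c("F", "M")))
syn1 <- data.frame(sex = factor(c("F", "M", "M", "M", "M"), levels = c("F", "M")))
syn2 <- data.frame(sex = factor(c("F", "F", "M", "M", "M"), levels = c("F", "M")))

# Assumption: pooling means stacking the rows of all synthetic copies.
pooled <- do.call(rbind, list(syn1, syn2))

# Hypothetical helper: percent frequency of each category as a named vector.
pct <- function(x) {
  tab <- table(x, useNA = "ifany")
  setNames(100 * as.vector(tab) / sum(tab), names(tab))
}

# One row per data source, one column per category.
comparison <- rbind(observed = pct(obs$sex), synthetic = pct(pooled$sex))
print(round(comparison, 1))
```

Close agreement between the two rows suggests the synthetic data reproduce the marginal distribution of that variable.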
## S3 method for class 'synds'
compare(object, data, vars = NULL, msel = NULL, stat = "percents",
  breaks = 20, nrow = 2, ncol = 2, rel.size.x = 1,
  utility.stats = c("pMSE", "S_pMSE", "df"),
  utility.for.plot = "S_pMSE",
  cols = c("#1A3C5A","#4187BF"),
  plot = TRUE, table = FALSE, ...)

## S3 method for class 'data.frame'
compare(object, data, vars = NULL, cont.na = NULL, msel = NULL,
  stat = "percents", breaks = 20, nrow = 2, ncol = 2, rel.size.x = 1,
  utility.stats = c("pMSE", "S_pMSE", "df"),
  utility.for.plot = "S_pMSE",
  cols = c("#1A3C5A","#4187BF"),
  plot = TRUE, table = FALSE, ...)

## S3 method for class 'list'
compare(object, data, vars = NULL, cont.na = NULL, msel = NULL,
  stat = "percents", breaks = 20, nrow = 2, ncol = 2, rel.size.x = 1,
  utility.stats = c("pMSE", "S_pMSE", "df"),
  utility.for.plot = "S_pMSE",
  cols = c("#1A3C5A","#4187BF"),
  plot = TRUE, table = FALSE, ...)

## S3 method for class 'compare.synds'
print(x, ...)
object |
an object of class |
data |
an original (observed) data set. |
vars |
variables to be compared. If |
cont.na |
a named list of codes for missing values for continuous
variables if different from the |
msel |
index or indices of synthetic data copies for which a comparison
is to be made. If |
stat |
determines whether tables and plots present percentages
|
breaks |
the number of cells for the histogram. |
nrow |
the number of rows for the plotting area. |
ncol |
the number of columns for the plotting area. |
rel.size.x |
a number representing the relative size of x-axis labels. |
utility.stats |
a single string or a vector of strings that determines
which utility measures to print. Must be a selection from:
|
utility.for.plot |
a single string that determines which utility
measure to print in facet labels of the plot. Set to |
cols |
bar colors. |
plot |
a logical value with default set to |
table |
a logical value with default set to |
... |
additional parameters. |
x |
an object of class |
Missing data categories for numeric variables are plotted on the same plot
as non-missing values. They are indicated by the miss. suffix.
Numeric variables with fewer than 6 distinct values are changed to factors in order to make plots more readable.
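The conversion rule above can be illustrated with a small base R sketch. The helper to_factor_if_few_values() is hypothetical and not part of synthpop. Ignoring NA when counting distinct values is an assumption, not confirmed package behaviour.

```r
# Hypothetical helper (not a synthpop function): turn a numeric vector into
# a factor when it has fewer than 6 distinct values.
# Assumption: NA is not counted as a distinct value.
to_factor_if_few_values <- function(x, max_distinct = 5) {
  if (is.numeric(x) && length(unique(x[!is.na(x)])) <= max_distinct) {
    factor(x)
  } else {
    x
  }
}

# Made-up data: 'nkids' has 4 distinct values, 'income' has 6.
df <- data.frame(
  nkids  = c(0, 1, 2, 1, 0, 3),
  income = c(1200, 850, 2300, 1500, 990, 4100)
)

df[] <- lapply(df, to_factor_if_few_values)
str(df)
```

Under this rule nkids becomes a factor and would be shown as a bar chart, while income stays numeric and would be shown as a histogram.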
An object of class compare.synds, which is a list including a list
of comparative frequency tables (tables) and a ggplot object (plots)
with bar charts/histograms. If multiple plots are produced,
they and their corresponding frequency tables are stored as a list.
Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi: 10.18637/jss.v074.i11.
multi.compare
ods <- SD2011[ , c("sex", "age", "edu", "marital", "ls", "income")]
s1 <- syn(ods, cont.na = list(income = -8))

### synthetic data provided as a 'synds' object
compare(s1, ods, vars = "ls")
compare(s1, ods, vars = "income", stat = "counts",
        table = TRUE, breaks = 10)

### synthetic data provided as 'data.frame'
compare(s1$syn, ods, vars = "ls")
compare(s1$syn, ods, vars = "income", cont.na = list(income = -8),
        stat = "counts", table = TRUE, breaks = 10)