compare.synds | R Documentation |
Compares a synthesised data set with the original (observed) data set
using percent frequency tables and histograms. When more than one
synthetic data set has been generated (object$m > 1), pooled synthetic
data are used for the comparison by default.
This function can also be used with synthetic data NOT created by
syn(), but then an additional parameter cont.na might need to be provided.
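As a rough illustration of what a pooled percent frequency comparison involves, the base R sketch below tabulates one categorical variable for the observed data and for two stacked synthetic copies. The toy values and the helper `pct()` are invented for this example; this is not the package's internal code.

```r
# Toy observed variable and two synthetic copies (made-up values)
observed <- factor(c("a", "a", "b", "c", "a", "b", "c", "c", "a", "b"))
syn_list <- list(
  c("a", "b", "b", "c", "a", "b", "c", "a", "a", "b"),
  c("a", "a", "a", "c", "c", "b", "c", "c", "a", "b")
)

# Hypothetical helper: percent frequencies as a named numeric vector
pct <- function(x) {
  tab <- table(x)
  setNames(as.numeric(100 * tab / sum(tab)), names(tab))
}

# Pool the synthetic copies by stacking them, keeping the observed levels
pooled <- factor(unlist(syn_list), levels = levels(observed))

# One row per data source, one column per category
comparison <- rbind(observed = pct(observed), synthetic = pct(pooled))
print(comparison)
```

Each row sums to 100, so the two rows can be read side by side as the kind of table the function reports when stat = "percents".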
## S3 method for class 'synds'
compare(object, data, vars = NULL,
        msel = NULL, stat = "percents", breaks = 20, ngroups = 5,
        nrow = 2, ncol = 2, rel.size.x = 1,
        utility.stats = c("pMSE", "S_pMSE", "df"),
        utility.for.plot = "S_pMSE",
        cols = c("#1A3C5A", "#4187BF"),
        plot = TRUE, table = FALSE,
        print.flag = TRUE, ...)
## S3 method for class 'data.frame'
compare(object, data, vars = NULL, cont.na = NULL,
        msel = NULL, stat = "percents", breaks = 20, ngroups = 5,
        nrow = 2, ncol = 2, rel.size.x = 1,
        utility.stats = c("pMSE", "S_pMSE", "df"),
        utility.for.plot = "S_pMSE",
        cols = c("#1A3C5A", "#4187BF"),
        plot = TRUE, table = FALSE,
        print.flag = TRUE, compare.synorig = TRUE, ...)
## S3 method for class 'list'
compare(object, data, vars = NULL, cont.na = NULL,
        msel = NULL, stat = "percents", breaks = 20, ngroups = 5,
        nrow = 2, ncol = 2, rel.size.x = 1,
        utility.stats = c("pMSE", "S_pMSE", "df"),
        utility.for.plot = "S_pMSE",
        cols = c("#1A3C5A", "#4187BF"),
        plot = TRUE, table = FALSE,
        print.flag = TRUE, compare.synorig = TRUE, ...)
## S3 method for class 'compare.synds'
print(x, ...)
object |
an object of class |
data |
an original (observed) data set. |
vars |
variables to be compared. If |
cont.na |
a named list of codes for missing values for continuous
variables if different from the |
msel |
index or indices of synthetic data copies for which a comparison
is to be made. If |
stat |
determines whether tables and plots present percentages
|
breaks |
the number of cells for the histogram. |
ngroups |
the number of groups used to categorise numeric variables when calculating the one-way utility measures. |
nrow |
the number of rows for the plotting area. |
ncol |
the number of columns for the plotting area. |
rel.size.x |
a number representing the relative size of x-axis labels. |
utility.stats |
a single string or a vector of strings that determines
which utility measures to print. Must be a selection from:
|
utility.for.plot |
a single string that determines which utility
measure to print in facet labels of the plot. Set to |
cols |
bar colors. |
plot |
a logical value with default set to TRUE. |
table |
a logical value with default set to FALSE. |
print.flag |
a logical value with default set to TRUE. |
compare.synorig |
a logical value to determine if the functions
|
... |
additional parameters. |
x |
an object of class compare.synds. |
Missing data categories for numeric variables are plotted on the same plot
as non-missing values. They are indicated by the miss. suffix.
Numeric variables with fewer than 6 distinct values are changed to factors in order to make plots more readable.
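The two display rules above can be sketched in base R as follows. The helpers `to_plot_type()` and `split_missing()` are hypothetical names written for this illustration and are not part of synthpop; in particular, how the package counts distinct values and labels the missing category is assumed here, not taken from its source.

```r
# Hypothetical helper: numeric variables with fewer than 6 distinct
# (non-NA) values are turned into factors, as described above
to_plot_type <- function(x) {
  if (is.numeric(x) && length(unique(x[!is.na(x)])) < 6) factor(x) else x
}

# Hypothetical helper: separate missing values (NA or a user-supplied
# code such as -8) from the values that would go into the histogram
split_missing <- function(x, na_code = NULL) {
  is_missing <- is.na(x) | x %in% na_code
  list(observed = x[!is_missing], n_missing = sum(is_missing))
}

children <- c(0, 1, 2, 2, 3, 1)                        # 4 distinct values
income   <- c(1200, -8, 950, NA, 2100, 1800, 760, 3000)

class(to_plot_type(children))   # becomes a factor
class(to_plot_type(income))     # stays numeric
parts <- split_missing(income, na_code = -8)
parts$n_missing                 # NA and the -8 code are both counted
```

Keeping the missing count separate from the non-missing values is what allows a missing-data bar to be drawn next to the histogram cells on the same plot.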
An object of class compare.synds, which is a list including a list
of comparative frequency tables (tables) and a ggplot object (plots)
with bar charts/histograms. If multiple plots are produced,
they and their corresponding frequency tables are stored as a list.
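A minimal sketch of working with the returned object, assuming the synthpop package is installed and that the components are named tables and plots as described above. The seed value and the object name `res` are arbitrary choices for this example.

```r
library(synthpop)

# Same variables as in the package examples; -8 codes missing income
ods <- SD2011[, c("sex", "age", "edu", "marital", "ls", "income")]
s1  <- syn(ods, cont.na = list(income = -8), seed = 2024, print.flag = FALSE)

# Store the comparison instead of only printing it
res <- compare(s1, ods, vars = "ls", print.flag = FALSE)

class(res)    # expected to include "compare.synds"
res$tables    # comparative frequency table(s) for the selected variable
res$plots     # ggplot object with the corresponding bar chart
```

Storing the result makes it possible to reuse the frequency tables in a report or to restyle the plot, rather than relying on the printed output alone.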
Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi:10.18637/jss.v074.i11.
multi.compare
ods <- SD2011[ , c("sex", "age", "edu", "marital", "ls", "income")]
s1 <- syn(ods, cont.na = list(income = -8))
### synthetic data provided as a 'synds' object
compare(s1, ods, vars = "ls")
compare(s1, ods, vars = "income", stat = "counts",
        table = TRUE, breaks = 10)
### synthetic data provided as 'data.frame'
compare(s1$syn, ods, vars = "ls")
compare(s1$syn, ods, vars = "income", cont.na = list(income = -8),
        stat = "counts", table = TRUE, breaks = 10)