View source: R/utility.tables.r
utility.tables | R Documentation |
Calculates and plots tables of utility measures. The calculations of
utility measures are done by the function utility.tab.
Options are all one-way tables, all two-way tables or three-way tables
for a specified third variable along with pairs of all other variables.
This function can also be used with synthetic data NOT created by
syn(), but then the additional parameters not.synthesised and cont.na
might need to be provided.
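As a hedged sketch of the case just described, the example below passes a plain data frame (not a 'synds' object) together with not.synthesised and cont.na. The "external synthesiser" here is only a stand-in that resamples each column independently, and the object names ext_syn and u_ext are made up for illustration. It assumes that -8 is the missing-value code for income in SD2011. The block runs only if the synthpop package is installed.

```r
# Runs only when the synthpop package is available
if (requireNamespace("synthpop", quietly = TRUE)) {
  library(synthpop)
  set.seed(2024)
  ods <- SD2011[1:500, c("sex", "age", "edu", "income")]

  # Stand-in for synthetic data produced outside syn():
  # every column is resampled independently (illustration only)
  ext_syn <- as.data.frame(lapply(ods, function(col) sample(col, replace = TRUE)))
  ext_syn$sex <- ods$sex  # 'sex' is left exactly as observed

  u_ext <- utility.tables(ext_syn, ods,
                          not.synthesised = "sex",      # unchanged variable
                          cont.na = list(income = -8),  # assumed missing code
                          tables = "twoway", plot = FALSE)
  print(u_ext$tabs)
}
```

The named-list form of cont.na and the character vector for not.synthesised follow the argument descriptions on this page.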
## S3 method for class 'synds'
utility.tables(object, data, tables = "twoway", maxtables = 5e4,
  vars = NULL, third.var = NULL, useNA = TRUE, ngroups = 5,
  tab.stats = c("pMSE", "S_pMSE", "df"), plot.stat = "S_pMSE",
  plot = TRUE, print.tabs = FALSE, digits.tabs = 4, max.scale = NULL,
  min.scale = 0, plot.title = NULL, nworst = 5, ntabstoprint = 0,
  k.syn = FALSE, low = "grey92", high = "#E41A1C", n.breaks = NULL,
  breaks = NULL, ...)

## S3 method for class 'data.frame'
utility.tables(object, data, cont.na = NULL, not.synthesised = NULL,
  tables = "twoway", maxtables = 5e4, vars = NULL, third.var = NULL,
  useNA = TRUE, ngroups = 5, tab.stats = c("pMSE", "S_pMSE", "df"),
  plot.stat = "S_pMSE", plot = TRUE, print.tabs = FALSE, digits.tabs = 4,
  max.scale = NULL, min.scale = 0, plot.title = NULL, nworst = 5,
  ntabstoprint = 0, k.syn = FALSE, low = "grey92", high = "#E41A1C",
  n.breaks = NULL, breaks = NULL, ...)

## S3 method for class 'list'
utility.tables(object, data, cont.na = NULL, not.synthesised = NULL,
  tables = "twoway", maxtables = 5e4, vars = NULL, third.var = NULL,
  useNA = TRUE, ngroups = 5, tab.stats = c("pMSE", "S_pMSE", "df"),
  plot.stat = "S_pMSE", plot = TRUE, print.tabs = FALSE, digits.tabs = 4,
  max.scale = NULL, min.scale = 0, plot.title = NULL, nworst = 5,
  ntabstoprint = 0, k.syn = FALSE, low = "grey92", high = "#E41A1C",
  n.breaks = NULL, breaks = NULL, ...)

## S3 method for class 'utility.tables'
print(x, print.tabs = NULL, digits.tabs = NULL, plot = NULL,
  plot.title = NULL, max.scale = NULL, min.scale = NULL, nworst = NULL,
  ntabstoprint = NULL, ...)
object |
an object of class |
data |
the original (observed) data set. |
cont.na |
a named list of codes for missing values for continuous
variables if different from the |
not.synthesised |
a vector of variable names for any variables that have been left unchanged in the synthetic data. |
tables |
defines the type of tables to produce. Options are
|
maxtables |
maximum number of tables that will be produced. If the number of
tables is larger, then utility is only measured for a sample of size
|
.
vars |
a vector of strings with the names of variables to be used to form the table, or a vector of variable numbers in the original data. Defaults to all variables in both original and synthetic data. |
third.var |
when |
useNA |
determines if |
ngroups |
if numerical (non-factor) variables included with
|
tab.stats |
statistics to include in the table of results. Must be
a selection from: |
plot.stat |
statistics to plot. Choice is |
plot |
determines if a plot will be produced when the result is printed. |
print.tabs |
logical value that determines if the table of results is to be printed. |
digits.tabs |
number of digits to print for the table, except for p-values, which are always printed to 4 places. |
max.scale |
a numeric value for the maximum value used in calculating
the shading of the plots. If it is |
min.scale |
a numeric value for the minimum value used in calculating
the shading of the plots. If it is |
plot.title |
title for the plot. |
nworst |
the number of variable combinations with the worst utility scores to be printed. |
ntabstoprint |
the number of tables with the worst utility to print for observed and synthetic data. |
k.syn |
a logical indicator as to whether the sample size itself has been synthesised. |
low |
colour for low end of the gradient. |
high |
colour for high end of the gradient. |
n.breaks |
the number of break points to create if breaks are not given directly. |
breaks |
breaks for a two-colour binned gradient. |
... |
additional parameters |
x |
an object of class |
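To give a feel for what grouping a numeric variable into ngroups categories can look like before it is tabulated, the base-R sketch below cuts a numeric vector at its quantiles. The quantile-based rule and the helper name group_numeric are assumptions made purely for illustration. They are not taken from the package source, which may group values differently.

```r
# Hypothetical helper: bin a numeric vector into at most `ngroups` categories
group_numeric <- function(x, ngroups = 5) {
  probs  <- seq(0, 1, length.out = ngroups + 1)
  # unique() guards against duplicated break points when values are tied
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

set.seed(1)
age <- c(round(runif(200, 18, 90)), NA)  # toy numeric variable with one NA
age_grp <- group_numeric(age, ngroups = 5)

# Base R's table() can keep or drop the missing category
table(age_grp, useNA = "ifany")
```

Once grouped, the variable behaves like any other factor when forming one-way, two-way or three-way tables.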
Calculates tables of observed and synthesised values for the variables
specified in vars with the function utility.tab and produces tables and
plots of one-way, two-way or three-way utility measures formed from vars.
Several options for utility measures can be selected for printing or
plotting. Details are in the help file for utility.tab.
The tables and variables with the worst utility scores are identified. Visualisations of the matrices of utility scores are plotted. For three-way tables a third variable can be defined to select all tables involving that variable for plotting. If it is not specified, the variable with tables giving the worst utility is selected as the third variable.
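The idea of comparing observed and synthetic tables can be sketched in base R. The toy example below is not the computation performed by utility.tab. It uses a simple Pearson-type discrepancy, and the function name table_discrepancy and all data are invented for illustration. It shows why two-way tables are informative: the one-way margins agree exactly while the two-way table does not.

```r
set.seed(42)
n <- 500
lev <- c("low", "mid", "high")

# Toy "observed" data in which edu depends on sex
sex <- sample(c("F", "M"), n, replace = TRUE)
edu <- ifelse(sex == "F",
              sample(lev, n, replace = TRUE, prob = c(0.2, 0.3, 0.5)),
              sample(lev, n, replace = TRUE, prob = c(0.5, 0.3, 0.2)))
obs <- data.frame(sex = factor(sex), edu = factor(edu, levels = lev))

# Toy "synthetic" data: each column permuted on its own,
# so margins are preserved but the sex-edu association is lost
syn_toy <- data.frame(sex = sample(obs$sex), edu = sample(obs$edu))

# Illustrative Pearson-type discrepancy between two tables of equal layout
table_discrepancy <- function(tab_obs, tab_syn) {
  avg  <- (tab_obs + tab_syn) / 2
  keep <- avg > 0  # skip cells that are empty in both tables
  sum((tab_syn[keep] - tab_obs[keep])^2 / avg[keep])
}

one_way <- table_discrepancy(table(obs$sex), table(syn_toy$sex))
two_way <- table_discrepancy(table(obs$sex, obs$edu),
                             table(syn_toy$sex, syn_toy$edu))
c(one_way = one_way, two_way = two_way)
```

Here one_way is zero because permuting a column leaves its margin unchanged, whereas two_way is positive because the joint distribution differs.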
An object of class utility.tab which is a list with the following
components:
tabs |
a table with all the selected measures for all combinations of
variables defined by |
plot.stat |
measure used in |
tables |
see above. |
third.var |
see above. |
utility.plot |
plot of the selected utility measure. |
var.scores |
an average of utility scores for all combinations with other variables. |
plot |
see above. |
print.tabs |
see above. |
digits.tabs |
see above. |
plot.title |
see above. |
max.scale |
see above. |
min.scale |
see above. |
ntabstoprint |
see above. |
nworst |
see above. |
worstn |
variable combinations with |
worsttabs |
observed and synthetic cross-tabulations for |
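The relationship between the table of pairwise scores, the per-variable averages (compare var.scores) and the worst combinations (compare worstn) can be illustrated with a small base-R sketch. All numbers below are made up, the object names are hypothetical, and the convention that a higher score means worse utility is an assumption for this illustration rather than a statement about every available measure.

```r
vars <- c("sex", "age", "edu", "marital")

# Made-up symmetric matrix of pairwise utility scores (higher = worse, assumed)
scores <- matrix(NA_real_, 4, 4, dimnames = list(vars, vars))
scores[upper.tri(scores)] <- c(1.2, 0.8, 3.5, 1.1, 6.4, 2.0)
scores[lower.tri(scores)] <- t(scores)[lower.tri(scores)]

# Average score of each variable over its pairings with the other variables
var_scores <- sort(rowMeans(scores, na.rm = TRUE), decreasing = TRUE)

# The nworst variable pairs, ordered from worst to best
nworst <- 2
idx <- which(upper.tri(scores), arr.ind = TRUE)
pairs <- data.frame(var1  = vars[idx[, "row"]],
                    var2  = vars[idx[, "col"]],
                    score = scores[upper.tri(scores)])
worst <- head(pairs[order(pairs$score, decreasing = TRUE), ], nworst)

var_scores
worst
```

In this toy matrix the variable age has the highest average score, and the pair age and marital is the worst combination.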
Read, T.R.C. and Cressie, N.A.C. (1988) Goodness-of-Fit Statistics for Discrete Multivariate Data, Springer-Verlag, New York.
Voas, D. and Williamson, P. (2001) Evaluating goodness-of-fit measures for synthetic microdata. Geographical and Environmental Modelling, 5(2), 177-200.
utility.tab
ods <- SD2011[1:1000, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods)

### synthetic data provided as a 'synds' object
(t1 <- utility.tables(s1, ods, tab.stats = "all", print.tabs = TRUE))

### synthetic data provided as a 'data.frame' object
(t1 <- utility.tables(s1$syn, ods, tab.stats = "all", print.tabs = TRUE))

t2 <- utility.tables(s1, ods, tables = "twoway")
print(t2, max.scale = 3)

(t3 <- utility.tables(s1, ods, tab.stats = "all", tables = "threeway",
                      third.var = "sex", print.tabs = TRUE))

(t4 <- utility.tables(s1, ods, tab.stats = "all", tables = "threeway",
                      third.var = "sex", useNA = FALSE, print.tabs = TRUE))

(t5 <- utility.tables(s1, ods, tab.stats = "all", print.tabs = TRUE))