
Produces tables from observed and synthesised data and calculates utility measures that compare them with their expected values if the synthesising model is correct.

```
utility.tab(object, data, vars = NULL, ngroups = 5, useNA = TRUE,
            print.tables = length(vars) < 4, print.stats = 'VW',
            print.zdiff = FALSE, digits = 2, ...)
## S3 method for class 'utility.tab'
print(x, print.tables = x$print.tables,
      print.zdiff = x$print.zdiff, print.stats = x$print.stats,
      digits = x$digits, ...)
```

`object` |
an object of class |

`data` |
the original (observed) data set. |

`vars` |
a single string or a vector of strings with the names of variables to be used to form the table. |

`ngroups` |
if numerical (non-factor) variables are included, they are
classified into this number of groups to form the tables. Classification is
performed using |

`useNA` |
a logical value that determines whether NA values are included in the tables. |

`print.tables` |
a logical value that determines whether tables of the observed and synthesised data are printed. |

`print.stats` |
determines which chi-squared statistics are printed to compare the observed and synthetic tables: 'VW' for Voas-Williamson, 'FT' for Freeman-Tukey, or c('VW', 'FT') for both. |

`print.zdiff` |
a logical value that determines if tables of Z scores for differences between observed and expected are to be printed. |

`digits` |
an integer indicating the number of decimal places
for printing statistics. |

`...` |
additional parameters; these can be passed to the `classIntervals()` function. |

`x` |
an object of class |
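The grouping controlled by `ngroups`, together with `useNA`, can be pictured with a small base R sketch. It uses quantile breaks from `quantile()` and `cut()` as a stand-in for `classIntervals()`, so the breakpoints may differ from those `utility.tab()` actually produces; the helper `group_numeric()` and the data are hypothetical, and `useNA = "ifany"` in `table()` is only an analogue of `useNA = TRUE`.

```r
# Hypothetical helper (not part of synthpop): quantile-based grouping with
# base R, standing in for the classIntervals() call used by utility.tab().
group_numeric <- function(x, ngroups = 5) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = ngroups + 1),
                            na.rm = TRUE, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE)   # NA values stay NA
}

# Made-up data: a numeric variable with missing values and a factor
set.seed(1)
age <- c(sample(18:90, 95, replace = TRUE), rep(NA, 5))
marital <- factor(sample(c("SINGLE", "MARRIED", "WIDOWED"), 100, replace = TRUE))

age_grp <- group_numeric(age, ngroups = 3)
# useNA = "ifany" adds a separate NA category to the table
tab <- table(marital, age_grp, useNA = "ifany")
tab
```

Each of the three age groups holds roughly a third of the non-missing values, and the five missing ages appear in their own column.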

Forms tables of observed and synthesised values for the variables
specified in `vars`. Two utility measures are calculated from the cells
of the tables: a measure of fit proposed by Voas and Williamson,
`sum((observed - synthesised)^2/((observed + synthesised)/2))`,
and one proposed by Freeman and Tukey,
`4*sum((observed^(0.5) - synthesised^(0.5))^2)`.
In both cases, cells where the observed and synthesised counts are both zero do not
contribute to the sum. If the synthesising model is correct, both of these
measures should have chi-square distributions for large samples.
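A minimal base R sketch of the two measures for a single pair of tables of the same shape. The helper `tab_stats()` is hypothetical (not a synthpop function) and assumes the synthetic counts are on the same scale as the observed ones; it drops cells that are zero in both tables before summing.

```r
# Hypothetical helper (not part of synthpop): Voas-Williamson (VW) and
# Freeman-Tukey (FT) statistics for one observed and one synthetic table.
tab_stats <- function(obs, syn) {
  obs <- as.vector(obs)              # flatten tables of any dimension
  syn <- as.vector(syn)
  keep <- (obs + syn) > 0            # cells zero in both tables are dropped
  o <- obs[keep]
  s <- syn[keep]
  c(VW = sum((o - s)^2 / ((o + s) / 2)),
    FT = 4 * sum((sqrt(o) - sqrt(s))^2))
}

# Illustrative counts; the third cell is empty in both tables
tab_stats(c(10, 5, 0, 3), c(8, 7, 0, 5))
```

With these counts the empty third cell is skipped, so the VW statistic is 4/9 + 4/6 + 4/4 = 19/9.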

An object of class `utility.tab`, which is a list with the following
components:

`m` |
number of synthetic data sets in object, i.e. |

`tab.obs` |
a table from the observed data. |

`UtabFT` |
a vector with |

`UtabVW` |
a vector with |

`df` |
a vector of degrees of freedom for the chi-square tests, equal to the number of cells in the table with any observed or synthesised counts minus one. |

`ratioFT` |
a vector with ratios of |

`ratioVW` |
a vector with ratios of |

`pvalFT` |
a vector with |

`pvalVW` |
a vector with |

`nempty` |
a vector of length |

`tab.syn` |
a table or a list of |

`tab.zdiff` |
a table or a list of |

`n` |
number of observations in the original data set. |
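The statistic, `df` and p-value components can be illustrated for several synthetic tables at once. This is a hypothetical sketch: it assumes one synthetic table per synthetic data set (so each vector has length `m`), uses made-up counts, and shows only the VW statistic; `vw_one()` is not a synthpop function. The p-values are taken from the upper tail of the chi-square distribution with `df` degrees of freedom via `pchisq()`.

```r
# Made-up counts: one observed table and a list of m = 3 synthetic tables
obs_tab <- c(10, 5, 0, 3)
syn_tabs <- list(c(8, 7, 0, 5), c(11, 4, 1, 2), c(9, 6, 0, 3))

# Hypothetical helper: VW statistic and degrees of freedom for one pair
vw_one <- function(obs, syn) {
  keep <- (obs + syn) > 0            # ignore cells empty in both tables
  c(stat = sum((obs[keep] - syn[keep])^2 / ((obs[keep] + syn[keep]) / 2)),
    df = sum(keep) - 1)              # non-empty cells minus one
}

res <- sapply(syn_tabs, vw_one, obs = obs_tab)   # 2 x m matrix
UtabVW <- res["stat", ]
df <- res["df", ]
pvalVW <- pchisq(UtabVW, df, lower.tail = FALSE)
pvalVW
```

The second synthetic table has a count in the third cell, so that cell is no longer empty in both tables and its degrees of freedom rise from 2 to 3.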

Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke
creation of synthetic data in R. *Journal of Statistical Software*,
**74**(11), 1-26. doi: 10.18637/jss.v074.i11.

Read, T.R.C. and Cressie, N.A.C. (1988). *Goodness-of-Fit Statistics for
Discrete Multivariate Data*. Springer-Verlag, New York.

Voas, D. and Williamson, P. (2001). Evaluating goodness-of-fit measures for
synthetic microdata. *Geographical and Environmental Modelling*,
**5**(2), 177-200.

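```
library(synthpop)   # provides syn(), utility.tab() and the SD2011 data

ods <- SD2011[1:1000, c("sex", "age", "edu", "marital")]

# ten synthetic data sets; table of marital status by sex
s1 <- syn(ods, m = 10)
utility.tab(s1, ods, vars = c("marital", "sex"))

# one synthetic data set; numeric age classified into 3 groups
s2 <- syn(ods, m = 1)
utility.tab(s2, ods, vars = c("marital", "age"), ngroups = 3, print.tables = TRUE)

# extra arguments (here style = "pretty") are passed on to classIntervals()
u2 <- utility.tab(s2, ods, vars = c("marital", "age"), style = "pretty")
print(u2, print.tables = TRUE, print.zdiff = TRUE)
```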

bnowok/synthpop documentation built on May 27, 2019, 7:25 a.m.
