ds.table | R Documentation |
Creates 1-dimensional, 2-dimensional and 3-dimensional tables using the table function in native R.
ds.table(
rvar = NULL,
cvar = NULL,
stvar = NULL,
report.chisq.tests = FALSE,
exclude = NULL,
useNA = "always",
suppress.chisq.warnings = FALSE,
table.assign = FALSE,
newobj = NULL,
datasources = NULL,
force.nfilter = NULL
)
rvar |
is a character string (in inverted commas) specifying the name of the variable defining the rows in all of the 2-dimensional tables that form the output. See 'details' for more information about 1-dimensional tables, which arise when a variable name is provided by <rvar> but <cvar> and <stvar> are both NULL. |
cvar |
is a character string specifying the name of the variable defining the columns in all of the 2 dimensional tables that form the output. |
stvar |
is a character string specifying the name of the variable that indexes the separate two dimensional tables in the output if the call specifies a 3 dimensional table. |
report.chisq.tests |
if TRUE, chi-squared tests are applied to every 2 dimensional table in the output and reported as "chisq.test_table.name". Default = FALSE. |
exclude |
this argument is passed through to the |
useNA |
this argument is passed through to the |
suppress.chisq.warnings |
if set to TRUE, the default warnings are
suppressed that would otherwise be produced by the |
table.assign |
is a Boolean argument set by default to FALSE. If it is
FALSE the |
newobj |
this is a character string providing a name for the output
table object to be written to the serverside if <table.assign> is TRUE.
If no explicit name for the table object is specified, but <table.assign>
is nevertheless TRUE, the name for the serverside table object defaults
to |
datasources |
a list of |
force.nfilter |
if <force.nfilter> is non-NULL it must be specified as
a positive integer represented as a character string: e.g. "173". This
has the effect of changing the standard value of 'nfilter.tab' (often 1, 3, 5 or 10
depending on what value the data custodian has selected for this particular
data set) to this new value (here, 173). CRUCIALLY, the |
The ds.table function selects numeric, integer or factor
variables on the serverside which define a contingency table with up to
three dimensions. The native R table function basically operates on
factors and if variables are specified that are integers or numerics
they are first coerced to factors. If the 1-dimensional, 2-dimensional or
3-dimensional table generated from a given study satisfies appropriate
disclosure-control criteria it can be returned directly to
the clientside where it is presented as a study-specific
table and is also included in a combined table across all studies.
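As a small illustration of the coercion described above, the following base R sketch (run locally on made-up vectors, not through DataSHIELD) shows that the native table function gives the same counts whether integer or numeric variables are passed directly or converted to factors first. The variable names age_band and smoker are invented for the example.

```r
# Made-up data: one integer variable and one numeric variable.
age_band <- c(1L, 2L, 2L, 3L, 3L, 3L)
smoker   <- c(0, 1, 0, 1, 1, 0)

# table() coerces non-factor inputs to factors internally ...
tab_raw <- table(age_band, smoker)

# ... so converting explicitly beforehand yields the same cell counts.
tab_fac <- table(factor(age_band), factor(smoker))

print(tab_raw)
same_counts <- identical(as.vector(tab_raw), as.vector(tab_fac))
```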
The data custodian responsible for data security in a given study can specify the minimum non-zero cell count that determines whether the disclosure-control criterion can be viewed as having been met. If the count in any one cell in a table falls below the specified threshold (and is also non-zero) the whole table is blocked and cannot be returned to the clientside. However, even if a table is potentially disclosive it can still be written to the serverside while an empty representation of the structure of the table is returned to the clientside. The contents of the cells in the serverside table object are reflected in a vector of counts which is one component of that table object.
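The blocking rule described above can be sketched in base R as follows. This is only an illustration of the stated criterion (any non-zero count below the threshold blocks the whole table); the helper table_is_blocked and its nfilter_tab parameter are hypothetical names and are not the serverside implementation.

```r
# Hypothetical helper, not part of DataSHIELD: returns TRUE when any
# non-zero cell count falls below the threshold, i.e. when the whole
# table would be blocked under the rule described in the text.
table_is_blocked <- function(tab, nfilter_tab) {
  counts <- as.vector(tab)
  any(counts > 0 & counts < nfilter_tab)
}

# Made-up tables: counts (6, 4) and counts (6, 4, 2, 0).
safe_tab  <- table(factor(rep(c("a", "b"), times = c(6, 4))))
risky_tab <- table(factor(rep(c("a", "b", "c"), times = c(6, 4, 2)),
                          levels = c("a", "b", "c", "d")))

blocked_safe  <- table_is_blocked(safe_tab,  nfilter_tab = 3)  # no small cells
blocked_risky <- table_is_blocked(risky_tab, nfilter_tab = 3)  # the count of 2 is below 3
```

Note that the empty cell for level "d" does not trigger blocking on its own, because only non-zero counts are compared with the threshold.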
The true counts in the studyside vector
are replaced by a sequential set of cell-IDs running from 1:n
(where n is the total number of cells in the table) in the empty
representation of the structure of the potentially disclosive table
that is returned to the clientside. These cell-IDs reflect
the order of the counts in the true counts vector on the serverside.
In consequence, if the number 13 appears in a cell of the empty
table returned to the clientside, it means that the true count
in that same cell is held as the 13th element of the true count
vector saved on the serverside. This means that a data analyst
can still make use of the counts from a call to the ds.table
function to drive their ongoing analysis even when one or
more non-zero cell counts fall below the specified threshold
for potential disclosure risk.
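The relationship between cell-IDs and the counts vector can be pictured with a local base R sketch. It assumes the counts vector follows R's usual column-major cell order; the data and object names (true_tab, id_tab) are made up, and nothing here is the actual serverside code.

```r
# Made-up data standing in for two serverside variables.
rvar <- factor(c("a", "a", "b", "c", "c", "c"))
cvar <- factor(c("x", "y", "y", "x", "y", "y"))

true_tab    <- table(rvar, cvar)     # would remain on the serverside
true_counts <- as.vector(true_tab)   # vector of counts, in cell order

# Empty representation: same shape and labels, counts replaced by IDs 1:n.
id_tab <- true_tab
id_tab[] <- seq_along(true_counts)
print(id_tab)

# The ID shown in a cell indexes the matching element of the counts vector.
cell_id        <- id_tab["b", "y"]   # 5 in this 3 x 2 layout
count_for_cell <- true_counts[cell_id]
```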
Because the table object on the serverside cannot be visualised or transferred to the clientside, DataSHIELD ensures that although it can, in this way, be used to advance analysis, it does not create a direct risk of disclosure.
The <rvar> argument identifies the variable defining the rows in each of the 2-dimensional tables produced in the output.
The <cvar> argument identifies the variable defining the columns in the 2-dimensional tables produced in the output.
In creating a 3-dimensional table the
<stvar> ('separate tables') argument identifies the variable that
indexes the set of 2-dimensional tables in the ds.table output.
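The roles of the three arguments can be pictured with the native R table function on made-up local data: the first variable defines rows, the second defines columns, and the third indexes a set of separate 2-dimensional tables. The variable names sex, smoker and agegroup are invented for this sketch.

```r
# Made-up local data; the variable names are purely illustrative.
sex      <- factor(c("F", "F", "M", "M", "F", "M", "F", "M"))
smoker   <- factor(c("no", "yes", "no", "yes", "yes", "no", "no", "yes"))
agegroup <- factor(c("young", "young", "young", "young",
                     "old", "old", "old", "old"))

# Rows from the first variable, columns from the second, and one
# 2-dimensional slice per level of the third.
tab3 <- table(sex, smoker, agegroup)

# Each level of the third variable indexes its own 2-dimensional table.
old_slice   <- tab3[, , "old"]
young_slice <- tab3[, , "young"]
print(old_slice)
```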
As a minor technicality, it should be noted that if a 1-dimensional table is required, one need only specify a value for the <rvar> argument. Any 1-dimensional table in the output is presented as a row vector, so technically the <rvar> variable defines the columns in that 1 x n vector. However, the ds.table function deals with 1-dimensional tables differently from 2- and 3-dimensional tables, and key components of the output for 1-dimensional tables are actually 2-dimensional: with rows defined by <rvar> and with one column for each of the studies.
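The layout described for 1-dimensional output (rows defined by <rvar>, one column per study) can be imitated locally in base R. The two "studies" below are made-up vectors, and the object names are hypothetical; this is a sketch of the shape only, not of how ds.table assembles its output.

```r
# Made-up data for two studies sharing the same factor levels.
lev    <- c("never", "former", "current")
study1 <- factor(c("never", "never", "former", "current"), levels = lev)
study2 <- factor(c("never", "current", "current", "current", "former"),
                 levels = lev)

# One column of counts per study, one row per level of the variable.
counts <- cbind(study1 = as.vector(table(study1)),
                study2 = as.vector(table(study2)))
rownames(counts) <- lev
print(counts)
```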
The output list generated by ds.table contains tables based on counts
named "table.name_counts" and other tables reporting corresponding
column proportions ("table.name_col.props") or row proportions
("table.name_row.props"). For 1-dimensional tables, the output
tables are named _counts and _proportions. The latter are not
called _col.props or _row.props because, for the reasons noted
above, they are technically column proportions but are based on the
distribution of the <rvar> variable.
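For readers unfamiliar with the distinction, row and column proportions can be computed from a table of counts in base R with prop.table. The sketch below uses made-up local data and hypothetical object names; it illustrates the two kinds of proportions, not the internal code of ds.table.

```r
# Made-up local data.
sex    <- factor(c("F", "F", "F", "M", "M", "M", "M"))
smoker <- factor(c("no", "yes", "no", "no", "yes", "yes", "yes"))

counts <- table(sex, smoker)

# Row proportions: each row sums to 1 (distribution of smoker within sex).
row_props <- prop.table(counts, margin = 1)

# Column proportions: each column sums to 1 (distribution of sex within smoker).
col_props <- prop.table(counts, margin = 2)

print(round(row_props, 3))
print(round(col_props, 3))
```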
If the <report.chisq.tests> argument is set to TRUE, chi-squared tests are applied to every 2-dimensional table in the output and reported as "chisq.test_table.name". The <report.chisq.tests> argument defaults to FALSE.
If at least one expected cell count in an output table is < 5, the native R <chisq.test> function returns a warning. Because in a DataSHIELD setting this often means that every study and several tables may return the same warning, and because it is debatable whether this warning is really statistically important, the <suppress.chisq.warnings> argument can be set to TRUE to block the warnings. However, it defaults to FALSE.
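The warning in question can be reproduced locally with native R. In the made-up 2 x 2 table below, one expected count is 4, so chisq.test warns that the approximation may be incorrect. Wrapping the call in suppressWarnings gives the same test result without the warning; this is a base R analogue of the behaviour described, not a statement of how <suppress.chisq.warnings> is implemented.

```r
# Made-up 2 x 2 table; the expected counts are 5, 4, 5 and 4.
small_tab <- matrix(c(8, 1, 2, 7), nrow = 2)

# Record whether chisq.test() raises a warning, without printing it.
warned <- FALSE
res <- withCallingHandlers(
  chisq.test(small_tab),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

# Same test with the warning suppressed outright.
quiet <- suppressWarnings(chisq.test(small_tab))

print(res$expected)
```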
Once the requested table has been created from serverside data, it is returned to the clientside for the analyst to visualise (unless it is blocked because it fails the disclosure control criteria or there is an error for some other reason).
The clientside output from ds.table
includes error messages that identify when the creation of a
table from a particular study has failed and why. If table.assign=TRUE,
ds.table also writes the requested table to the serverside as an object
named by the <newobj> argument, or 'newObj' by default.
Further information about the visible material passed to the clientside, and the optional table object written to the serverside can be seen under 'details' (above).
Paul Burton and Alex Westerberg for DataSHIELD Development Team, 01/05/2020
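The following sketch shows how the arguments listed under usage might be combined in practice. It assumes that the dsBaseClient package is installed, that conns is a list of open DataSHIELD connections (for example, as returned by DSI::datashield.login), and that a serverside data frame D holds variables called sex, smoker and agegroup. All of these names, and the wrapper function ds_table_sketch, are hypothetical. The calls are wrapped in a function so that nothing is executed until real connections are supplied.

```r
# Sketch only: `conns`, `D`, `sex`, `smoker` and `agegroup` are assumed
# names, not objects that this documentation page defines.
ds_table_sketch <- function(conns) {
  # 1-dimensional table: only <rvar> is specified.
  one_d <- dsBaseClient::ds.table(rvar = "D$sex", datasources = conns)

  # 2-dimensional table with chi-squared tests reported.
  two_d <- dsBaseClient::ds.table(rvar = "D$sex", cvar = "D$smoker",
                                  report.chisq.tests = TRUE,
                                  datasources = conns)

  # 3-dimensional table, also written to the serverside under a chosen name.
  three_d <- dsBaseClient::ds.table(rvar = "D$sex", cvar = "D$smoker",
                                    stvar = "D$agegroup",
                                    table.assign = TRUE,
                                    newobj = "sex_smoker_age",
                                    datasources = conns)

  list(one_d = one_d, two_d = two_d, three_d = three_d)
}
```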