library(knitr) library(statTools) opts_chunk$set(collapse = T, comment = "#>", eval = FALSE)
When performing a statistical analysis most of the times differences between two or more groups need to be assessed. bivarTable
function is thought to address this issue constructing a table with the results from the comparison. It is particularly useful when searching for confounding variables or simply assessing differences in the outcomes, adjusting to avoid confusion.
The aim is to provide a useful function in R to allow statisticians to perform this kind of analyses quicker and easier, saving a lot of time. However, bivarTable
is very flexible and can assess different hypothesis at a time, allowing different arguments to customize the tables.
The package statTools also includes an export
method, that exports the created tables to latex (via xtable), xlsx (via openxlsx) or csv.
bivarTable
constructs a matrix with the results of comparing variables by groups. The levels of the grouping variable will be displayed in columns, and in the rows all the variables in which the differences will be assessed. An extra column called All
will be always displayed, with the overall description of the sample. All this structure is summarized with a formula (argument X
), where the LHS indicates the variable displayed in the columns (i.e. grouping variable) and the RHS all the variables in the rows. To sum up:
Grouping Variable ~ Variable 1 + Variable 2 + ... + Variable n
or equivalently,
Columns ~ Row 1 + Row 2 + ... + Row n
Notice that ONLY one term is allowed in the LHS and interaction terms are not handled (only additive terms). In addition, since the names of the terms of the formula are assumed to be columns in the argument data
, the dot .
can be used to add all other variables in the data not mentioned in the formula. You can make use also of the -
operator to remove terms from the formula. When LHS is not provided, only a table with the overall description is produced. The following code, depicts the structure explained, using the iris dataset as example.
bivarTable(Species ~ ., iris)
An unquoted matrix is printed (internally returns a list where the first argument is the table printed), with the totals and the levels of Species
(LHS) as columns and all the remaining variables in iris
as rows. Notice also that:
This is one of the most basic tables that can be generated with bivartable
. Other arguments than can control the output produced by default:
margin = 1
instead of the default margin = 2
. It is passed to prop.table
. If it is a vector 1:2 or 2:1, both percentages are calculated.rounding
. By default only two decimals are printed: rounding = 2
. This argument reads in fact the element rounding
from options. condense.binary.factors
to FALSE
to display both levels (rows).factor
function. This removes unused categories. To change that, use drop.y = FALSE
.drop.x
to FALSE
. These agruments can change the number of rows and columns and must be used carefully because they can lead (and avoid) to errors. descr_var()
. If FALSE
no quantiles are calculated. If the quantiles are required, show.quantiles
will be passed to function quantile
as argument probs
.Most of these arguments can be set by default throughout all the session via options(argument = <value>)
. We will add some qualitative variables to the iris
dataset to better illustrate the effect of some of these arguments:
iris2 <- iris iris2$Sepal.Total.CAT2 <- cut(iris$Sepal.Width + iris$Sepal.Length, breaks = 2) iris2$Sepal.Total.CAT3 <- cut(iris$Sepal.Width + iris$Sepal.Length, breaks = 3) iris2$Sepal.Total.CAT5 <- cut(iris$Sepal.Width + iris$Sepal.Length, breaks = 5) bivarTable(Species ~ . + log(Sepal.Length), iris2, rounding = 1, margin = 2, condense.binary.factors = TRUE) bivarTable(Species ~ . + log(Sepal.Length), iris2, margin = 1, condense.binary.factors = FALSE) bivarTable(Species ~ . + log(Sepal.Length), iris2, margin = 1, condense.binary.factors = FALSE, show.quantiles = c(0, 0.5, 1))
Notice that the formula allows transformations of the variables like logarithm in this case.
Once the basic structure of the table is set, extra columns can be added with the p-values resulting from tests assessing differences between groups in each of the row variables. Argument test
controls this issue. By default, test = "none"
, no tests are performed. However, there are other options for this argument: "parametric"
, "non-parametric"
and "both"
. These options will perform the tests specified in the elements "parametric.tests"
and "non.parametric.tests"
from options. One parametric test and one non-parametric test are defined for every possible scenario respectively.
t.test
) or Mann-Whitney test (wilcox.test
).oneway.test
) or Kruskal-Wallis test (kruskal.test
).chisq.test
) or Fisher test (fisher.test
).When test = "both"
, both parametric and non-parametric tests are performed. Here you have an example of the use of both tests: two new columns are added.
bivarTable(Species ~ . + log(Sepal.Length), iris2, margin = 1, # structure test = "both") # tests
I like personally to use one line for the structure and options (two lines if many options are needed), another line for the tests. The other arguments left to explain are fit.model
, outcome
, FUN.model
and ...
, wich would be specified in new lines. These ones are explained in the next section.
Moreover, all the p-values will be rounded using the function given the first element in options()$p.value.rounding
. The other elements will be arguments passed to the first one called by do.call
. The default function used for this is pvalue
, which rounds significant p-values to the first significant digit and non-significant p-values to two decimal places. See ?pvalue
.
If wanted, models can be fitted for every row variable with the grouping variable. The argument fit.model
is a NAMED list of formulas that are passed to update.formula
. For every formula in this list, an extra columnt will be added with the p-values of the significance of the first variable in the model (not counting the intercept), using drop1
. To specify this formulas, make use of the .
to refer either the row variable or the grouping variable. The argument outcome
, controls whether the output variable is the one in the columns or in the rows in all the models performed (1 means the row variable is the outcome and 2 for the columns). Depending on that, a linear model is performed if the output is quantitative or a factor with more than two levels (using the order). If the output variable is a binary factor, a logistic regression model is fitted instead.
For instance, we could add the p-value of the model adjusted by Petal.Length
for every variable in iris2
(except for the one we are adjusting by) by doing:
bivarTable(Species ~ . - Petal.Length, iris2, test = "non-par", fit.model = list(model1 = ~ . + Petal.Length), output = 1)
Always write the dot first, because the p-value provided would come from that variable. In this case, the dot is exchangable for Species
, because the output are the variables in the rows in each model, so Species
is always the input. However, in the next example it is not.
bivarTable(Species ~ . - Petal.Length, iris2, test = "non-par", fit.model = list(adjusted = ~ . + Petal.Length, interaction = ~ . * Petal.Length), output = 2) bivarTable(Species ~ . - Petal.Length, iris2, test = "non-par", fit.model = list(adjusted = ~ . + Petal.Length, interaction = ~ . * Petal.Length), output = 2, pvalue.model = FALSE)
In this example, another model with the interaction is fitted, so two new columns are added. However, the p-value from the variable in the row is the only one that is provided by default. If you want to print the p-value from the interaction you need to use the FUN.model
argument which is explained in the next section.
Note that in the second call, the argument pvalue.model
controls whether to show or not the drop1
p-values from every model.
Once a model if fitted, it can be used as the argument to many functions. These functions have to be specified in the argument FUN.model
. FUN.model
a list whose elements are vectors of characters with the names of the functions to be applied to every model. The name of these elements indicates which of the models in fit.model
will be used. The model will be passed as the first argument to every function. You can use your own defined functions or the ones already in this package like:
getBetaSd
.getPval
.getORCI
.nagelkerke
or adjNagelkerke
for the adjusted.GoF
.When you want to display the results for every level of the row variable in its own row, use the vertical version of the functions: verticalgetBetaSd
, verticalgetPval
and verticalgetORCI
.
To change the arguments of these functions, add an argument (in ...
) with the same name of the model in which this functions will be applied. This argument will be a list whose elements will be lists with the arguments to pass to every function. The names of these lists indicates the function whose arguments belong to. For instance, let's write the code for displaying the AIC of the interaction model in the last example.
bivarTable(Species ~ . - Petal.Length, iris2, test = "non-par", fit.model = list(adjusted = ~ . + Petal.Length, # model adjusted by Petal.Length interaction = ~ . * Petal.Length), # model with the interaction output = 1, # row variables are the outputs FUN.model = list(interaction = "AIC"), # for the 'interaction' model # use 'AIC' with default args ) bivarTable(Species ~ . - Petal.Length, iris2, test = "non-par", fit.model = list(adjusted = ~ . + Petal.Length, # model adjusted by Petal.Length interaction = ~ . * Petal.Length), # model with the interaction output = 1, # row variables are the outputs FUN.model = list(interaction = c("AIC", "BIC"), # for the 'interaction' model adjusted = "AIC"), # use function 'AIC' and 'BIC' # just 'AIC' for the 'adjusted' interaction = list(AIC = list(k = 10)) # for the function 'AIC' from # the model 'interaction' set arg # k=10 and default args for 'BIC' )
The second call, returns the AIC from the model interaction
using k = 10
and the BIC with default parameters. From the model adjusted
returns the AIC with default arguments.
export
method allows to export and save bivarTable objects to xlsx with a fancy style. You need openxlsx for this.It allows to export or save in latex, using xtable, or csv. For xlsx automatically opens the file with EXCEL.exe.
bvt <- bivarTable(Species ~ . - Petal.Length, iris2, test = "non-par", fit.model = list(adjusted = ~ . + Petal.Length, interaction = ~ . * Petal.Length), output = 2) export(bvt, type = "xlsx") export(bvt, file = "myBivarTable.xlsx")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.