Nothing
#' Create Table One Describing Baseline Characteristics
#'
#' Create a table summarizing continuous, categorical and dichotomous
#' variables, optionally stratified by one or more variables, while performing
#' adequate statistical tests.
#'
#' In research the characteristics of study populations are often characterised
#' through some kind of a "Table 1", containing descriptives of the used
#' variables, as mean/standard deviation for continuous variables, and
#' proportions for categorical variables. In many cases, a comparison is made
#' between groups within the framework of the scientific question.
#'
#' \preformatted{ var Brent Camden Westminster n 474 (39.5%) 344 (28.7%) 381
#' (31.8%) temperature 51.1 (8.7) 47.4 (10.1) 44.3 (9.8) *** ' driver *** "'
#' Butcher 72 (15.2%) 1 (0.3%) 22 (5.8%) Carpenter 29 (6.1%) 19 (5.6%) 221
#' (58.2%) Carter 177 (37.4%) 47 (13.8%) 5 (1.3%) Farmer 19 (4.0%) 87 (25.5%)
#' 11 (2.9%) Hunter 128 (27.1%) 4 (1.2%) 24 (6.3%) Miller 6 (1.3%) 41 (12.0%)
#' 77 (20.3%) Taylor 42 (8.9%) 142 (41.6%) 20 (5.3%) rabate (= TRUE) 235
#' (50.3%) 172 (50.3%) 184 (48.7%) "' --- ') Kruskal-Wallis test, ") Fisher
#' exact test, "') Chi-Square test Signif. codes: 0 '***' 0.001 '**' 0.01 '*'
#' 0.05 '.' 0.1 ' ' 1 }
#'
#' Creating such a table can be very time consuming and there's a need for a
#' flexible function that helps us to solve the task. \code{TOne()} is designed
#' to be easily used with sensible defaults, and yet flexible enough to allow
#' free definition of the essential design elements.
#'
#' This is done by breaking down the descriptive task to three types of
#' variables: quantitative (numeric, integer), qualitative (factor, characters)
#' and dichotomous variables (the latter having exactly two values or levels).
#' Depending on the variable type, the descriptives and the according sensible
#' tests are chosen. By default mean/sd are chosen to describe numeric
#' variables. \preformatted{ FUN = function(x) gettextf("%s / %s",
#' Format(mean(x, na.rm = TRUE), digits = 1), Format(sd(x, na.rm = TRUE),
#' digits = 3)) }
#'
#' Their difference is tested with the Kruskal-Wallis test. For categorical
#' variables the absolute and relative frequencies are calculated and tested
#' with a chi-square test. \cr The tests can be changed with the argument
#' \code{TEST}. These must be organised as list containing elements named
#' \code{"num"}, \code{"cat"} and \code{"dich"}. Each of them must be a
#' function with arguments \code{(x, g)}, returning something similar to a
#' p-value. \preformatted{ TEST = list( num = list(fun = function(x,
#' g){summary(aov(x ~ g))[[1]][1, "Pr(>F)"]}, lbl = "ANOVA"), cat = list(fun =
#' function(x, g){chisq.test(table(x, g))$p.val}, lbl = "Chi-Square test"),
#' dich = list(fun = function(x, g){fisher.test(table(x, g))$p.val}, lbl =
#' "Fisher exact test")) } The legend text of the test, which is appended to
#' the table together with the significance codes, can be set with the variable
#' \code{lbl}.
#'
#' Great importance was attached to the free definition of the number formats.
#' By default, the optionally definable format templates of \bold{DescTools}
#' are used. Deviations from this can be freely passed as arguments to the
#' function. Formats can be defined for integers, floating point numbers,
#' percentages and for the p-values of statistical tests. All options of the
#' function \code{\link{Format}()} are available and can be provided as a list.
#' See examples which show several different implementations. \preformatted{
#' fmt = list(abs = Fmt("abs"), num = Fmt("num"), per = Fmt("per"), pval =
#' as.fmt(fmt = "*", na.form = " ")) }
#'
#' The function returns a character matrix as result, which can easily be
#' subset or combined with other matrices. An interface for
#' \code{\link{ToWrd}()} is available such that the matrix can be transferred
#' to MS-Word. Both font and alignment are freely selectable in the Word table.
#'
#'
#' @param x a data.frame containing all the variables to be included in the
#' table.
#' @param grp the grouping variable.
#' @param add.length logical. If set to \code{TRUE} (default), a row with the
#' group sizes will be inserted as first row of the table.
#' @param colnames a vector of column names for the result table.
#' @param vnames a vector of variable names to be placed in the first column
#' instead of the real names.
#' @param total logical (default \code{TRUE}), defines whether the results
#' should also be displayed for the whole, ungrouped variable.
#' @param align the character on whose position the strings will be aligned.
#' Left alignment can be requested by setting \code{sep = "\\l"}, right
#' alignment by \code{"\\r"} and center alignment by \code{"\\c"}. Mind the
#' backslashes, as if they are omitted, strings would be aligned to the
#' \bold{character} \bold{l}, \bold{r} or \bold{c} respectively. Default value
#' is \code{"\\l"}, thus left alignment.
#' @param FUN the function to be used as location and dispersion measure for
#' numeric (including integer) variables (\code{mean}/\code{sd} is default,
#' alternatives as \code{median}/\code{IQR} are possible by defining a
#' function). See examples.
#'
#' @param TEST a list of functions to be used to test the variables. Must be
#' named as \code{"num"}, \code{"cat"} and \code{"dich"} and be defined as
#' function with arguments \code{(x, g)}, generating something similar to a
#' p-value. Use \code{TEST=NA} to suppress test. (See examples.)
#'
#' @param intref one out of \code{"high"} (default) or \code{"low"}, defining
#' which value of a dichotomous numeric or logical variable should be reported.
#' Usually this will be \code{1} or \code{TRUE}. Setting it to \code{"low"}
#' will report the lower value \code{0} or \code{FALSE}.
#'
#' @param fmt format codes for absolute, numeric and percentage values, and for
#' the p-values of the tests.
#'
#' @return a character matrix
#'
#' @author Andri Signorell <andri@@signorell.net>
#'
#' @seealso \code{\link{WrdTable}()}, \code{\link{ToWrd.TOne}()}
#'
#' @keywords IO
#' @examples
#'
#' options(scipen = 8)
#' opt <- DescToolsOptions()
#'
#' # define some special formats for count data, percentages and numeric results
#' # (those will be supported by TOne)
#' Fmt(abs = as.fmt(digits = 0, big.mark = "'")) # counts
#' Fmt(per = as.fmt(digits = 1, fmt = "%")) # percentages
#' Fmt(num = as.fmt(digits = 1, big.mark = "'")) # numeric
#'
#' TOne(x = d.pizza[, c("temperature", "delivery_min", "driver", "wine_ordered")],
#' grp = d.pizza$quality)
#'
#' # the same but no groups now...
#' TOne(x = d.pizza[, c("temperature", "delivery_min", "driver", "wine_ordered")])
#'
#' # define median/IQR as describing functions for the numeric variables
#' TOne(iris[, -5], iris[, 5],
#' FUN = function(x) {
#' gettextf("%s / %s",
#' Format(median(x, na.rm = TRUE), digits = 1),
#' Format(IQR(x, na.rm = TRUE), digits = 3))
#' }
#' )
#'
#' # replace kruskal.test by ANOVA and report the p.value
#' # Change tests for all the types
#' TOne(x = iris[, -5], grp = iris[, 5],
#' FUN = function(x) gettextf("%s / %s",
#' Format(mean(x, na.rm = TRUE), digits = 1),
#' Format(sd(x, na.rm = TRUE), digits = 3)),
#'
#' TEST = list(
#' num = list(fun = function(x, g){summary(aov(x ~ g))[[1]][1, "Pr(>F)"]},
#' lbl = "ANOVA"),
#' cat = list(fun = function(x, g){chisq.test(table(x, g))$p.val},
#' lbl = "Chi-Square test"),
#' dich = list(fun = function(x, g){fisher.test(table(x, g))$p.val},
#' lbl = "Fisher exact test")),
#' fmt = list(abs = Fmt("abs"), num = Fmt("num"), per = Fmt("per"),
#' pval = as.fmt(fmt = "*", na.form = " "))
#' )
#'
#' t1 <- TOne(x = d.pizza[,c("temperature", "driver", "rabate")],
#' grp = d.pizza$area,
#' align = " ",
#' total = FALSE,
#'
#' FUN = function(x) gettextf("%s / %s (%s)",
#' Format(mean(x, na.rm = TRUE), digits = 1),
#' Format(sd(x, na.rm = TRUE), digits = 3),
#' Format(median(x, na.rm = TRUE), digits = 1)),
#'
#' TEST = NA,
#'
#' fmt = list(abs = as.fmt(big.mark = " ", digits=0),
#' num = as.fmt(big.mark = " ", digits=1),
#' per = as.fmt(fmt=function(x)
#' StrPad(Format(x, fmt="%", d=1), width=5, adj = "r")),
#' pval = as.fmt(fmt = "*", na.form = " "))
#' )
#' # add a userdefined legend
#' attr(t1, "legend") <- "numeric: mean / sd (median)), factor: n (n%)"
#'
#' t1
#'
#'
#' # dichotomous integer or logical values can be reported by the high or low value
#' x <- sample(x = c(0, 1), size = 100, prob = c(0.3, 0.7), replace = TRUE)
#' y <- sample(x = c(0, 1), size = 100, prob = c(0.3, 0.7), replace = TRUE) == 1
#' z <- factor(sample(x = c(0, 1), size = 100, prob = c(0.3, 0.7), replace = TRUE))
#' g <- sample(x = letters[1:4], size = 100, replace = TRUE)
#' d.set <- data.frame(x = x, y = y, z = z, g = g)
#'
#' TOne(d.set[1:3], d.set$g, intref = "low")
#'
#' TOne(d.set[1:3], d.set$g, intref = "high")
#'
#' # intref would not control factors, use relevel to change reported value
#' TOne(data.frame(z = relevel(z, "1")), g)
#'
#' TOne(data.frame(z = z), g)
#'
#' options(opt)
#'
#'
#' \dontrun{
#'
#' # Send the whole stuff to Word
#' wrd <- GetNewWrd()
#' ToWrd(
#' TOne(x = d.pizza[, c("temperature", "delivery_min", "driver", "wine_ordered")],
#' grp = d.pizza$quality,
#' fmt = list(num=Fmt("num", digits=1))
#' ),
#' font = list(name="Arial narrow", size=8),
#' align = c("l","r") # this will be recycled: left-right-left-right ...
#' )
#' }
#'
TOne <- function(x, grp = NA, add.length=TRUE,
colnames=NULL, vnames=NULL, total=TRUE,
align="\\l", FUN = NULL, TEST = NULL, intref="high",
fmt=list(abs = Fmt("abs"),
num = Fmt("num"), per=Fmt("per"),
pval = as.fmt(fmt = "*", na.form = " ")) ) {
# set the formats, take the provided fmt and combine with defaults
fmt <- c(fmt,
list(abs = Fmt("abs"),
num = Fmt("num"),
per=Fmt("per"),
pval = as.fmt(fmt = "*", na.form = " ")))
# use the first instance, so user defined formats are preferred
# and the standards come into effect if there are no user specifications
fmt <- fmt[!duplicated(fmt)]
# we could restrict the names here to c("abs","num","per","pval")
# set the variablenames per row
if(is.null(vnames)){
vnames <- if(is.null(colnames(x))) "Var1" else colnames(x)
default_vnames <- TRUE
} else {
default_vnames <- TRUE
}
# creates the table one in a study
if(is.null(FUN)){
num_fun <- function(x){
# the cell for numeric data
gettextf("%s (%s)",
Format(mean(x, na.rm=TRUE), fmt=fmt$num),
Format(sd(x, na.rm=TRUE), fmt=fmt$num))
}
} else {
num_fun <- FUN
}
if(identical(grp, NA)){
# no grouping factor, let's define something appropriate
grp <- rep(1, nrow(x))
TEST <- NA
}
if(identical(TEST, NA)){
TEST <- list(num=list(fun=function(x, g) 1, lbl="None"),
cat=list(fun=function(x, g) 1, lbl="None"),
dich=list(fun=function(x, g) 1, lbl="None"))
notest <- TRUE
} else {
# the default tests for quantitative and categorical data
TEST.def <- list(num=list(fun=function(x, g){kruskal.test(x, g)$p.val},
lbl="Kruskal-Wallis test"),
cat=list(fun=function(x, g){chisq.test(table(x, g))$p.val},
lbl="Chi-Square test"),
dich=list(fun=function(x, g){fisher.test(table(x, g))$p.val},
lbl="Fisher exact test"))
if(is.null(TEST)) # the defaults
TEST <- TEST.def
# define test for the single tests
if(is.null(TEST[["num"]]))
TEST[["num"]] <- TEST.def[["num"]]
if(is.null(TEST[["cat"]]))
TEST[["cat"]] <- TEST.def[["cat"]]
if(is.null(TEST[["dich"]]))
TEST[["dich"]] <- TEST.def[["dich"]]
notest <- FALSE
}
num_test <- TEST[["num"]]$fun
cat_test <- TEST[["cat"]]$fun
dich_test <- TEST[["dich"]]$fun
num_row <- function(x, g, total=TRUE, vname = deparse(substitute(x))){
if(!identical(g, NA)) {
res <- Format(num_test(x, g), fmt=fmt$pval)
num_test_label <- names(res)
} else {
res <- ""
}
return(
cbind(var=vname, total = num_fun(x), rbind(tapply(x, g, num_fun)),
paste(res, .FootNote(1)))
)
}
cat_mat <- function(x, g, vname=deparse(substitute(x))){
if(inherits(x, "character"))
x <- factor(x)
tab <- table(x, g)
ptab <- prop.table(tab, margin = 2)
tab <- addmargins(tab, 2)
ptab <- cbind(ptab, Sum=prop.table(table(x)))
# crunch tab and ptab
m <- matrix(NA, nrow=nrow(tab), ncol=ncol(tab))
m[,] <- gettextf("%s (%s)",
Format(tab, fmt=fmt$abs),
Format(ptab, fmt=fmt$per))
# totals to the left
m <- m[, c(ncol(m), 1:(ncol(m)-1))]
# set rownames
m <- cbind( c(vname, paste(" ", levels(x))),
rbind("", m))
# add test
if(nrow(tab)>1)
# p <- chisq.test(tab)$p.value
p <- cat_test(x, g)
else
p <- NA
m <- cbind(m, c(paste(Format(p, fmt=fmt$pval), ifelse(is.na(p), "", .FootNote(3))), rep("", nlevels(x))))
# this reduces binary categories to a single flag, which should not be necessary here,
# as it would be handled be dich_mat
# if(nrow(m) <=3) {
# m[2,1] <- gettextf("%s (= %s)", m[1, 1], row.names(tab)[1])
# m <- m[2, , drop=FALSE]
# }
colnames(m) <- c("var","total", head(colnames(tab), -1), "")
return(m)
}
dich_mat <- function(x, g, vname=deparse(substitute(x))){
tab <- table(x, g)
if(identical(dim(tab), c(2L,2L))){
# p <- fisher.test(tab)$p.value
p <- dich_test(x, g)
foot <- .FootNote(2)
} else {
# p <- chisq.test(tab)$p.value
p <- cat_test(x, g)
foot <- .FootNote(3)
}
ptab <- prop.table(tab, 2)
tab <- addmargins(tab, 2)
ptab <- cbind(ptab, Sum = prop.table(tab[,"Sum"]))
m <- matrix(NA, nrow=nrow(tab), ncol=ncol(tab))
m[,] <- gettextf("%s (%s)",
Format(tab, fmt=fmt$abs),
Format(ptab, fmt=fmt$per))
# totals to the left
m <- m[, c(ncol(m), 1:(ncol(m)-1)), drop=FALSE]
m <- rbind(c(vname, m[1,], paste(Format(p, fmt=fmt$pval), foot)))
colnames(m) <- c("var","total", head(colnames(tab), -1), "")
return(m)
}
if(!identical(x, NA)) {
# NA is handled as subtitle
intref <- match.arg(intref, choices = c("high", "low", "both"))
if(mode(x) %in% c("logical","numeric","complex","character"))
x <- data.frame(x)
# find description types
ctype <- sapply(x, class)
# should we add "identical type": only one value??
ctype[sapply(x, IsDichotomous, strict=TRUE, na.rm=TRUE)] <- "dich"
ctype[sapply(ctype, function(x) any(x %in% c("numeric","integer")))] <- "num"
ctype[sapply(ctype, function(x) any(x %in% c("factor","ordered","character")))] <- "cat"
lst <- list()
for(i in 1:ncol(x)){
if(ctype[i] == "num"){
lst[[i]] <- num_row(x[,i], grp, vname=vnames[i])
} else if(ctype[i] == "cat") {
lst[[i]] <- cat_mat(x[,i], grp, vname=vnames[i])
} else if(ctype[i] == "dich") {
if(intref=="both"){
lst[[i]] <- cat_mat(factor(x[,i]), grp, vname=vnames[i])
} else {
# refactor all types, numeric, logic but not factors and let user choose
# the level to be reported.
if(!is.factor(x[, i])) { # should only apply to boolean integer or numerics
xi <- factor(x[, i])
} else {
xi <- x[, i]
}
if(match.arg(intref, choices = c("high", "low", "both")) == "high")
xi <- relevel(xi, tail(levels(xi), 1))
if (default_vnames) {
lst[[i]] <- dich_mat(xi, grp, vname = gettextf("%s (= %s)", vnames[i], head(levels(xi), 1)))
} else {
lst[[i]] <- dich_mat(xi, grp, vname = gettextf("%s", vnames[i]))
}
}
} else {
lst[[i]] <- rbind(c(colnames(x)[i], rep(NA, nlevels(grp) + 2)))
}
}
} else {
m <- cat_mat(grp, grp, vnames)
lst <- list(c(vnames, rep("", ncol(m)-1)))
}
res <- do.call(rbind, lst)
if(add.length)
res <- rbind(c("n", c(Format(sum(!is.na(grp)), fmt=fmt$abs),
paste(Format(table(grp), fmt=fmt$abs), " (",
Format(prop.table(table(grp)), fmt=fmt$per), ")", sep=""), ""))
, res)
# align the table
if(align != "\\l")
res[,-c(1, ncol(res))] <- StrAlign(res[,-c(1, ncol(res))], sep = align)
if(all(grp==1)){
res <- res[, -3]
total <- TRUE
}
if(!total)
res <- res[, -2]
if(!is.null(colnames))
colnames(res) <- rep(colnames, length.out=ncol(res))
if(!notest)
attr(res, "legend") <- gettextf("%s) %s, %s) %s, %s) %s\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1",
.FootNote(1), TEST[["num"]]$lbl, .FootNote(2), TEST[["dich"]]$lbl, .FootNote(3), TEST[["cat"]]$lbl)
else {
attr(res, "legend") <- ""
res <- res[, -ncol(res)]
}
class(res) <- "TOne"
return(res)
}
.FootNote <- function(i){
# internal function, not exported
# x <- getOption("footnote")
x <- DescToolsOptions("footnote")
if(is.null(x))
x <- c("'", '"', '""')
return(x[i])
}
# Old, replaced by 0.99.54.6:
# print.TOne <- function(x, ...){
#
# cat("\n")
#
# write.table(format(rbind(colnames(x), x), justify="left"),
# row.names=FALSE, col.names=FALSE, quote=FALSE)
#
# if(!is.null(attr(x, "legend"))){
# cat("---\n")
# cat(attr(x, "legend"), "\n")
# }
# cat("\n")
#
# }
print.TOne <- function(x, ...){
cat("\n")
if(.has_color()){
t1 <- as.data.frame.matrix(x)
colnames(t1) <- colnames(x)
out <- capture.output(print((t1), right=FALSE, sep=" ",
print.gap=3, col.names=FALSE))
cat(cli::style_bold(out[1]))
print(unname(t1), right=FALSE, sep=" ", print.gap=3, col.names=FALSE)
if(!is.null(attr(x, "legend"))){
cat(cli::col_silver("---\n"))
cat(cli::col_silver(attr(x, "legend"), "\n"))
}
cat("\n")
} else {
write.table(format(rbind(colnames(x), x), justify="left"),
row.names=FALSE, col.names=FALSE, quote=FALSE)
if(!is.null(attr(x, "legend"))){
cat("---\n")
cat(attr(x, "legend"), "\n")
}
cat("\n")
}
}
# subsetting TOne
`[.TOne` <- function(x, i, j, ..., drop=FALSE) {
# subset main character matrix, don't drop structure by default
res <- unclass(x)[i, j, drop=drop]
# attribute dim should not be restore all relevant attributes
attr(res, "legend") <- attr(x, "legend")
attr(res, "class") <- attr(x, "class")
return(res)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.