Nothing
#' Compute Any Possible Table
#'
#' @description
#' [any_table()] produces any possible descriptive table in 'Excel' format. Any number
#' of variables can be nested and crossed. The output is an individually styled
#' 'Excel' table, which also receives named ranges, making it easier to read the data back in.
#'
#' @param data_frame A data frame in which are the variables to tabulate.
#' @param rows A vector that provides single variables or variable combinations that
#' should appear in the table rows. To nest variables use the form:
#' "var1 + var2 + var3 + ...".
#' @param columns A vector that provides single variables or variable combinations that
#' should appear in the table rows. To nest variables use the form:
#' "var1 + var2 + var3 + ...".
#' @param values A vector containing all variables that should be summarised.
#' @param statistics Available functions:
#' - "sum" -> Weighted and unweighted sum
#' - "sum_wgt" -> Sum of all weights
#' - "freq" -> Unweighted frequency
#' - "freq_g0" -> Unweighted frequency of all values greater than zero
#' - "pct_group" -> Weighted and unweighted percentages within the respective group
#' - "pct_value" -> Weighted and unweighted percentages between value variables
#' - "pct_total" -> Weighted and unweighted percentages compared to the grand total
#' - "mean" -> Weighted and unweighted mean
#' - "median" -> Weighted and unweighted median
#' - "mode" -> Weighted and unweighted mode
#' - "min" -> Minimum
#' - "max" -> Maximum
#' - "sd" -> Weighted and unweighted standard deviation
#' - "variance" -> Weighted and unweighted standard variance
#' - "first" -> First value
#' - "last" -> Last value
#' - "pn" -> Weighted and unweighted percentiles (any p1, p2, p3, ... possible)
#' - "missing" -> Missings generated by the value variables
#' @param pct_group If pct_group is specified in the statistics, this option is used to
#' determine which variable of the row and column variables should add up to 100 %.
#' Multiple variables can be specified in a vector to generate multiple group percentages.
#' @param pct_value If pct_value is specified in the statistics, you can pass a list here
#' which contains the information for a new variable name and between which of the value
#' variables percentages should be computed.
#' @param formats A list in which is specified which formats should be applied to which variables.
#' @param by Compute tables stratified by the expressions of the provided variables.
#' @param weight Put in a weight variable to compute weighted results.
#' @param order_by Determine how the columns will be ordered. "values" orders the results by the
#' order you provide the variables in values. "stats" orders them by the order under statistics.
#' "values_stats" is a combination of both. "columns" keeps the order as given in columns
#' and "interleaved" alternates the stats.
#' @param titles Specify one or more table titles.
#' @param footnotes Specify one or more table footnotes.
#' @param var_labels A list in which is specified which label should be printed for
#' which variable instead of the variable name.
#' @param stat_labels A list in which is specified which label should be printed for
#' which statistic instead of the statistic name.
#' @param box Provide a text for the upper left box of the table.
#' @param workbook Insert a previously created workbook to expand the sheets instead of
#' creating a new file.
#' @param style A list of options can be passed to control the appearance of excel outputs.
#' Styles can be created with [excel_output_style()].
#' @param output The following output formats are available: excel and excel_nostyle.
#' @param pre_summed FALSE by default. If TRUE this function works with pre summarised data. This can be
#' used, if not all the needed results can be calculated by [any_table()] and need to be prepared in
#' advance. Enabling you to still make use of the styled tabulation. For this to work, the values have to
#' carry the statistic extension (e.g. "_sum", "_pct") in the variable name.
#' @param na.rm FALSE by default. If TRUE removes all NA values from the variables.
#' @param print TRUE by default. If TRUE prints the output, if FALSE doesn't print anything. Can be used
#' if one only wants to catch the output data frame and workbook with meta information.
#' @param monitor FALSE by default. If TRUE, outputs two charts to visualize the functions time consumption.
#'
#' @details
#' [any_table()] is based on the 'SAS' procedure Proc Tabulate, which provides
#' efficient and readable ways to perform complex tabulations.
#'
#' With this function you can combine any number of variables in any possible way, all
#' at once. You just define which variables or variable combinations should end up in
#' the table rows and columns with a simple syntax. Listing variables in a vector like
#' c("var1", "var2", "var3",...) means to put variables below (in case of the
#' row variables) or besides (in case of the column variables) each other. Nesting variables
#' is as easy as putting a plus sign between them, e.g. c("var1 + var2", "var2" + "var3" + "var4", etc.).
#' And of course you can combine both versions.
#'
#' The real highlight is, that this function not only creates all the desired variable
#' combinations and exports them to an 'Excel' file, it prints a fully custom styled
#' table to a workbook. Setting up a custom, reusable style is as easy as setting up
#' options like: provide a color for the table header, set the font size for the row header,
#' should borders be drawn for the table cells yes/no, and so on. Merging doubled header texts,
#' happens automatically.
#'
#' With this function you basically can fully concentrate on designing a table, instead of
#' thinking hard about how to calculate where to put a border or to even manually prepare
#' a designed workbook.
#'
#' @return
#' Returns a list with the data table containing the results for the table, the formatted
#' 'Excel' workbook and the meta information needed for styling the final table.
#'
#' @seealso
#' Creating a custom table style: [excel_output_style()], [modify_output_style()],
#' [number_format_style()], [modify_number_formats()].
#'
#' Creating formats: [discrete_format()] and [interval_format()].
#'
#' Functions that can handle formats and styles: [frequencies()], [crosstabs()].
#'
#' Additional functions that can handle styles: [export_with_style()]
#'
#' Additional functions that can handle formats: [summarise_plus()], [recode()],
#' [recode_multi()]
#'
#' @examples
#' # Example data frame
#' my_data <- dummy_data(1000)
#' my_data[["person"]] <- 1
#'
#' # Formats
#' age. <- discrete_format(
#' "Total" = 0:100,
#' "under 18" = 0:17,
#' "18 to under 25" = 18:24,
#' "25 to under 55" = 25:54,
#' "55 to under 65" = 55:64,
#' "65 and older" = 65:100)
#'
#' sex. <- discrete_format(
#' "Total" = 1:2,
#' "Male" = 1,
#' "Female" = 2)
#'
#' education. <- discrete_format(
#' "Total" = c("low", "middle", "high"),
#' "low education" = "low",
#' "middle education" = "middle",
#' "high education" = "high")
#'
#' # Define style
#' my_style <- excel_output_style(column_widths = c(2, 15, 15, 15, 9))
#'
#' # Define titles and footnotes. If you want to add hyperlinks you can do so by
#' # adding "link:" followed by the hyperlink to the main text.
#' titles <- c("This is title number 1 link: https://cran.r-project.org/",
#' "This is title number 2",
#' "This is title number 3")
#' footnotes <- c("This is footnote number 1",
#' "This is footnote number 2",
#' "This is footnote number 3 link: https://cran.r-project.org/")
#'
#' # Output complex tables with different percentages
#' my_data |> any_table(rows = c("sex + age", "sex", "age"),
#' columns = c("year", "education + year"),
#' values = weight,
#' statistics = c("sum", "pct_group"),
#' pct_group = c("sex", "age", "education", "year"),
#' formats = list(sex = sex., age = age.,
#' education = education.),
#' style = my_style,
#' na.rm = TRUE)
#'
#' # If you want to get a clearer vision of what the result table looks like, in terms
#' # of the row and column categories, you can write the code like this, to make out
#' # the variable crossings and see the order.
#' my_data |> any_table(columns = c( "year", "education + year"),
#' rows = c("sex + age",
#' "sex",
#' "age"),
#' values = weight,
#' statistics = c("sum", "pct_group"),
#' pct_group = c("sex", "age", "education", "year"),
#' formats = list(sex = sex., age = age.,
#' education = education.),
#' style = my_style,
#' na.rm = TRUE)
#'
#' # Percentages based on value variables instead of categories
#' my_data |> any_table(rows = c("age + year"),
#' columns = c("sex"),
#' values = c(probability, person),
#' statistics = c("pct_value", "sum", "freq"),
#' pct_value = list(rate = "probability / person"),
#' weight = weight,
#' formats = list(sex = sex., age = age.),
#' style = my_style,
#' na.rm = TRUE)
#'
#' # Customize the visual appearance by adding titles, footnotes and variable
#' # and statistic labels.
#' # Note: You don't have to describe every element. Sometimes a table can be more
#' # readable with less text. To completely remove a variable label just put in an
#' # empty text "" as label.
#' my_data |> any_table(rows = c("age + year"),
#' columns = c("sex"),
#' values = weight,
#' statistics = c("sum", "pct_group"),
#' order_by = "interleaved",
#' formats = list(sex = sex., age = age.),
#' titles = titles,
#' footnotes = footnotes,
#' var_labels = list(age = "Age categories",
#' sex = "", weight = ""),
#' stat_labels = list(pct = "%"),
#' style = my_style,
#' na.rm = TRUE)
#'
#' # With individual styling
#' my_style <- my_style |> modify_output_style(header_back_color = "0077B6",
#' font = "Times New Roman")
#'
#' my_data |> any_table(rows = c("age + year"),
#' columns = c("sex"),
#' values = c(probability, person),
#' statistics = c("pct_value", "sum", "freq"),
#' pct_value = list(rate = "probability / person"),
#' weight = weight,
#' formats = list(sex = sex., age = age.),
#' style = my_style,
#' na.rm = TRUE)
#'
#' # Pass on workbook to create more sheets in the same file
#' my_style <- my_style |> modify_output_style(sheet_name = "age_sex")
#'
#' result_list <- my_data |>
#' any_table(rows = c("age"),
#' columns = c("sex"),
#' values = weight,
#' statistics = c("sum"),
#' formats = list(sex = sex., age = age.),
#' style = my_style,
#' na.rm = TRUE,
#' print = FALSE)
#'
#' my_style <- my_style |> modify_output_style(sheet_name = "edu_year")
#'
#' my_data |> any_table(workbook = result_list[["workbook"]],
#' rows = c("education"),
#' columns = c("year"),
#' values = weight,
#' statistics = c("pct_group"),
#' formats = list(education = education.),
#' style = my_style,
#' na.rm = TRUE)
#'
#' # Output multiple complex tables by expressions of another variable.
#' # If you specify the sheet name as "by" in the output style, the sheet
#' # names are named by the variable expressions of the by-variable. Otherwise
#' # the given sheet named gets a running number.
#' my_style <- my_style |> modify_output_style(sheet_name = "by")
#'
#' my_data |> any_table(rows = c("sex", "age"),
#' columns = c("education + year"),
#' values = weight,
#' by = state,
#' statistics = c("sum", "pct_group"),
#' pct_group = c("education"),
#' formats = list(sex = sex., age = age., state = state.,
#' education = education.),
#' titles = titles,
#' footnotes = footnotes,
#' style = my_style,
#' na.rm = TRUE)
#'
#' @export
any_table <- function(data_frame,
rows,
columns = "",
values,
statistics = c("sum"),
pct_group = c(),
pct_value = list(),
formats = list(),
by = c(),
weight = NULL,
order_by = "stats",
titles = c(),
footnotes = c(),
var_labels = list(),
stat_labels = list(),
box = "",
workbook = NULL,
style = excel_output_style(),
output = "excel",
pre_summed = FALSE,
na.rm = FALSE,
print = TRUE,
monitor = FALSE){
# Measure the time
start_time <- Sys.time()
monitor_df <- NULL |> monitor_start("Error handling", "Preparation")
# First convert data frame to data table
if (!data.table::is.data.table(data_frame)){
data_frame <- data.table::as.data.table(data_frame)
}
# Evaluate formats early, otherwise apply formats can't evaluate them in unit
# test situation.
formats_list <- as.list(substitute(formats))[-1]
formats <- stats::setNames(
lapply(formats_list, function(expression){
# Catch expression if passed as string
if (is.character(expression)) {
tryCatch(get(expression, envir = parent.frame()),
error = function(e) NULL)
}
# Catch expression if passed as symbol
else{
tryCatch(eval(expression, envir = parent.frame()),
error = function(e) NULL)
}
}),
names(formats_list))
# Look up variable names in format data frame to check whether there is an
# interval or discrete format
flag_interval <- FALSE
for (current_var in names(formats)){
format_df <- formats[[current_var]]
interval_variables <- c("from", "to")
actual_variables <- names(format_df)[1:2]
if (identical(interval_variables, actual_variables)){
flag_interval <- TRUE
break
}
}
###########################################################################
# Error handling
###########################################################################
# Get row variables from provided combinations
row_vars <- unique(trimws(unlist(strsplit(rows, "\\+"))))
invalid_rows <- row_vars[!row_vars %in% names(data_frame)]
row_vars <- row_vars[row_vars %in% names(data_frame)]
if (length(invalid_rows) > 0){
message(" X ERROR: The provided row variable '", paste(invalid_rows, collapse = ", "), "' is not part of\n",
" the data frame. Any table will be aborted.")
return(invisible(NULL))
}
if (length(rows) == 0){
message(" X ERROR: No valid row variables provided. Any table will be aborted.")
return(invisible(NULL))
}
if (length(rows) == 1){
if (rows == ""){
message(" X ERROR: No valid row variables provided. Any table will be aborted.")
return(invisible(NULL))
}
}
# Get row variables from provided combinations
col_vars <- unique(trimws(unlist(strsplit(columns, "\\+"))))
invalid_columns <- col_vars[!col_vars %in% names(data_frame)]
col_vars <- col_vars[col_vars %in% names(data_frame)]
if (length(invalid_columns) > 0){
message(" X ERROR: The provided column variable '", paste(invalid_columns, collapse = ", "), "' is not part of\n",
" the data frame. Any table will be aborted.")
return(invisible(NULL))
}
if (length(columns) == 0){
message(" X ERROR: No valid column variables provided. Any table will be aborted.")
return(invisible(NULL))
}
if (length(columns) == 1){
if (columns == ""){
# Create empty pseudo variable to let the rest of the program run as normal
data_frame[[".temp.var"]] <- 1
columns <- ".temp.var"
col_vars <- ".temp.var"
var_labels <- c(var_labels, ".temp.var" = "")
formats[[".temp.var"]] <- suppressMessages(discrete_format(" " = 1))
}
}
# Convert to character vectors
by_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(by), width.cutoff = 500L)))
if (substr(by_temp, 1, 2) == "c("){
by <- as.character(substitute(by))
}
else if (!is_error(by)){
# Do nothing. In this case variables already contains the substituted variable names
# while variables_temp is evaluated to the symbol passed into the function.
}
else{
by <- by_temp
}
# Remove extra first character created with substitution
by <- by[by != "c"]
provided_by <- by
invalid_by <- by[!by %in% names(data_frame)]
by <- by[by %in% names(data_frame)]
if (length(invalid_by) > 0){
message(" ! WARNING: The provided by variable '", paste(invalid_by, collapse = ", "), "' is not part of\n",
" the data frame. This variable will be omitted during computation.")
}
variables <- c(row_vars, col_vars)
invalid_by <- by[by %in% variables]
if (length(invalid_by) > 0){
message(" X ERROR: The provided by variable '", paste(invalid_by, collapse = ", "), "' is also part of\n",
" the row and column variables which is not allowed. Any table will be aborted.")
return(invisible(NULL))
}
if (length(by) == 1){
if (by == ""){
message(" X ERROR: No valid by variables provided. Any table will be aborted.")
return(invisible(NULL))
}
}
# Create temporary weight column if none is provided.
# Also get the name of the weight variable as string.
weight_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(weight), width.cutoff = 500L)))
if (weight_temp == "NULL" || substr(weight_temp, 1, 2) == "c("){
weight_var <- ".temp_weight"
data_frame[[".temp_weight"]] <- 1
if (substr(weight_temp, 1, 2) == "c("){
message(" ! WARNING: Only one variable for weight allowed. Evaluations will be unweighted.")
}
}
else if (!is_numeric(data_frame[[weight_temp]])){
weight_var <- ".temp_weight"
data_frame[[".temp_weight"]] <- 1
message(" ! WARNING: Provided weight variable is not numeric. Unweighted results will be computed.")
}
else{
weight_var <- weight_temp
# NA values in weight lead to errors therefor convert them to 0
if (anyNA(data_frame[[weight_temp]])){
message(" ~ NOTE: Missing values in weight variable '", weight_temp, "' will be converted to 0.")
}
data_frame[[weight_temp]] <- data.table::fifelse(is.na(data_frame[[weight_temp]]), 0, data_frame[[weight_temp]])
# @Hack: so I don't have to check if .temp_weight exists later on
data_frame[[".temp_weight"]] <- 1
}
# Convert to character vectors
values_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(values), width.cutoff = 500L)))
if (substr(values_temp, 1, 2) == "c("){
values <- as.character(substitute(values))
}
else if (!is_error(values)){
# Do nothing. In this case values already contains the substituted variable names
# while values_temp is evaluated to the symbol passed into the function.
}
else{
values <- values_temp
}
# Remove extra first character created with substitution
values <- values[values != "c"]
# If no value variables are provided abort
if (length(values) == 0){
message(" X ERROR: No values provided.")
return(invisible(NULL))
}
else if (length(values) == 1){
if (values == ""){
message(" X ERROR: No values provided.")
return(invisible(NULL))
}
}
# Make sure there is no class variable that is also a value variable.
invalid_class <- values[values %in% c(row_vars, col_vars)]
values <- values[!values %in% c(row_vars, col_vars)]
if (length(invalid_class) > 0){
message(" x ERROR: The provided row/column variable '", paste(invalid_class, collapse = ", "), "' is also part of\n",
" the analysis variables. Any table will be aborted.")
return(invisible(NULL))
}
invalid_values <- values[!values %in% names(data_frame)]
values <- values[values %in% names(data_frame)]
if (length(invalid_values) > 0){
message(" ! WARNING: The provided analysis variable '", paste(invalid_values, collapse = ", "), "' is not part of\n",
" the data frame. This variable will be omitted during computation.")
}
if (length(values) == 0){
message(" X ERROR: No valid analysis variables provided. Any table will be aborted.")
return(invisible(NULL))
}
provided_values <- values
values <- unique(values)
if (length(provided_values) > length(values)){
message(" ! WARNING: Some analysis variables are provided more than once. The doubled entries will be omitted.")
}
# Check for invalid output option
if (!tolower(output) %in% c("excel", "excel_nostyle")){
message(" ! WARNING: Output format '", output, "' not available. Using 'excel' instead.")
output <- "excel"
}
else{
output <- tolower(output)
}
# Correct nesting option if not set right
if (!tolower(order_by) %in% c("values", "stats", "values_stats", "columns", "interleaved")){
message(" ! WARNING: Order by option '", order_by, "' doesn't exist. Options 'values', 'stats', 'values_stats', 'columns'\n",
" and 'interleaved' are available. Order by will be set to 'stats'.")
order_by <- "stats"
}
# Remove missing variables from pct_group
if ("pct_group" %in% tolower(statistics)){
invalid_pct <- pct_group[!pct_group %in% c(row_vars, col_vars)]
if (length(invalid_pct) > 0){
message(" ! WARNING: The variable '", paste(invalid_pct, collapse = ", "), "' provided as pct_group is not part of the row and column variables.\n",
" The variable will be omitted.")
pct_group <- pct_group[pct_group %in% c(row_vars, col_vars)]
}
rm(invalid_pct)
}
# In case of using a pre summarised data frame, underscores are only allowed if they carry
# the statistics extension afterwards.
if (pre_summed){
if (!"TYPE" %in% names(data_frame)){
message(" X ERROR: The pre summarised data needs the TYPE variable generated by summarise_plus. Any table will be aborted.")
return(invisible(NULL))
}
# Check if value variables have statistics extension
extensions <- c("_sum", "_pct_group", "_pct_total", "_pct_value", "_pct", "_freq_g0",
"_freq", "_mean", "_median", "_mode", "_min", "_max", "_first",
"_last", "_sum_wgt", "_p[0-9]+$", "_sd", "_variance", "_missing")
pattern <- paste0("(", paste(extensions, collapse = "|"), ")$")
# If one of the value variables hasn't got any of the above extension abort
if (!all(grepl(pattern, values))){
message(" X ERROR: All value variables need to have the statistic extensions in their variable names.\n",
" Execution will be aborted.")
return(invisible(NULL))
}
# Set up options to make sure nothing errors below. Statistics is set to "mean"
# because then summarise_plus takes a route where factor variables are kept in order.
# With "sum" the order would messed up.
statistics <- "mean"
pct_group <- ""
pct_value <- ""
weight <- NULL
formats <- list()
rm(extensions, pattern)
}
rm(invalid_by, invalid_class, invalid_columns, invalid_rows, invalid_values,
provided_by, provided_values, weight_temp, values_temp, by_temp)
###########################################################################
# Any tabulation starts
###########################################################################
monitor_df <- monitor_df |> monitor_next("Summary", "Summary")
message("\n > Computing stats.")
# Put together vector of grouping variables
group_vars <- c(by, variables)
# In case of group percentages order group variable to the last position
if ("pct_group" %in% tolower(statistics) && length(pct_group) > 0){
group_vars <- c(setdiff(group_vars, pct_group[1]), pct_group[1])
}
# If pct_group is specified in statistics but no group is provided set
# last variable of group_vars as standard.
else if ("pct_group" %in% tolower(statistics) && length(pct_group) == 0){
pct_group <- group_vars[length(group_vars)]
}
# Put combinations in a single vector
combinations <- as.vector(outer(rows, columns, paste, sep = " + "))
# In case by variables are specified, add by to group variables and build
# additional combinations
if (length(by) > 0){
combinations <- as.vector(outer(by, combinations, paste, sep = "+"))
}
# Compute statistics
if (!pre_summed){
any_tab <- suppressMessages(data_frame |>
summarise_plus(class = group_vars,
values = values,
statistics = statistics,
formats = formats,
weight = weight_var,
nesting = "all",
types = combinations,
notes = FALSE,
na.rm = na.rm)) |>
rename_pattern("pct_group", paste0("pct_group_", pct_group[1])) |>
collapse::fsubset(TYPE != "total")
}
else{
# With pre summarised data just take the input data frame
any_tab <- data_frame
}
if (is.null(any_tab)){
message(" X ERROR: Any table could not be computed. Execution will be aborted.")
return(invisible(NULL))
}
# Underscores are not allowed in column variables because when constructing the
# table header later, the underscore is the sign by which the column names are split.
# Having additional underscores would mess up this part and lead to errors.
if (any(grepl("_", unlist(any_tab[col_vars])))){
message(" X ERROR: No underscores allowed in column variable values. Execution will be aborted.")
return(invisible(NULL))
}
# In case of pre summarised data frame remove the temporary weight variable
if (pre_summed){
any_tab <- any_tab |> dropp(".temp_weight")
}
# In case of by variables fuse them into one
if (length(by) > 0){
any_tab <- any_tab |> fuse_variables("by_vars", by)
# Normally extract the first variable from the TYPE combination into a
# separate BY variable.
if (!pre_summed){
any_tab[["BY"]] <- sub("\\+.*", "", any_tab[["TYPE"]])
any_tab[["TYPE"]] <- sub("^[^+]*\\+\\s*", "", any_tab[["TYPE"]])
}
# In case of pre summarised data frame it is not clear, at which position
# the by variable is in the TYPE.
else{
# Get TYPE variable as list of variable combinations
type_split <- strsplit(any_tab[["TYPE"]], "\\+")
# If by variable was found in a variable combination, put it into
# a separate variable.
any_tab[["BY"]] <- vapply(type_split, function(obs){
by_found <- obs[obs %in% by]
if (length(by_found) > 0){
by_found[1]
}
else{
""
}
}, character(1L))
# The TYPE variable itself is put back together as all variables which
# are no by-Variables.
any_tab[["TYPE"]] <- vapply(type_split, function(obs){
paste(obs[!obs %in% by], collapse = "+")
}, character(1L))
}
}
# In case multiple group percentages should be computed, evaluate them in a loop
# and join them to the main data frame.
if (length(pct_group) > 1){
monitor_df <- monitor_df |> monitor_next("Additional group pct", "Summary")
for (group in seq_along(pct_group)){
# First group was computed before so omit it here
if (group == 1){
next
}
# In case of group percentages order group variable to the last position
group_vars <- c(setdiff(group_vars, pct_group[group]), pct_group[group])
# Compute group percentages
group_tab <- suppressMessages(data_frame |>
summarise_plus(class = group_vars,
values = values,
statistics = "pct_group",
formats = formats,
weight = weight_var,
nesting = "all",
types = combinations,
notes = FALSE,
na.rm = na.rm)) |>
rename_pattern("pct_group", paste0("pct_group_", pct_group[group])) |>
drop_type_vars()
merge_vars <- variables
if (length(by) > 0){
group_tab <- group_tab |> fuse_variables("by_vars", by)
merge_vars <- c("by_vars", variables)
}
# Join percentages to the main data frame
any_tab <- any_tab |>
collapse::join(group_tab, on = merge_vars, how = "left",
verbose = FALSE, overid = 2)
}
rm(group, group_tab)
}
# In case only pct_value was selected as statistic
if ("pct_value" %in% tolower(statistics) && length(statistics) == 1){
message(" X ERROR: pct_value can only be computed in combination with statistic\n",
" 'sum'. Since no other statistic is provided any table will be aborted.")
return(invisible(NULL))
}
# In case percentages based on value variables should be computed
else if ("pct_value" %in% tolower(statistics) && length(pct_value) > 0){
for (i in seq_along(pct_value)){
value <- pct_value[[i]]
name <- names(pct_value)[i]
# Separate provided variables first
eval_vars <- trimws(strsplit(value, split = "/")[[1]])
# Compute percentages
if (paste0(eval_vars[1], "_sum") %in% names(any_tab) &&
paste0(eval_vars[2], "_sum") %in% names(any_tab)){
any_tab[[paste0(name, "_pct_value")]] <-
any_tab[[paste0(eval_vars[1], "_sum")]] * 100 /
any_tab[[paste0(eval_vars[2], "_sum")]]
}
# Without sum percentages can't be computed
else{
message(" ! WARNING: pct_value can only be computed in combination with statistic\n",
" 'sum'. Percentages for '", name, "' could not be evaluated.")
# Additional warnings for missing variables
if (!eval_vars[1] %in% names(data_frame)){
message(" ! WARNING: Variable '", eval_vars[1], "' not found in the data frame.")
}
if (!eval_vars[2] %in% names(data_frame)){
message(" ! WARNING: Variable '", eval_vars[2], "' not found in the data frame.")
}
}
}
rm(eval_vars, i, value)
}
rm(data_frame)
# Reorder variables according to statistics. This is necessary because pct_value
# can only be computed after summarise_plus and therefor isn't ordered.
any_tab <- any_tab |> setcolorder_by_pattern(statistics)
# Get value variable names
if (length(by) == 0){
value_vars <- any_tab |> inverse(c(variables, "TYPE", "TYPE_NR", "DEPTH"))
}
else{
value_vars <- any_tab |> inverse(c(variables, "TYPE", "TYPE_NR", "DEPTH", "by_vars", "BY"))
}
# Round values according to number formats
for (var_name in names(any_tab)){
if (!var_name %in% value_vars){
next
}
# Get stat from variable name
stat <- strsplit(var_name, split = "_")[[1]]
stat <- stat[length(stat)]
# Round values to the decimals places specified in the style
if (tolower(stat) %in% c("sum", "freq", "freq", "mean", "median", "mode",
"min", "max")){
any_tab[[var_name]] <- round(any_tab[[var_name]],
style[["number_formats"]][[paste0(stat, "_decimals")]])
}
else if(stat == "g0"){
any_tab[[var_name]] <- round(any_tab[[var_name]],
style[["number_formats"]][["freq_decimals"]])
}
else if(stat == "wgt"){
any_tab[[var_name]] <- round(any_tab[[var_name]],
style[["number_formats"]][["sum_decimals"]])
}
else if(grepl("^[0-9]$", substr(stat, 2, 2))){
any_tab[[var_name]] <- round(any_tab[[var_name]],
style[["number_formats"]][["p_decimals"]])
}
else{
any_tab[[var_name]] <- round(any_tab[[var_name]],
style[["number_formats"]][["pct_decimals"]])
}
}
# Tear apart the the summarised data frame by row and column combinations and
# transpose columns to generate user defined combination. At the end put all
# the pieces back together to form a fully printable result data frame.
monitor_df <- monitor_df |> monitor_next("Transform table", "Transform")
part_combi_list <- list()
header_combi_list <- list()
col_header_df <- list()
row_header_dimensions <- list()
last_number_of_rows <- 0
by_division <- 1
if (length(by) > 0){
by_division <- length(unique(any_tab[["by_vars"]]))
}
# Underscores are not allowed in column and values variables, because when constructing the
# table header later, the underscore is the symbol by which the variable names are split.
# Having additional underscores would mess up this part and lead to errors. Therefor the
# additional underscores will be temporarily replaced.
extensions <- c("_sum", "_pct_group_", "_pct_total", "_pct_value", "_pct", "_freq_g0",
"_freq", "_mean", "_median", "_mode", "_min", "_max", "_first",
"_last", "_p1", "_p2", "_p3", "_p4", "_p5", "_p6", "_p7", "_p8", "_p9",
"sum_wgt", "_sd", "_variance", "_missing", "by_vars", "TYPE_NR")
rows <- replace_except(rows, "_", "!!!", extensions)
columns <- replace_except(columns, "_", "!!!", extensions)
value_vars <- replace_except(value_vars, "_", "!!!", extensions)
value_sort <- replace_except(values, "_", "!!!", extensions)
names(any_tab) <- replace_except(names(any_tab), "_", "!!!", extensions)
any_tab[["TYPE"]] <- replace_except(any_tab[["TYPE"]], "_", "!!!", extensions)
# Sort type alphabetically to make finding the right combination easier
any_tab[["TYPE"]] <- reorder_combination(any_tab[["TYPE"]])
# Get number of column header variables by getting the maximum number of + signs in the
# column variables.
max_plus <- max(sapply(gregexpr("\\+", columns), function(var_to_test) {
if (var_to_test[1] == -1){
1
}
else{
length(var_to_test) + 1
}}))
index <- 1
any_header <- NULL
for (row_combi in rows){
combined_col_df <- NULL
# Get current single variables from row combination
row_combi_vars <- unique(trimws(unlist(strsplit(row_combi, "\\+"))))
for (col_combi in columns){
# Get current single variables from column combination
col_combi_vars <- unique(trimws(unlist(strsplit(col_combi, "\\+"))))
current_combi <- c(row_combi_vars, col_combi_vars)
# Sort combination alphabetically
sorted_combi <- paste(sort(current_combi), collapse = "+")
# Keep only necessary variables
if (length(by) > 0){
current_combi <- c("BY", "by_vars", current_combi)
}
combi_df <- any_tab |> keep(current_combi, "TYPE", value_vars)
# Combine variables to match TYPE variable and subset data frame by
# current combination
subset_type <- paste(sorted_combi, collapse = "+")
# Convert column variables to factor if necessary to retain value order
# after sorting
for (col_var in col_combi_vars){
if (is.character(combi_df[[col_var]])){
# Extract the number of labels from variable
label_levels <- combi_df[[col_var]] |>
unlist(use.names = FALSE) |>
unique() |>
stats::na.omit()
# Convert variable to factor
combi_df[[col_var]] <- factor(
combi_df[[col_var]],
levels = label_levels,
ordered = TRUE)
}
}
# Sort to have the correct order after pivoting
combi_df <- combi_df |>
collapse::fsubset(TYPE == subset_type) |>
data.table::setcolorder(current_combi, before = 1) |>
data.table::setorderv(col_combi_vars, na.last = TRUE) |>
data.table::setorderv(row_combi_vars)
# Rename the row variables to something neutral. In the next steps the
# data frame is puzzled back together, but the thing is, that the row
# variables can be individually in any order in the result table. If
# this step would be omitted the variables would all be in a fixed order
# in the result table.
new_row_names <- paste0("var", seq_along(row_combi_vars))
if (length(by) == 0){
names(combi_df)[seq_along(row_combi_vars)] <- new_row_names
id_vars <- new_row_names
}
else{
end_col <- length(row_combi_vars) + 2
names(combi_df)[3:end_col] <- new_row_names
id_vars <- new_row_names
id_vars <- c("BY", "by_vars", new_row_names)
}
# If there is only one value provided and only one statistic selected
# the pivoted variable names below only receive the column expressions
# as names. The needed format is "value_stat_expression". In the mentioned
# case this format is pre computed.
if (length(values) == 1 && length(statistics) == 1){
value_stat <- names(combi_df[ncol(combi_df)])
combi_df[[col_combi_vars[1]]] <-
paste0(value_stat, "_", combi_df[[col_combi_vars[1]]])
}
# Fallback check if the user happens to input only wrong statistics but one.
# In this case statistics is longer than 1 even though only one statistic was
# computed. This leads to the condition above being omitted.
else if (ncol(combi_df) - length(c(id_vars, col_combi_vars, "TYPE")) == 1){
value_stat <- names(combi_df[ncol(combi_df)])
combi_df[[col_combi_vars[1]]] <-
paste0(value_stat, "_", combi_df[[col_combi_vars[1]]])
}
# Pivot to wider format, which basically is the final format to print the data
combi_df <- combi_df |>
collapse::pivot(id = id_vars,
names = col_combi_vars,
values = value_vars,
how = "wider")
combi_df[id_vars] <- lapply(combi_df[id_vars], as.character)
# Replace NA values with text so that they can be differentiated from empty
# row header columns later on.
row_var_cols <- seq_along(id_vars)
combi_df[, row_var_cols][is.na(combi_df[, row_var_cols])] <- style[["na_symbol"]]
# Sort interleaved
if (tolower(order_by) == "interleaved"){
combi_df <- combi_df |> order_interleaved(statistics)
}
# Join different column results together
if (is.null(combined_col_df)){
row_labels <- c()
# Loop through all provided labels
for (variable in row_combi_vars){
# Revert back underscores
variable <- gsub("!!!", "_", variable)
# If there are no labels specified add the variable names as labels
if (!variable %in% names(var_labels)){
row_labels <- c(row_labels, variable)
next
}
# Replace stat texts with provided labels
row_labels <- c(row_labels, var_labels[[variable]])
}
# Insert labels as new variable and sort it to the front
if (!all(row_labels == "")){
combi_df[["row.label"]] <- paste(row_labels, collapse = " / ")
}
# In case only empty labels were provided, just add empty character
# so that the column can be identified as empty later on.
else{
combi_df[["row.label"]] <- " "
}
# Order row label to the front
if (length(by) == 0){
combi_df <- combi_df |>
data.table::setcolorder("row.label", before = 1)
}
else{
combi_df <- combi_df |>
data.table::setcolorder(c("BY", "by_vars", "row.label"), before = 1)
}
new_row_names <- c("row.label", new_row_names)
# First iteration
combined_col_df <- combi_df
}
# Following iterations
else{
combi_df <- suppressMessages(combi_df |>
dropp(new_row_names, "row.label", "BY", "by_vars"))
# Add row header variable
new_row_names <- c("row.label", new_row_names)
# Check for duplicate variable names. If any duplicate is found abort.
duplicates <- intersect(names(combined_col_df), names(combi_df))
if (length(duplicates) > 0) {
message(" X ERROR: Duplicate column names found: ", paste(duplicates, collapse = ", "), ".\n",
" If you are working with original values, consider making them unique by using formats.")
return(invisible(NULL))
}
# cbind current data frame to the iterations before
combined_col_df <- cbind(combined_col_df, combi_df)
}
# Build variable name table header for later use during formatting.
# Column header only needs to be built once because other iterations
# would just produce the same header.
if (index == 1){
# Get data frame variable names
col_header_df <- combi_df[0, ]
# Fill header rows with column variable names
for (var_index in seq_along(col_combi_vars)){
col_header_df[var_index, ] <- gsub("!!!", "_", col_combi_vars[[var_index]])
}
# If the header has fewer rows than the maximum header rows, fill up the header
# with additional empty rows.
header_diff <- max_plus - nrow(col_header_df)
if (header_diff > 0){
col_header_df <- rbind(col_header_df,
stats::setNames(data.table::as.data.table(
matrix("", header_diff, ncol(col_header_df))),
names(col_header_df)))
}
# First iteration
if (is.null(any_header)){
# Remove row header columns from column header data frame
row_header_var_count <- length(id_vars) + 1
any_header <- col_header_df[, -(1:row_header_var_count), drop = FALSE]
}
# Following iterations
else{
# cbind current header data frame to the iterations before
any_header <- cbind(any_header, col_header_df)
}
}
}
# Store combined column data frame in a list for later rbind
part_combi_list[[row_combi]] <- combined_col_df
# Get table dimensions for later use during formatting.
name <- paste0("header_end_", paste(row_combi_vars, collapse = "%%%"))
last_number_of_rows <- (nrow(combined_col_df) / by_division) + last_number_of_rows
row_header_dimensions[[name]] <- last_number_of_rows
name <- paste0("header_size_", paste(row_combi_vars, collapse = "%%%"))
row_header_dimensions[[name]] <- length(row_combi_vars)
index <- index + 1
}
# Put all computed data frames below each other to form a final result data frame
any_tab <- data.table::rbindlist(part_combi_list, fill = TRUE)
# Reorder variables according to statistics
if (tolower(order_by) == "stats" || tolower(order_by) == "values_stats"){
any_tab <- any_tab |> setcolorder_by_pattern(statistics)
any_header <- any_header |> setcolorder_by_pattern(statistics)
}
# Reorder variables by provided values
if (tolower(order_by) == "values" || tolower(order_by) == "values_stats"){
any_tab <- any_tab |> setcolorder_by_pattern(value_sort)
any_header <- any_header |> setcolorder_by_pattern(value_sort)
}
# After binding together the data frames it can happen, that some of the new var
# variables end up at the end of the data frame instead of the front. Therefor
# these columns need to be ordered to the front for safety.
ordered_cols <- grep("^var[0-9]+$", names(any_tab), value = TRUE)
any_tab <- any_tab |> data.table::setcolorder(c("row.label", ordered_cols), before = 1)
# If all row labels are empty, delete the row header column
if (all(any_tab[["row.label"]] == " ")){
any_tab <- any_tab |> dropp("row.label")
}
# If only some labels are empty the row label column is printed. Problem: the cell
# merging omits empty cells. Therefor convert empty cells into cells with a space
# to merge the empty parts correctly.
else{
any_tab["row.label" == ""] <- " "
}
# Get number of row header variables by getting the maximum number of + signs in the
# row variables.
max_plus <- max(sapply(gregexpr("\\+", rows), function(var_to_test) {
if (var_to_test[1] == -1){
1
}
else{
length(var_to_test) + 1
}}))
length_row_header <- max_plus + 1
if (length(by) != 0){
length_row_header <- length_row_header + 2
}
# Mark empty row header cells
row_var_cols <- 1:length_row_header
any_tab[, row_var_cols][is.na(any_tab[, row_var_cols])] <- ""
# In between clean up to get a better overview
rm(combi_df, combined_col_df, part_combi_list, col_combi, col_combi_vars,
combinations, current_combi, current_var, flag_interval, index,
last_number_of_rows, name, new_row_names, row_combi, row_combi_vars, sorted_combi,
subset_type, group_vars, length_row_header, col_header_df, header_diff,
row_header_var_count)
# Grab all information, which is necessary to format the workbook. This list will be
# returned at the end and can be grabbed by the workbook combine function.
meta <- mget(c("rows", "columns", "statistics",
"by", "titles", "footnotes", "var_labels", "stat_labels",
"box", "any_header", "row_header_dimensions",
"style", "na.rm"))
# Prepare table format for output
monitor_df <- monitor_df |> monitor_next("Excel prepare", "Format")
message(" > Formatting tables.")
# Setup styling in new workbook if no other is provided
if (is.null(workbook)){
workbook <- openxlsx2::wb_workbook() |>
prepare_styles(style)
}
# Update style options in provided workbook
else{
workbook <- workbook |>
prepare_styles(style)
}
monitor_df <- monitor_df |> monitor_end()
# In case no by variables are provided
if (length(by) == 0){
wb_list <- format_any_excel(workbook, any_tab, rows, columns, statistics,
by, titles, footnotes, var_labels, stat_labels,
box, any_header,
style, output, monitor_df = monitor_df)
wb <- wb_list[[1]]
monitor_df <- wb_list[[2]]
}
# In case there are by variables are provided
else{
wb_list <- format_any_by_excel(workbook, any_tab, rows, columns, statistics,
by, titles, footnotes, var_labels, stat_labels,
box, any_header,
style, output, na.rm, monitor_df)
wb <- wb_list[[1]]
monitor_df <- wb_list[[2]]
}
# Output formatted table into different formats
if (print){
monitor_df <- monitor_df |> monitor_next("Output tables", "Output tables")
if (is.null(style[["file"]])){
if(interactive()){
wb$open()
}
}
else{
wb$save(file = style[["file"]], overwrite = TRUE)
}
}
end_time <- round(difftime(Sys.time(), start_time, units = "secs"), 3)
message("\n- - - 'any_table' execution time: ", end_time, " seconds\n")
monitor_df <- monitor_df |> monitor_end()
monitor_df |> monitor_plot(draw_plot = monitor)
invisible(list("table" = any_tab,
"workbook" = wb,
"meta" = meta))
}
###############################################################################
# Format any table for excel output
###############################################################################
#' Format Any Table Output (Excel Based)
#'
#' @description
#' Format any table with the provided row and column variables. Statistics
#' can be anything available.
#'
#' @param wb An already created workbook to add more sheets to.
#' @param any_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param statistics The user requested statistics.
#' @param by Separate the cross table output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#' @param var_labels List which contains column variable names and their respective labels.
#' @param stat_labels List which contains statistic names and their respective labels.
#' @param box The text that should appear in the upper left box of the table.
#' @param any_header The column header carrying the variable names.
#' @param style A list containing the styling elements.
#' @param output Determines whether to style the output or to just quickly paste
#' the data.
#' @param by_info Text which contains the information which variable with which
#' expression is computed at the moment.Used for computation with by variables.
#' @param index Index of the current variable expression. Used for computation with
#' by variables.
#' @param monitor_df Data frame which stores the monitoring values.
#'
#' @return
#' Returns a list containing a formatted Excel workbook as well as the monitoring
#' data frame.
#'
#' @noRd
format_any_excel <- function(wb,
any_tab,
rows,
columns,
statistics,
by,
titles,
footnotes,
var_labels,
stat_labels,
box,
any_header,
style,
output,
by_info = NULL,
index = NULL,
monitor_df){
monitor_df <- monitor_df |> monitor_start("Excel prepare", "Format")
# Cut down percentage names to just "pct"
names(any_tab) <- gsub("pct_group_", "pct group ", names(any_tab))
names(any_tab) <- gsub("pct_total", "pct total", names(any_tab))
names(any_tab) <- gsub("pct_value", "pct value", names(any_tab))
# Replace underscore in the following stats to preserve them
names(any_tab) <- gsub("sum_wgt", "weight_sum.wgt", names(any_tab))
names(any_tab) <- gsub("freq_g0", "freq.g0", names(any_tab))
# Build header from variable names
multi_header <- build_multi_header(names(any_tab), any_header, var_labels, style)
# Remove empty statistics rows, but keep multi_header with statistics because the information
# is needed below for applying the correct number formats.
column_header <- multi_header |> set_statistic_labels(stat_labels)
column_header <- column_header[rowSums(column_header == "") != ncol(column_header), , drop = FALSE]
stats_row <- multi_header[nrow(multi_header), , drop = FALSE]
# Get table ranges
any_ranges <- get_any_tab_ranges(any_tab, column_header, stats_row,
titles, footnotes, style)
# Add empty columns to the header for the top left box at the beginning
blank_columns <- matrix("", nrow = nrow(column_header), ncol = any_ranges[["cat_col.width"]])
column_header <- cbind(blank_columns, column_header)
# Add box text
if (box != ""){
column_header[1, 1] <- box
}
# If no box text provided put in variable names of row headers
else{
column_header[1, 1] <- gsub("!!!", "_", paste(rows, collapse = "\n"))
}
# If function is called with by variables the sheet names have to be differentiated
# and by info has to be written above the table.
if (!is.null(by_info)){
if (style[["sheet_name"]] == "by"){
wb$add_worksheet(by,
grid_lines = style[["grid_lines"]])
}
else{
wb$add_worksheet(paste0(style[["sheet_name"]], index),
grid_lines = style[["grid_lines"]])
}
}
else{
wb$add_worksheet(style[["sheet_name"]], grid_lines = style[["grid_lines"]])
}
# Rename the following stats back to match number formats in style element
names(any_tab) <- gsub("sum.wgt", "sum_wgt", names(any_tab))
names(any_tab) <- gsub("freq.g0", "freq_g0", names(any_tab))
# Add table data and format according to style options
monitor_df <- monitor_df |> monitor_next("Excel data", "Format")
wb$add_data(x = any_tab,
start_col = style[["start_column"]],
start_row = any_ranges[["table.row"]],
col_names = FALSE,
na.strings = style[["na_symbol"]])
# Add column header above table.
wb$add_data(x = column_header,
start_col = style[["start_column"]],
start_row = any_ranges[["header.row"]],
col_names = FALSE)
# Format titles and footnotes if there are any
monitor_df <- monitor_df |> monitor_next("Excel titles/footnotes", "Format")
wb <- wb |>
format_titles_foot_excel(titles, footnotes, any_ranges, style, output)
# Only do the formatting when user specified it. With the excel_nostyle
# option this whole part gets omitted to get a very quick unformatted
# excel output.
if (output == "excel"){
# Merge top left box
wb$merge_cells(dims = any_ranges[["box_range"]])
# Merge column and row headers
monitor_df <- monitor_df |> monitor_next("Excel format col headers", "Format")
wb <- wb |>
handle_col_header_merge(column_header[, -c(1:any_ranges[["cat_col.width"]]), drop = FALSE], any_ranges)
monitor_df <- monitor_df |> monitor_next("Excel format row headers", "Format")
wb <- wb |>
handle_row_header_merge(any_tab[, 1:any_ranges[["cat_col.width"]]], any_ranges)
# Style table
monitor_df <- monitor_df |> monitor_next("Excel cell styles", "Format")
wb <- wb |> handle_cell_styles(any_ranges, style)
monitor_df <- monitor_df |> monitor_next("Excel number formats", "Format")
# Set up inner table number formats
col_index <- 1
chunks <- sub("p[0-9]+$", "p", rle(stats_row)$values)
for (type in chunks){
wb$add_cell_style(dims = any_ranges[[paste0("any_col_ranges", col_index)]],
apply_number_format = TRUE,
num_fmt_id = wb$styles_mgr$get_numfmt_id(paste0(type, "_numfmt")))
col_index <- col_index + 1
}
# Freeze headers. If both options are true they have to be set together, otherwise one
# option would overwrite the other.
if (style[["freeze_col_header"]] && style[["freeze_row_header"]]){
wb$freeze_pane(first_active_col = any_ranges[["header.column"]] + any_ranges[["cat_col.width"]],
first_active_row = any_ranges[["table.row"]])
}
else if (style[["freeze_col_header"]]){
wb$freeze_pane(first_active_col = any_ranges[["header.column"]] + any_ranges[["cat_col.width"]])
}
else if (style[["freeze_row_header"]]){
wb$freeze_pane(first_active_row = any_ranges[["table.row"]])
}
# Adjust table dimensions
monitor_df <- monitor_df |> monitor_next("Excel widths/heights", "Format")
wb <- wb |> handle_col_row_dimensions(any_ranges,
ncol(any_tab) + (style[["start_column"]] - 1),
nrow(any_tab) + nrow(multi_header) + (style[["start_row"]] - 1),
style) |>
handle_any_auto_dimensions(any_ranges, style) |>
handle_header_table_dim(any_ranges, style)
wb$add_ignore_error(dims = any_ranges[["header_range"]], number_stored_as_text = TRUE)
wb$add_ignore_error(dims = any_ranges[["cat_col_range"]], number_stored_as_text = TRUE)
wb$add_named_region(dims = any_ranges[["whole_tab_range"]], name = "table", local_sheet = TRUE)
wb$add_named_region(dims = any_ranges[["table_range"]], name = "data", local_sheet = TRUE)
}
monitor_df <- monitor_df |> monitor_end()
# Return workbook
list(wb, monitor_df)
}
#' Insert Statistic Labels
#'
#' @description
#' Give the statistics in the column header a custom label.
#'
#' @param column_header The complete column multi header.
#' @param stat_labels A list in which is specified which label should be printed for
#' which statistic instead of the statistic name.
#'
#' @return
#' Returns a multi layered column header with replaced statistic texts.
#'
#' @noRd
set_statistic_labels <- function(column_header, stat_labels){
if (length(stat_labels) == 0){
return(column_header)
}
# Loop through all provided labels
for (i in seq_along(stat_labels)){
name <- names(stat_labels)[i]
label <- stat_labels[[i]]
# Omit label with missing variable name
if (is.null(name) || name == ""){
next
}
# Replace stat texts with provided labels
column_header[nrow(column_header), ] <- gsub(name, label, column_header[nrow(column_header), ])
}
column_header
}
#' Insert Variable Labels
#'
#' @description
#' Give the variables in the column header a custom label.
#'
#' @param column_header The complete column multi header.
#' @param var_labels A list in which is specified which label should be printed for
#' which variable instead of the variable name.
#'
#' @return
#' Returns a multi layered column header with replaced variable texts.
#'
#' @noRd
set_col_variable_labels <- function(column_header, var_labels){
if (length(var_labels) == 0){
return(column_header)
}
# Loop through all provided labels
for (i in seq_along(var_labels)){
name <- names(var_labels)[i]
label <- var_labels[[i]]
# Omit label with missing variable name
if (is.null(name) || name == ""){
next
}
# Replace variable texts with provided labels
column_header[,] <- gsub(name, label, as.matrix(column_header))
}
# If header only consists of one row it gets converted to a vector when using gsub above.
# In case this happens, convert the vector back into a one row data frame.
if (!is.matrix(column_header)) {
column_header <- data.table::as.data.table(as.list(column_header), stringsAsFactors = FALSE)
}
column_header
}
#' Build a Multi Header from Variable Names
#'
#' @description
#' Build a multi layered header from variable names of any_tab data frame. Analysis
#' variable name, stat and variable expressions are layerd into rows.
#'
#' @param var_names Variable names from any_tab.
#' @param any_header The column header carrying the variable names.
#' @param var_labels A list in which is specified which label should be printed for
#' which variable instead of the variable name.
#' @param style A list containing the styling elements.
#'
#' @return
#' Returns a data table with a multi layered header.
#'
#' @noRd
build_multi_header <- function(var_names,
any_header,
var_labels,
style){
# Replace variable texts with custom labels
any_header <- any_header |> set_col_variable_labels(var_labels)
col_var_headers <- data.table::as.data.table(any_header, stringsAsFactors = FALSE)
# Split up variable name into different parts
header_parts <- strsplit(var_names, "_")
max_parts <- max(lengths(header_parts))
# Remove row header variable names
for (part in seq_along(header_parts)){
if (length(header_parts[[part]]) > 1){
break
}
header_parts[[part]] <- ""
}
# Loop through header parts
header_matrix <- sapply(header_parts, function(parts){
# Prepare empty header with max size
multi_header <- rep("", max_parts)
if (parts[1] != ""){
# Identify different parts
var_name <- parts[1]
stat <- strsplit(parts[2], " ")[[1]][1]
expressions <- parts[-c(1, 2)]
# Put multi header together: variable name, expressions, stat
multi_header[1] <- var_name
multi_header[seq(2, length(expressions) + 1)] <- expressions
multi_header[max_parts] <- stat
}
multi_header
})
# Revert back underscores
header_matrix <- gsub("!!!", "_", header_matrix)
# Replace variable texts with custom labels
header_matrix <- header_matrix |>
set_col_variable_labels(var_labels)
# Make sure header_matrix is treated as a matrix, even though there can be only one row
if (is.null(dim(header_matrix))) {
dim(header_matrix) <- c(1, length(header_matrix))
}
# Drop completely empty rows
if (nrow(header_matrix) > 1){
header_matrix <- data.table::as.data.table(
header_matrix[rowSums(header_matrix != "" & header_matrix != " ") > 0,
colSums(header_matrix != "" & header_matrix != " ") > 0, drop = FALSE])
}
else{
header_matrix <- data.table::as.data.table(
header_matrix[, colSums(header_matrix != "" & header_matrix != " "), drop = FALSE])
}
# Replace NA values
header_matrix[is.na(header_matrix) | header_matrix == "NA"] <- style[["na_symbol"]]
# Inject column variable headers into the multi header
merge_headers(header_matrix, col_var_headers)
}
#' Inject Variable Header into Multi Header
#'
#' @description
#' The multi layered headers based on variable values and variable names are generated
#' separately and need to be put together afterwards.
#'
#' @param value_header Header based on variable values.
#' @param variable_header Header based on variable names which will be injected.
#'
#' @return
#' Returns a data table with a complete multi layered header.
#'
#' @noRd
merge_headers <- function(value_header, variable_header){
row_list <- list()
# Loop over the value header rows to inject the variable headers into them
for (i in seq_len(nrow(value_header))){
# Take a row from the value header
row_list[[length(row_list) + 1]] <- value_header[i, , drop = FALSE]
# If there are still rows in the variable header check whether to inject them
if (i <= nrow(variable_header)){
variable_row <- variable_header[i, , drop = FALSE]
# Only inject variable row, if the row isn't completely empty
if (!all(variable_row == "")) {
row_list[[length(row_list) + 1]] <- variable_row
}
}
}
# Output as data table
data.table::rbindlist(row_list, use.names=FALSE)
}
###############################################################################
# Format grouped by tables for excel output
###############################################################################
#' Format Any Table Output with by Variables (Excel Based)
#'
#' @description
#' Format any table with the provided row and column variables. Statistics
#' can be anything available.
#'
#' @param wb An already created workbook to add more sheets to.
#' @param any_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param statistics The user requested statistics.
#' @param by Separate the cross table output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#' @param var_labels List which contains column variable names and their respective labels.
#' @param stat_labels List which contains statistic names and their respective labels.
#' @param box The text that should appear in the upper left box of the table.
#' @param any_header The column header carrying the variable names.
#' @param style A list containing the styling elements.
#' @param output Determines whether to style the output or to just quickly paste
#' the data.
#' @param na.rm If TRUE removes all NA values from the tabulation.
#' @param monitor_df Data frame which stores the monitoring values.
#'
#' @return
#' Returns a list containing a formatted Excel workbook as well as the monitoring
#' data frame.
#'
#' @noRd
format_any_by_excel <- function(wb,
any_tab,
rows,
columns,
statistics,
by,
titles,
footnotes,
var_labels,
stat_labels,
box,
any_header,
style,
output,
na.rm,
monitor_df){
# Loop through all by variables
index <- 1
for (by_var in by){
monitor_df <- monitor_df |> monitor_start(paste0("Excel prepare (", by_var, ")"), "Format by")
# Select by variables one by one
any_by <- any_tab |>
collapse::fsubset(any_tab[["BY"]] == by_var)
# Extract unique values
if (anyNA(any_by[["by_vars"]])){
values <- c(unique(stats::na.omit(any_by[["by_vars"]])), NA)
}
else{
values <- unique(any_by[["by_vars"]])
}
monitor_df <- monitor_df |> monitor_end()
# Loop through all unique values to generate frequency tables per expression
for (value in values){
# In case NAs are removed
if (is.na(value) && na.rm){
next
}
monitor_df <- monitor_df |> monitor_start(paste0("Excel (", by_var, "_", value, ")"), "Format by")
message(" + ", paste0(by_var, " = ", value))
# Put additional by info together with the information which by variable
# and which value is currently filtered.
by_info <- paste0(by_var, " = ", value)
# Filter table by current by variable and value
if (!is.na(value)){
any_temp <- any_by |>
collapse::fsubset(any_by[["by_vars"]] == value)
}
else{
any_temp <- any_by |>
collapse::fsubset(is.na(any_by[["by_vars"]]))
}
any_temp <- any_temp |> dropp("BY", "by_vars")
# Add by info below the titles
if (length(titles) > 0){
titles_temp <- c(titles, "", by_info)
}
# Or on top if there are no titles
else{
titles_temp <- by_info
}
# Generate frequency tables as normal but base is filtered data frame
wb_list <- format_any_excel(wb,
any_temp,
rows,
columns,
statistics,
value,
titles_temp,
footnotes,
var_labels,
stat_labels,
box,
any_header,
style,
output,
by_info,
index,
NULL)
index <- index + 1
wb <- wb_list[[1]]
monitor_df <- monitor_df |> monitor_end()
}
monitor_df <- monitor_df |> monitor_end()
}
# Return workbook
list(wb, monitor_df)
}
###############################################################################
# Combine multiple tables into one workbook
###############################################################################
#' Combine Multiple Tables Into One Workbook
#'
#' @description
#' Combines any number of tables created with [any_table()] into one workbook
#' and styles them according to their meta information.
#'
#' @param ... Provide any number of result lists output by [any_table()].
#' @param file If NULL, opens the output as temporary file. If a filename with path
#' is specified, saves the output to the specified path.
#' @param output The following output formats are available: excel and excel_nostyle.
#' @param print TRUE by default. If TRUE prints the output, if FALSE doesn't print anything. Can be used
#' if one only wants to catch the combined workbook.
#' @param monitor FALSE by default. If TRUE outputs two charts to visualize the functions time consumption.
#'
#' @return
#' A fully styled workbook containing the provided tables.
#'
#' @examples
#' # Example data frame
#' my_data <- dummy_data(1000)
#' my_data[["person"]] <- 1
#'
#' # Formats
#' age. <- discrete_format(
#' "Total" = 0:100,
#' "under 18" = 0:17,
#' "18 to under 25" = 18:24,
#' "25 to under 55" = 25:54,
#' "55 to under 65" = 55:64,
#' "65 and older" = 65:100)
#'
#' sex. <- discrete_format(
#' "Total" = 1:2,
#' "Male" = 1,
#' "Female" = 2)
#'
#' education. <- discrete_format(
#' "Total" = c("low", "middle", "high"),
#' "low education" = "low",
#' "middle education" = "middle",
#' "high education" = "high")
#'
#' # Define style
#' my_style <- excel_output_style(column_widths = c(2, 15, 15, 15, 9))
#'
#' # Define titles and footnotes. If you want to add hyperlinks you can do so by
#' # adding "link:" followed by the hyperlink to the main text.
#' titles <- c("This is title number 1 link: https://cran.r-project.org/",
#' "This is title number 2",
#' "This is title number 3")
#' footnotes <- c("This is footnote number 1",
#' "This is footnote number 2",
#' "This is footnote number 3 link: https://cran.r-project.org/")
#'
#' # Catch the output and additionally use the options:
#' # pint = FALSE and output = "excel_nostyle".
#' # This skips the styling and output part, so that the function runs faster.
#' # The styling is done later on.
#' my_style <- my_style |> modify_output_style(sheet_name = "big table")
#'
#' tab1 <- my_data |> any_table(rows = c("sex + age", "sex", "age"),
#' columns = c("year", "education + year"),
#' values = weight,
#' statistics = c("sum", "pct_group"),
#' pct_group = c("sex", "age", "education", "year"),
#' formats = list(sex = sex., age = age.,
#' education = education.),
#' style = my_style,
#' na.rm = TRUE,
#' print = FALSE,
#' output = "excel_nostyle")
#'
#' my_style <- my_style |> modify_output_style(sheet_name = "age_sex")
#'
#' tab2 <- my_data |> any_table(rows = c("age"),
#' columns = c("sex"),
#' values = weight,
#' statistics = c("sum"),
#' formats = list(sex = sex., age = age.),
#' style = my_style,
#' na.rm = TRUE,
#' print = FALSE,
#' output = "excel_nostyle")
#'
#' my_style <- my_style |> modify_output_style(sheet_name = "edu_year")
#'
#' tab3 <- my_data |> any_table(rows = c("education"),
#' columns = c("year"),
#' values = weight,
#' statistics = c("pct_group"),
#' formats = list(education = education.),
#' style = my_style,
#' na.rm = TRUE,
#' print = FALSE,
#' output = "excel_nostyle")
#'
#' # Every of the above tabs is a list, which contains the data table, an unstyled
#' # workbook and the meta information needed for the individual styling. These
#' # tabs can be input into the following function, which reads the meta information,
#' # styles each table individually and combines them as separate sheets into a single workbook.
#' combine_into_workbook(tab1, tab2, tab3)
#'
#' @export
combine_into_workbook <- function(...,
file = NULL,
output = "excel",
print = TRUE,
monitor = FALSE){
monitor_df <- NULL |> monitor_start("Prepare combine", "Prepare")
# Measure the time
start_time <- Sys.time()
tables <- list(...)
tab_names <- as.character(substitute(list(...)))[-1]
wb <- openxlsx2::wb_workbook()
i <- 1
message(" > Formatting tables")
for (table in tables){
monitor_df <- monitor_df |> monitor_next(paste0("Format ", tab_names[i]), "Format tables")
message(paste0(" + ", tab_names[i]))
meta <- table[["meta"]]
wb <- wb |> prepare_styles(meta[["style"]])
# In case no by variables are provided
if (length(meta[["by"]]) == 0){
wb_list <- suppressMessages(
format_any_excel(wb, table[["table"]], meta[["rows"]], meta[["columns"]],
meta[["statistics"]], meta[["by"]], meta[["titles"]],
meta[["footnotes"]], meta[["var_labels"]], meta[["stat_labels"]],
meta[["box"]], meta[["any_header"]],
meta[["style"]], output, monitor_df = monitor_df))
wb <- wb_list[[1]]
}
# In case there are by variables are provided
else{
wb_list <- suppressMessages(
format_any_by_excel(wb, table[["table"]], meta[["rows"]], meta[["columns"]],
meta[["statistics"]], meta[["by"]], meta[["titles"]],
meta[["footnotes"]], meta[["var_labels"]], meta[["stat_labels"]],
meta[["box"]], meta[["any_header"]],
meta[["style"]], output, meta[["na.rm"]], monitor_df))
wb <- wb_list[[1]]
}
i <- i + 1
}
# Output formatted table into different formats
if (print){
monitor_df <- monitor_df |> monitor_next("Output tables", "Output tables")
if (is.null(file)){
wb$open()
}
else if (!dir.exists(dirname(file))){
message(" ! WARNING: Directory '", dirname(file), "' does not exist. File won't be saved.")
wb$open()
}
else{
wb$save(file = file, overwrite = TRUE)
}
}
end_time <- round(difftime(Sys.time(), start_time, units = "secs"), 3)
message("\n- - - 'combine_into_workbook' execution time: ", end_time, " seconds\n")
monitor_df <- monitor_df |> monitor_end()
monitor_df |> monitor_plot(draw_plot = monitor)
invisible(wb)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.