R/any_table.R

Defines functions combine_into_workbook format_any_by_excel merge_headers build_multi_header set_col_variable_labels set_statistic_labels format_any_excel any_table

Documented in any_table combine_into_workbook

#' Compute Any Possible Table
#'
#' @description
#' [any_table()] produces any possible descriptive table in 'Excel' format. Any number
#' of variables can be nested and crossed. The output is an individually styled
#' 'Excel' table, which also receives named ranges, making it easier to read the data back in.
#'
#' @param data_frame A data frame in which are the variables to tabulate.
#' @param rows A vector that provides single variables or variable combinations that
#' should appear in the table rows. To nest variables use the form:
#' "var1 + var2 + var3 + ...".
#' @param columns A vector that provides single variables or variable combinations that
#' should appear in the table rows. To nest variables use the form:
#' "var1 + var2 + var3 + ...".
#' @param values A vector containing all variables that should be summarised.
#' @param statistics Available functions:
#' - "sum"       -> Weighted and unweighted sum
#' - "sum_wgt"   -> Sum of all weights
#' - "freq"      -> Unweighted frequency
#' - "freq_g0"   -> Unweighted frequency of all values greater than zero
#' - "pct_group" -> Weighted and unweighted percentages within the respective group
#' - "pct_value" -> Weighted and unweighted percentages between value variables
#' - "pct_total" -> Weighted and unweighted percentages compared to the grand total
#' - "mean"      -> Weighted and unweighted mean
#' - "median"    -> Weighted and unweighted median
#' - "mode"      -> Weighted and unweighted mode
#' - "min"       -> Minimum
#' - "max"       -> Maximum
#' - "sd"        -> Weighted and unweighted standard deviation
#' - "variance"  -> Weighted and unweighted standard variance
#' - "first"     -> First value
#' - "last"      -> Last value
#' - "pn"        -> Weighted and unweighted percentiles (any p1, p2, p3, ... possible)
#' - "missing"   -> Missings generated by the value variables
#' @param pct_group If pct_group is specified in the statistics, this option is used to
#' determine which variable of the row and column variables should add up to 100 %.
#' Multiple variables can be specified in a vector to generate multiple group percentages.
#' @param pct_value If pct_value is specified in the statistics, you can pass a list here
#' which contains the information for a new variable name and between which of the value
#' variables percentages should be computed.
#' @param formats A list in which is specified which formats should be applied to which variables.
#' @param by Compute tables stratified by the expressions of the provided variables.
#' @param weight Put in a weight variable to compute weighted results.
#' @param order_by Determine how the columns will be ordered. "values" orders the results by the
#' order you provide the variables in values. "stats" orders them by the order under statistics.
#' "values_stats" is a combination of both. "columns" keeps the order as given in columns
#' and "interleaved" alternates the stats.
#' @param titles Specify one or more table titles.
#' @param footnotes Specify one or more table footnotes.
#' @param var_labels A list in which is specified which label should be printed for
#' which variable instead of the variable name.
#' @param stat_labels A list in which is specified which label should be printed for
#' which statistic instead of the statistic name.
#' @param box Provide a text for the upper left box of the table.
#' @param workbook Insert a previously created workbook to expand the sheets instead of
#' creating a new file.
#' @param style A list of options can be passed to control the appearance of excel outputs.
#' Styles can be created with [excel_output_style()].
#' @param output The following output formats are available: excel and excel_nostyle.
#' @param pre_summed FALSE by default. If TRUE this function works with pre summarised data. This can be
#' used, if not all the needed results can be calculated by [any_table()] and need to be prepared in
#' advance. Enabling you to still make use of the styled tabulation. For this to work, the values have to
#' carry the statistic extension (e.g. "_sum", "_pct") in the variable name.
#' @param na.rm FALSE by default. If TRUE removes all NA values from the variables.
#' @param print TRUE by default. If TRUE prints the output, if FALSE doesn't print anything. Can be used
#' if one only wants to catch the output data frame and workbook with meta information.
#' @param monitor FALSE by default. If TRUE, outputs two charts to visualize the functions time consumption.
#'
#' @details
#' [any_table()] is based on the 'SAS' procedure Proc Tabulate, which provides
#' efficient and readable ways to perform complex tabulations.
#'
#' With this function you can combine any number of variables in any possible way, all
#' at once. You just define which variables or variable combinations should end up in
#' the table rows and columns with a simple syntax. Listing variables in a vector like
#' c("var1", "var2", "var3",...) means to put variables below (in case of the
#' row variables) or besides (in case of the column variables) each other. Nesting variables
#' is as easy as putting a plus sign between them, e.g. c("var1 + var2", "var2" + "var3" + "var4", etc.).
#' And of course you can combine both versions.
#'
#' The real highlight is, that this function not only creates all the desired variable
#' combinations and exports them to an 'Excel' file, it prints a fully custom styled
#' table to a workbook. Setting up a custom, reusable style is as easy as setting up
#' options like: provide a color for the table header, set the font size for the row header,
#' should borders be drawn for the table cells yes/no, and so on. Merging doubled header texts,
#' happens automatically.
#'
#' With this function you basically can fully concentrate on designing a table, instead of
#' thinking hard about how to calculate where to put a border or to even manually prepare
#' a designed workbook.
#'
#' @return
#' Returns a list with the data table containing the results for the table, the formatted
#' 'Excel' workbook and the meta information needed for styling the final table.
#'
#' @seealso
#' Creating a custom table style: [excel_output_style()], [modify_output_style()],
#' [number_format_style()], [modify_number_formats()].
#'
#' Creating formats: [discrete_format()] and [interval_format()].
#'
#' Functions that can handle formats and styles: [frequencies()], [crosstabs()].
#'
#' Additional functions that can handle styles: [export_with_style()]
#'
#' Additional functions that can handle formats: [summarise_plus()], [recode()],
#' [recode_multi()]
#'
#' @examples
#' # Example data frame
#' my_data <- dummy_data(1000)
#' my_data[["person"]] <- 1
#'
#' # Formats
#' age. <- discrete_format(
#'     "Total"          = 0:100,
#'     "under 18"       = 0:17,
#'     "18 to under 25" = 18:24,
#'     "25 to under 55" = 25:54,
#'     "55 to under 65" = 55:64,
#'     "65 and older"   = 65:100)
#'
#' sex. <- discrete_format(
#'     "Total"  = 1:2,
#'     "Male"   = 1,
#'     "Female" = 2)
#'
#' education. <- discrete_format(
#'     "Total"            = c("low", "middle", "high"),
#'     "low education"    = "low",
#'     "middle education" = "middle",
#'     "high education"   = "high")
#'
#' # Define style
#' my_style <- excel_output_style(column_widths = c(2, 15, 15, 15, 9))
#'
#' # Define titles and footnotes. If you want to add hyperlinks you can do so by
#' # adding "link:" followed by the hyperlink to the main text.
#' titles <- c("This is title number 1 link: https://cran.r-project.org/",
#'             "This is title number 2",
#'             "This is title number 3")
#' footnotes <- c("This is footnote number 1",
#'                "This is footnote number 2",
#'                "This is footnote number 3 link: https://cran.r-project.org/")
#'
#' # Output complex tables with different percentages
#' my_data |> any_table(rows       = c("sex + age", "sex", "age"),
#'                      columns    = c("year", "education + year"),
#'                      values     = weight,
#'                      statistics = c("sum", "pct_group"),
#'                      pct_group  = c("sex", "age", "education", "year"),
#'                      formats    = list(sex = sex., age = age.,
#'                                        education = education.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # If you want to get a clearer vision of what the result table looks like, in terms
#' # of the row and column categories, you can write the code like this, to make out
#' # the variable crossings and see the order.
#' my_data |> any_table(columns = c(            "year", "education + year"),
#'                      rows    = c("sex + age",
#'                                  "sex",
#'                                  "age"),
#'                      values     = weight,
#'                      statistics = c("sum", "pct_group"),
#'                      pct_group  = c("sex", "age", "education", "year"),
#'                      formats    = list(sex = sex., age = age.,
#'                                        education = education.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # Percentages based on value variables instead of categories
#' my_data |> any_table(rows       = c("age + year"),
#'                      columns    = c("sex"),
#'                      values     = c(probability, person),
#'                      statistics = c("pct_value", "sum", "freq"),
#'                      pct_value  = list(rate = "probability / person"),
#'                      weight     = weight,
#'                      formats    = list(sex = sex., age = age.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # Customize the visual appearance by adding titles, footnotes and variable
#' # and statistic labels.
#' # Note: You don't have to describe every element. Sometimes a table can be more
#' # readable with less text. To completely remove a variable label just put in an
#' # empty text "" as label.
#' my_data |> any_table(rows        = c("age + year"),
#'                      columns     = c("sex"),
#'                      values      = weight,
#'                      statistics  = c("sum", "pct_group"),
#'                      order_by    = "interleaved",
#'                      formats     = list(sex = sex., age = age.),
#'                      titles      = titles,
#'                      footnotes   = footnotes,
#'                      var_labels  = list(age = "Age categories",
#'                                        sex = "", weight = ""),
#'                      stat_labels = list(pct = "%"),
#'                      style       = my_style,
#'                      na.rm       = TRUE)
#'
#' # With individual styling
#' my_style <- my_style |> modify_output_style(header_back_color = "0077B6",
#'                                             font              = "Times New Roman")
#'
#' my_data |> any_table(rows       = c("age + year"),
#'                      columns    = c("sex"),
#'                      values     = c(probability, person),
#'                      statistics = c("pct_value", "sum", "freq"),
#'                      pct_value  = list(rate = "probability / person"),
#'                      weight     = weight,
#'                      formats    = list(sex = sex., age = age.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # Pass on workbook to create more sheets in the same file
#' my_style <- my_style |> modify_output_style(sheet_name = "age_sex")
#'
#' result_list <- my_data |>
#'            any_table(rows       = c("age"),
#'                      columns    = c("sex"),
#'                      values     = weight,
#'                      statistics = c("sum"),
#'                      formats    = list(sex = sex., age = age.),
#'                      style      = my_style,
#'                      na.rm      = TRUE,
#'                      print      = FALSE)
#'
#' my_style <- my_style |> modify_output_style(sheet_name = "edu_year")
#'
#' my_data |> any_table(workbook   = result_list[["workbook"]],
#'                      rows       = c("education"),
#'                      columns    = c("year"),
#'                      values     = weight,
#'                      statistics = c("pct_group"),
#'                      formats    = list(education = education.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # Output multiple complex tables by expressions of another variable.
#' # If you specify the sheet name as "by" in the output style, the sheet
#' # names are named by the variable expressions of the by-variable. Otherwise
#' # the given sheet named gets a running number.
#' my_style <- my_style |> modify_output_style(sheet_name = "by")
#'
#' my_data |> any_table(rows       = c("sex", "age"),
#'                      columns    = c("education + year"),
#'                      values     = weight,
#'                      by         = state,
#'                      statistics = c("sum", "pct_group"),
#'                      pct_group  = c("education"),
#'                      formats    = list(sex = sex., age = age., state = state.,
#'                                        education = education.),
#'                      titles     = titles,
#'                      footnotes  = footnotes,
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' @export
any_table <- function(data_frame,
                      rows,
                      columns        = "",
                      values,
                      statistics     = c("sum"),
                      pct_group      = c(),
                      pct_value      = list(),
                      formats        = list(),
                      by             = c(),
                      weight         = NULL,
                      order_by       = "stats",
                      titles         = c(),
                      footnotes      = c(),
                      var_labels     = list(),
                      stat_labels    = list(),
                      box            = "",
                      workbook       = NULL,
                      style          = excel_output_style(),
                      output         = "excel",
                      pre_summed     = FALSE,
                      na.rm          = FALSE,
                      print          = TRUE,
                      monitor        = FALSE){

    # Measure the time
    start_time <- Sys.time()

    monitor_df <- NULL |> monitor_start("Error handling", "Preparation")

    # First convert data frame to data table
    if (!data.table::is.data.table(data_frame)){
        data_frame <- data.table::as.data.table(data_frame)
    }

    # Evaluate formats early, otherwise apply formats can't evaluate them in unit
    # test situation.
    formats_list <- as.list(substitute(formats))[-1]

    formats <- stats::setNames(
        lapply(formats_list, function(expression){
            # Catch expression if passed as string
            if (is.character(expression)) {
                tryCatch(get(expression, envir = parent.frame()),
                         error = function(e) NULL)
            }
            # Catch expression if passed as symbol
            else{
                tryCatch(eval(expression, envir = parent.frame()),
                         error = function(e) NULL)
            }
        }),
        names(formats_list))

    # Look up variable names in format data frame to check whether there is an
    # interval or discrete format
    flag_interval <- FALSE

    for (current_var in names(formats)){
        format_df          <- formats[[current_var]]
        interval_variables <- c("from", "to")
        actual_variables   <- names(format_df)[1:2]

        if (identical(interval_variables, actual_variables)){
            flag_interval <- TRUE
            break
        }
    }

    ###########################################################################
    # Error handling
    ###########################################################################

    # Get row variables from provided combinations
    row_vars <- unique(trimws(unlist(strsplit(rows, "\\+"))))

    invalid_rows <- row_vars[!row_vars %in% names(data_frame)]
    row_vars     <- row_vars[row_vars %in% names(data_frame)]

    if (length(invalid_rows) > 0){
        message(" X ERROR: The provided row variable '", paste(invalid_rows, collapse = ", "), "' is not part of\n",
                "          the data frame. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(rows) == 0){
        message(" X ERROR: No valid row variables provided. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(rows) == 1){
        if (rows == ""){
            message(" X ERROR: No valid row variables provided. Any table will be aborted.")
            return(invisible(NULL))
        }
    }

    # Get row variables from provided combinations
    col_vars <- unique(trimws(unlist(strsplit(columns, "\\+"))))

    invalid_columns <- col_vars[!col_vars %in% names(data_frame)]
    col_vars        <- col_vars[col_vars %in% names(data_frame)]

    if (length(invalid_columns) > 0){
        message(" X ERROR: The provided column variable '", paste(invalid_columns, collapse = ", "), "' is not part of\n",
                "          the data frame. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(columns) == 0){
        message(" X ERROR: No valid column variables provided. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(columns) == 1){
        if (columns == ""){
            # Create empty pseudo variable to let the rest of the program run as normal
            data_frame[[".temp.var"]] <- 1
            columns    <- ".temp.var"
            col_vars   <- ".temp.var"
            var_labels <- c(var_labels, ".temp.var" = "")
            formats[[".temp.var"]] <- suppressMessages(discrete_format(" " = 1))
        }
    }

    # Convert to character vectors
    by_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(by), width.cutoff = 500L)))

    if (substr(by_temp, 1, 2) == "c("){
        by <- as.character(substitute(by))
    }
    else if (!is_error(by)){
        # Do nothing. In this case variables already contains the substituted variable names
        # while variables_temp is evaluated to the symbol passed into the function.
    }
    else{
        by <- by_temp
    }

    # Remove extra first character created with substitution
    by <- by[by != "c"]

    provided_by <- by
    invalid_by  <- by[!by %in% names(data_frame)]
    by          <- by[by %in% names(data_frame)]

    if (length(invalid_by) > 0){
        message(" ! WARNING: The provided by variable '", paste(invalid_by, collapse = ", "), "' is not part of\n",
                "            the data frame. This variable will be omitted during computation.")
    }

    variables  <- c(row_vars, col_vars)
    invalid_by <- by[by %in% variables]

    if (length(invalid_by) > 0){
        message(" X ERROR: The provided by variable '", paste(invalid_by, collapse = ", "), "' is also part of\n",
                "          the row and column variables which is not allowed. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(by) == 1){
        if (by == ""){
            message(" X ERROR: No valid by variables provided. Any table will be aborted.")
            return(invisible(NULL))
        }
    }

    # Create temporary weight column if none is provided.
    # Also get the name of the weight variable as string.
    weight_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(weight), width.cutoff = 500L)))

    if (weight_temp == "NULL" || substr(weight_temp, 1, 2) == "c("){
        weight_var <- ".temp_weight"
        data_frame[[".temp_weight"]] <- 1

        if (substr(weight_temp, 1, 2) == "c("){
            message(" ! WARNING: Only one variable for weight allowed. Evaluations will be unweighted.")
        }
    }
    else if (!is_numeric(data_frame[[weight_temp]])){
        weight_var <- ".temp_weight"
        data_frame[[".temp_weight"]] <- 1

        message(" ! WARNING: Provided weight variable is not numeric. Unweighted results will be computed.")
    }
    else{
        weight_var <- weight_temp

        # NA values in weight lead to errors therefor convert them to 0
        if (anyNA(data_frame[[weight_temp]])){
            message(" ~ NOTE: Missing values in weight variable '", weight_temp, "' will be converted to 0.")
        }
        data_frame[[weight_temp]] <- data.table::fifelse(is.na(data_frame[[weight_temp]]), 0, data_frame[[weight_temp]])

        # @Hack: so I don't have to check if .temp_weight exists later on
        data_frame[[".temp_weight"]] <- 1
    }

    # Convert to character vectors
    values_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(values), width.cutoff = 500L)))

    if (substr(values_temp, 1, 2) == "c("){
        values <- as.character(substitute(values))
    }
    else if (!is_error(values)){
        # Do nothing. In this case values already contains the substituted variable names
        # while values_temp is evaluated to the symbol passed into the function.
    }
    else{
        values <- values_temp
    }

    # Remove extra first character created with substitution
    values <- values[values != "c"]

    # If no value variables are provided abort
    if (length(values) == 0){
        message(" X ERROR: No values provided.")
        return(invisible(NULL))
    }
    else if (length(values) == 1){
        if (values == ""){
            message(" X ERROR: No values provided.")
            return(invisible(NULL))
        }
    }

    # Make sure there is no class variable that is also a value variable.
    invalid_class <- values[values %in% c(row_vars, col_vars)]
    values        <- values[!values %in% c(row_vars, col_vars)]

    if (length(invalid_class) > 0){
        message(" x ERROR: The provided row/column variable '", paste(invalid_class, collapse = ", "), "' is also part of\n",
                "          the analysis variables. Any table will be aborted.")
        return(invisible(NULL))
    }

    invalid_values <- values[!values %in% names(data_frame)]
    values         <- values[values %in% names(data_frame)]

    if (length(invalid_values) > 0){
        message(" ! WARNING: The provided analysis variable '", paste(invalid_values, collapse = ", "), "' is not part of\n",
                "            the data frame. This variable will be omitted during computation.")
    }

    if (length(values) == 0){
        message(" X ERROR: No valid analysis variables provided. Any table will be aborted.")
        return(invisible(NULL))
    }

    provided_values <- values
    values          <- unique(values)

    if (length(provided_values) > length(values)){
        message(" ! WARNING: Some analysis variables are provided more than once. The doubled entries will be omitted.")
    }

    # Check for invalid output option
    if (!tolower(output) %in% c("excel", "excel_nostyle")){
        message(" ! WARNING: Output format '", output, "' not available. Using 'excel' instead.")

        output <- "excel"
    }
    else{
        output <- tolower(output)
    }

    # Correct nesting option if not set right
    if (!tolower(order_by) %in% c("values", "stats", "values_stats", "columns", "interleaved")){
        message(" ! WARNING: Order by option '", order_by, "' doesn't exist. Options 'values', 'stats', 'values_stats', 'columns'\n",
                "            and 'interleaved' are available. Order by will be set to 'stats'.")
        order_by <- "stats"
    }

    # Remove missing variables from pct_group
    if ("pct_group" %in% tolower(statistics)){
        invalid_pct <- pct_group[!pct_group %in% c(row_vars, col_vars)]

        if (length(invalid_pct) > 0){
            message(" ! WARNING: The variable '", paste(invalid_pct, collapse = ", "), "' provided as pct_group is not part of the row and column variables.\n",
                    "            The variable will be omitted.")

            pct_group <- pct_group[pct_group %in% c(row_vars, col_vars)]
        }

        rm(invalid_pct)
    }

    # In case of using a pre summarised data frame, underscores are only allowed if they carry
    # the statistics extension afterwards.
    if (pre_summed){
        if (!"TYPE" %in% names(data_frame)){
            message(" X ERROR: The pre summarised data needs the TYPE variable generated by summarise_plus. Any table will be aborted.")
            return(invisible(NULL))
        }

        # Check if value variables have statistics extension
        extensions <- c("_sum", "_pct_group", "_pct_total", "_pct_value", "_pct", "_freq_g0",
                        "_freq", "_mean", "_median", "_mode", "_min", "_max", "_first",
                        "_last", "_sum_wgt", "_p[0-9]+$", "_sd", "_variance", "_missing")
        pattern    <- paste0("(", paste(extensions, collapse = "|"), ")$")


        # If one of the value variables hasn't got any of the above extension abort
        if (!all(grepl(pattern, values))){
            message(" X ERROR: All value variables need to have the statistic extensions in their variable names.\n",
                    "          Execution will be aborted.")
            return(invisible(NULL))
        }

        # Set up options to make sure nothing errors below. Statistics is set to "mean"
        # because then summarise_plus takes a route where factor variables are kept in order.
        # With "sum" the order would messed up.
        statistics <- "mean"
        pct_group  <- ""
        pct_value  <- ""
        weight     <- NULL
        formats    <- list()

        rm(extensions, pattern)
    }

    rm(invalid_by, invalid_class, invalid_columns, invalid_rows, invalid_values,
       provided_by, provided_values, weight_temp, values_temp, by_temp)

    ###########################################################################
    # Any tabulation starts
    ###########################################################################

    monitor_df <- monitor_df |> monitor_next("Summary", "Summary")
    message("\n > Computing stats.")

    # Put together vector of grouping variables
    group_vars <- c(by, variables)

    # In case of group percentages order group variable to the last position
    if ("pct_group" %in% tolower(statistics) && length(pct_group) > 0){
        group_vars <- c(setdiff(group_vars, pct_group[1]), pct_group[1])
    }
    # If pct_group is specified in statistics but no group is provided set
    # last variable of group_vars as standard.
    else if ("pct_group" %in% tolower(statistics) && length(pct_group) == 0){
        pct_group <- group_vars[length(group_vars)]
    }

    # Put combinations in a single vector
    combinations <- as.vector(outer(rows, columns, paste, sep = " + "))

    # In case by variables are specified, add by to group variables and build
    # additional combinations
    if (length(by) > 0){
        combinations <- as.vector(outer(by, combinations, paste, sep = "+"))
    }

    # Compute statistics
    if (!pre_summed){
        any_tab <- suppressMessages(data_frame |>
              summarise_plus(class      = group_vars,
                             values     = values,
                             statistics = statistics,
                             formats    = formats,
                             weight     = weight_var,
                             nesting    = "all",
                             types      = combinations,
                             notes      = FALSE,
                             na.rm      = na.rm)) |>
            rename_pattern("pct_group", paste0("pct_group_", pct_group[1])) |>
            collapse::fsubset(TYPE != "total")
    }
    else{
        # With pre summarised data just take the input data frame
        any_tab <- data_frame
    }

    if (is.null(any_tab)){
        message(" X ERROR: Any table could not be computed. Execution will be aborted.")
        return(invisible(NULL))
    }

    # Underscores are not allowed in column variables because when constructing the
    # table header later, the underscore is the sign by which the column names are split.
    # Having additional underscores would mess up this part and lead to errors.
    if (any(grepl("_", unlist(any_tab[col_vars])))){
        message(" X ERROR: No underscores allowed in column variable values. Execution will be aborted.")
        return(invisible(NULL))
    }

    # In case of pre summarised data frame remove the temporary weight variable
    if (pre_summed){
        any_tab <- any_tab |> dropp(".temp_weight")
    }

    # In case of by variables fuse them into one
    if (length(by) > 0){
        any_tab <- any_tab |> fuse_variables("by_vars", by)

        # Normally extract the first variable from the TYPE combination into a
        # separate BY variable.
        if (!pre_summed){
            any_tab[["BY"]]   <- sub("\\+.*", "", any_tab[["TYPE"]])
            any_tab[["TYPE"]] <- sub("^[^+]*\\+\\s*", "", any_tab[["TYPE"]])
        }
        # In case of pre summarised data frame it is not clear, at which position
        # the by variable is in the TYPE.
        else{
            # Get TYPE variable as list of variable combinations
            type_split <- strsplit(any_tab[["TYPE"]], "\\+")

            # If by variable was found in a variable combination, put it into
            # a separate variable.
            any_tab[["BY"]] <- vapply(type_split, function(obs){
                by_found <- obs[obs %in% by]

                if (length(by_found) > 0){
                    by_found[1]
                }
                else{
                    ""
                }
            }, character(1L))

            # The TYPE variable itself is put back together as all variables which
            # are no by-Variables.
            any_tab[["TYPE"]] <- vapply(type_split, function(obs){
                paste(obs[!obs %in% by], collapse = "+")
            }, character(1L))
        }
    }

    # In case multiple group percentages should be computed, evaluate them in a loop
    # and join them to the main data frame.
    if (length(pct_group) > 1){
        monitor_df <- monitor_df |> monitor_next("Additional group pct", "Summary")

        for (group in seq_along(pct_group)){
            # First group was computed before so omit it here
            if (group == 1){
                next
            }

            # In case of group percentages order group variable to the last position
            group_vars <- c(setdiff(group_vars, pct_group[group]), pct_group[group])

            # Compute group percentages
            group_tab <- suppressMessages(data_frame |>
                summarise_plus(class      = group_vars,
                               values     = values,
                               statistics = "pct_group",
                               formats    = formats,
                               weight     = weight_var,
                               nesting    = "all",
                               types      = combinations,
                               notes      = FALSE,
                               na.rm      = na.rm)) |>
                rename_pattern("pct_group", paste0("pct_group_", pct_group[group])) |>
                drop_type_vars()

            merge_vars <- variables

            if (length(by) > 0){
                group_tab <- group_tab |> fuse_variables("by_vars", by)
                merge_vars <- c("by_vars", variables)
            }

            # Join percentages to the main data frame
            any_tab <- any_tab |>
                collapse::join(group_tab, on = merge_vars, how = "left",
                               verbose = FALSE, overid = 2)
        }
        rm(group, group_tab)
    }

    # In case only pct_value was selected as statistic
    if ("pct_value" %in% tolower(statistics) && length(statistics) == 1){
        message(" X ERROR: pct_value can only be computed in combination with statistic\n",
                "          'sum'. Since no other statistic is provided any table will be aborted.")
        return(invisible(NULL))
    }
    # In case percentages based on value variables should be computed
    else if ("pct_value" %in% tolower(statistics) && length(pct_value) > 0){
        for (i in seq_along(pct_value)){
            value <- pct_value[[i]]
            name  <- names(pct_value)[i]

            # Separate provided variables first
            eval_vars <- trimws(strsplit(value, split = "/")[[1]])

            # Compute percentages
            if (paste0(eval_vars[1], "_sum") %in% names(any_tab) &&
                paste0(eval_vars[2], "_sum") %in% names(any_tab)){
                any_tab[[paste0(name, "_pct_value")]] <-
                    any_tab[[paste0(eval_vars[1], "_sum")]] * 100 /
                    any_tab[[paste0(eval_vars[2], "_sum")]]
            }
            # Without sum percentages can't be computed
            else{
                message(" ! WARNING: pct_value can only be computed in combination with statistic\n",
                        "            'sum'. Percentages for '", name, "' could not be evaluated.")

                # Additional warnings for missing variables
                if (!eval_vars[1] %in% names(data_frame)){
                    message(" ! WARNING: Variable '", eval_vars[1], "' not found in the data frame.")
                }
                if (!eval_vars[2] %in% names(data_frame)){
                    message(" ! WARNING: Variable '", eval_vars[2], "' not found in the data frame.")
                }
            }
        }

        rm(eval_vars, i, value)
    }

    rm(data_frame)

    # Reorder variables according to statistics. This is necessary because pct_value
    # can only be computed after summarise_plus and therefor isn't ordered.
    any_tab <- any_tab |> setcolorder_by_pattern(statistics)

    # Get value variable names
    if (length(by) == 0){
        value_vars <- any_tab |> inverse(c(variables, "TYPE", "TYPE_NR", "DEPTH"))
    }
    else{
        value_vars <- any_tab |> inverse(c(variables, "TYPE", "TYPE_NR", "DEPTH", "by_vars", "BY"))
    }

    # Round values according to number formats
    for (var_name in names(any_tab)){
        if (!var_name %in% value_vars){
            next
        }

        # Get stat from variable name
        stat <- strsplit(var_name, split = "_")[[1]]
        stat <- stat[length(stat)]

        # Round values to the decimals places specified in the style
        if (tolower(stat) %in% c("sum", "freq", "freq", "mean", "median", "mode",
                                 "min", "max")){
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][[paste0(stat, "_decimals")]])
        }
        else if(stat == "g0"){
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][["freq_decimals"]])
        }
        else if(stat == "wgt"){
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][["sum_decimals"]])
        }
        else if(grepl("^[0-9]$", substr(stat, 2, 2))){
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][["p_decimals"]])
        }
        else{
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][["pct_decimals"]])
        }
    }

    # Tear apart the the summarised data frame by row and column combinations and
    # transpose columns to generate user defined combination. At the end put all
    # the pieces back together to form a fully printable result data frame.
    monitor_df <- monitor_df |> monitor_next("Transform table", "Transform")

    part_combi_list       <- list()
    header_combi_list     <- list()
    col_header_df         <- list()
    row_header_dimensions <- list()
    last_number_of_rows   <- 0
    by_division           <- 1

    if (length(by) > 0){
        by_division <- length(unique(any_tab[["by_vars"]]))
    }

    # Underscores are not allowed in column and values variables, because when constructing the
    # table header later, the underscore is the symbol by which the variable names are split.
    # Having additional underscores would mess up this part and lead to errors. Therefor the
    # additional underscores will be temporarily replaced.
    extensions <- c("_sum", "_pct_group_", "_pct_total", "_pct_value", "_pct", "_freq_g0",
                    "_freq", "_mean", "_median", "_mode", "_min", "_max", "_first",
                    "_last", "_p1", "_p2", "_p3", "_p4", "_p5", "_p6", "_p7", "_p8", "_p9",
                    "sum_wgt", "_sd", "_variance", "_missing", "by_vars", "TYPE_NR")

    rows       <- replace_except(rows,       "_", "!!!", extensions)
    columns    <- replace_except(columns,    "_", "!!!", extensions)
    value_vars <- replace_except(value_vars, "_", "!!!", extensions)
    value_sort <- replace_except(values,     "_", "!!!", extensions)
    names(any_tab)    <- replace_except(names(any_tab),    "_", "!!!", extensions)
    any_tab[["TYPE"]] <- replace_except(any_tab[["TYPE"]], "_", "!!!", extensions)

    # Sort type alphabetically to make finding the right combination easier
    any_tab[["TYPE"]] <- reorder_combination(any_tab[["TYPE"]])

    # Get number of column header variables by getting the maximum number of + signs in the
    # column variables.
    max_plus <- max(sapply(gregexpr("\\+", columns), function(var_to_test) {
        if (var_to_test[1] == -1){
            1
        }
        else{
            length(var_to_test) + 1
        }}))

    index <- 1
    any_header <- NULL

    for (row_combi in rows){
        combined_col_df <- NULL

        # Get current single variables from row combination
        row_combi_vars <- unique(trimws(unlist(strsplit(row_combi, "\\+"))))

        for (col_combi in columns){
            # Get current single variables from column combination
            col_combi_vars <- unique(trimws(unlist(strsplit(col_combi, "\\+"))))

            current_combi <- c(row_combi_vars, col_combi_vars)

            # Sort combination alphabetically
            sorted_combi <- paste(sort(current_combi), collapse = "+")

            # Keep only necessary variables
            if (length(by) > 0){
                current_combi <- c("BY", "by_vars", current_combi)
            }

            combi_df <- any_tab |> keep(current_combi, "TYPE", value_vars)

            # Combine variables to match TYPE variable and subset data frame by
            # current combination
            subset_type <- paste(sorted_combi, collapse = "+")

            # Convert column variables to factor if necessary to retain value order
            # after sorting
            for (col_var in col_combi_vars){
                if (is.character(combi_df[[col_var]])){
                    # Extract the number of labels from variable
                    label_levels <- combi_df[[col_var]] |>
                        unlist(use.names = FALSE) |>
                        unique() |>
                        stats::na.omit()

                    # Convert variable to factor
                    combi_df[[col_var]] <- factor(
                        combi_df[[col_var]],
                        levels  = label_levels,
                        ordered = TRUE)
                }
            }

            # Sort to have the correct order after pivoting
            combi_df <- combi_df |>
                collapse::fsubset(TYPE == subset_type) |>
                data.table::setcolorder(current_combi, before = 1) |>
                data.table::setorderv(col_combi_vars, na.last = TRUE) |>
                data.table::setorderv(row_combi_vars)

            # Rename the row variables to something neutral. In the next steps the
            # data frame is puzzled back together, but the thing is, that the row
            # variables can be individually in any order in the result table. If
            # this step would be omitted the variables would all be in a fixed order
            # in the result table.
            new_row_names <- paste0("var", seq_along(row_combi_vars))

            if (length(by) == 0){
                names(combi_df)[seq_along(row_combi_vars)] <- new_row_names

                id_vars <- new_row_names
            }
            else{
                end_col <- length(row_combi_vars) + 2
                names(combi_df)[3:end_col] <- new_row_names

                id_vars <- new_row_names
                id_vars <- c("BY", "by_vars", new_row_names)
            }

            # If there is only one value provided and only one statistic selected
            # the pivoted variable names below only receive the column expressions
            # as names. The needed format is "value_stat_expression". In the mentioned
            # case this format is pre computed.
            if (length(values) == 1 && length(statistics) == 1){
                value_stat <- names(combi_df[ncol(combi_df)])

                combi_df[[col_combi_vars[1]]] <-
                    paste0(value_stat, "_", combi_df[[col_combi_vars[1]]])
            }
            # Fallback check if the user happens to input only wrong statistics but one.
            # In this case statistics is longer than 1 even though only one statistic was
            # computed. This leads to the condition above being omitted.
            else if (ncol(combi_df) - length(c(id_vars, col_combi_vars, "TYPE")) == 1){
                value_stat <- names(combi_df[ncol(combi_df)])

                combi_df[[col_combi_vars[1]]] <-
                    paste0(value_stat, "_", combi_df[[col_combi_vars[1]]])
            }

            # Pivot to wider format, which basically is the final format to print the data
            combi_df <- combi_df |>
                collapse::pivot(id     = id_vars,
                                names  = col_combi_vars,
                                values = value_vars,
                                how    = "wider")

            combi_df[id_vars] <- lapply(combi_df[id_vars], as.character)

            # Replace NA values with text so that they can be differentiated from empty
            # row header columns later on.
            row_var_cols <- seq_along(id_vars)
            combi_df[, row_var_cols][is.na(combi_df[, row_var_cols])] <- style[["na_symbol"]]

            # Sort interleaved
            if (tolower(order_by) == "interleaved"){
                combi_df <- combi_df |> order_interleaved(statistics)
            }

            # Join different column results together
            if (is.null(combined_col_df)){
                row_labels <- c()

                # Loop through all provided labels
                for (variable in row_combi_vars){
                    # Revert back underscores
                    variable <- gsub("!!!", "_", variable)

                    # If there are no labels specified add the variable names as labels
                    if (!variable %in% names(var_labels)){
                        row_labels <- c(row_labels, variable)
                        next
                    }

                    # Replace stat texts with provided labels
                    row_labels <- c(row_labels, var_labels[[variable]])
                }

                # Insert labels as new variable and sort it to the front
                if (!all(row_labels == "")){
                    combi_df[["row.label"]] <- paste(row_labels, collapse = " / ")
                }
                # In case only empty labels were provided, just add empty character
                # so that the column can be identified as empty later on.
                else{
                    combi_df[["row.label"]] <- " "
                }

                # Order row label to the front
                if (length(by) == 0){
                    combi_df <- combi_df |>
                        data.table::setcolorder("row.label", before = 1)
                }
                else{
                    combi_df <- combi_df |>
                        data.table::setcolorder(c("BY", "by_vars", "row.label"), before = 1)
                }

                new_row_names <- c("row.label", new_row_names)

                # First iteration
                combined_col_df <- combi_df
            }
            # Following iterations
            else{
                combi_df <- suppressMessages(combi_df |>
                    dropp(new_row_names, "row.label", "BY", "by_vars"))

                # Add row header variable
                new_row_names <- c("row.label", new_row_names)

                # Check for duplicate variable names. If any duplicate is found abort.
                duplicates <- intersect(names(combined_col_df), names(combi_df))

                if (length(duplicates) > 0) {
                    message(" X ERROR: Duplicate column names found: ", paste(duplicates, collapse = ", "), ".\n",
                            "          If you are working with original values, consider making them unique by using formats.")
                    return(invisible(NULL))
                }

                # cbind current data frame to the iterations before
                combined_col_df <- cbind(combined_col_df, combi_df)
            }

            # Build variable name table header for later use during formatting.
            # Column header only needs to be built once because other iterations
            # would just produce the same header.
            if (index == 1){
                # Get data frame variable names
                col_header_df <- combi_df[0, ]

                # Fill header rows with column variable names
                for (var_index in seq_along(col_combi_vars)){
                    col_header_df[var_index, ] <- gsub("!!!", "_", col_combi_vars[[var_index]])
                }

                # If the header has fewer rows than the maximum header rows, fill up the header
                # with additional empty rows.
                header_diff <- max_plus - nrow(col_header_df)
                if (header_diff > 0){
                    col_header_df <- rbind(col_header_df,
                                           stats::setNames(data.table::as.data.table(
                                               matrix("", header_diff, ncol(col_header_df))),
                                               names(col_header_df)))
                }

                # First iteration
                if (is.null(any_header)){
                    # Remove row header columns from column header data frame
                    row_header_var_count <- length(id_vars) + 1
                    any_header <- col_header_df[, -(1:row_header_var_count), drop = FALSE]
                }
                # Following iterations
                else{
                    # cbind current header data frame to the iterations before
                    any_header <- cbind(any_header, col_header_df)
                }
            }
        }

        # Store combined column data frame in a list for later rbind
        part_combi_list[[row_combi]] <- combined_col_df

        # Get table dimensions for later use during formatting.
        name <- paste0("header_end_", paste(row_combi_vars, collapse = "%%%"))
        last_number_of_rows           <- (nrow(combined_col_df) / by_division) + last_number_of_rows
        row_header_dimensions[[name]] <- last_number_of_rows

        name <- paste0("header_size_", paste(row_combi_vars, collapse = "%%%"))
        row_header_dimensions[[name]] <- length(row_combi_vars)

        index <- index + 1
    }

    # Put all computed data frames below each other to form a final result data frame
    any_tab <- data.table::rbindlist(part_combi_list,   fill = TRUE)

    # Reorder variables according to statistics
    if (tolower(order_by) == "stats" || tolower(order_by) == "values_stats"){
        any_tab    <- any_tab    |> setcolorder_by_pattern(statistics)
        any_header <- any_header |> setcolorder_by_pattern(statistics)
    }

    # Reorder variables by provided values
    if (tolower(order_by) == "values" || tolower(order_by) == "values_stats"){
        any_tab    <- any_tab    |> setcolorder_by_pattern(value_sort)
        any_header <- any_header |> setcolorder_by_pattern(value_sort)
    }

    # After binding together the data frames it can happen, that some of the new var
    # variables end up at the end of the data frame instead of the front. Therefor
    # these columns need to be ordered to the front for safety.
    ordered_cols <- grep("^var[0-9]+$", names(any_tab), value = TRUE)
    any_tab <- any_tab |> data.table::setcolorder(c("row.label", ordered_cols), before = 1)

    # If all row labels are empty, delete the row header column
    if (all(any_tab[["row.label"]] == " ")){
        any_tab <- any_tab |> dropp("row.label")
    }
    # If only some labels are empty the row label column is printed. Problem: the cell
    # merging omits empty cells. Therefor convert empty cells into cells with a space
    # to merge the empty parts correctly.
    else{
        any_tab["row.label" == ""] <- " "
    }

    # Get number of row header variables by getting the maximum number of + signs in the
    # row variables.
    max_plus <- max(sapply(gregexpr("\\+", rows), function(var_to_test) {
        if (var_to_test[1] == -1){
            1
        }
        else{
            length(var_to_test) + 1
        }}))

    length_row_header <- max_plus + 1
    if (length(by) != 0){
        length_row_header <- length_row_header + 2
    }

    # Mark empty row header cells
    row_var_cols <- 1:length_row_header
    any_tab[, row_var_cols][is.na(any_tab[, row_var_cols])] <- ""

    # In between clean up to get a better overview
    rm(combi_df, combined_col_df, part_combi_list, col_combi, col_combi_vars,
       combinations, current_combi, current_var, flag_interval, index,
       last_number_of_rows, name, new_row_names, row_combi, row_combi_vars, sorted_combi,
       subset_type, group_vars, length_row_header, col_header_df, header_diff,
       row_header_var_count)

    # Grab all information, which is necessary to format the workbook. This list will be
    # returned at the end and can be grabbed by the workbook combine function.
    meta <- mget(c("rows", "columns", "statistics",
                   "by", "titles", "footnotes", "var_labels", "stat_labels",
                   "box", "any_header", "row_header_dimensions",
                   "style", "na.rm"))

    # Prepare table format for output
    monitor_df <- monitor_df |> monitor_next("Excel prepare", "Format")
    message(" > Formatting tables.")

    # Setup styling in new workbook if no other is provided
    if (is.null(workbook)){
        workbook <- openxlsx2::wb_workbook() |>
            prepare_styles(style)
    }
    # Update style options in provided workbook
    else{
        workbook <- workbook |>
            prepare_styles(style)
    }

    monitor_df <- monitor_df |> monitor_end()

    # In case no by variables are provided
    if (length(by) == 0){
        wb_list <- format_any_excel(workbook, any_tab, rows, columns, statistics,
                                    by, titles, footnotes, var_labels, stat_labels,
                                    box, any_header,
                                    style, output, monitor_df = monitor_df)

        wb         <- wb_list[[1]]
        monitor_df <- wb_list[[2]]
    }
    # In case there are  by variables are provided
    else{
        wb_list <- format_any_by_excel(workbook, any_tab, rows, columns, statistics,
                                       by, titles, footnotes, var_labels, stat_labels,
                                       box, any_header,
                                       style, output, na.rm, monitor_df)

        wb         <- wb_list[[1]]
        monitor_df <- wb_list[[2]]
    }

    # Output formatted table into different formats
    if (print){
        monitor_df <- monitor_df |> monitor_next("Output tables", "Output tables")

        if (is.null(style[["file"]])){
            if(interactive()){
                wb$open()
            }
        }
        else{
            wb$save(file = style[["file"]], overwrite = TRUE)
        }
    }

    end_time <- round(difftime(Sys.time(), start_time, units = "secs"), 3)
    message("\n- - - 'any_table' execution time: ", end_time, " seconds\n")

    monitor_df <- monitor_df |> monitor_end()
    monitor_df |> monitor_plot(draw_plot = monitor)

    invisible(list("table"    = any_tab,
                   "workbook" = wb,
                   "meta"     = meta))
}


###############################################################################
# Format any table for excel output
###############################################################################
#' Format Any Table Output (Excel Based)
#'
#' @description
#' Format any table with the provided row and column variables. Statistics
#' can be anything available.
#'
#' @param wb An already created workbook to add more sheets to.
#' @param any_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param statistics The user requested statistics.
#' @param by Separate the cross table output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#' @param var_labels List which contains column variable names and their respective labels.
#' @param stat_labels List which contains statistic names and their respective labels.
#' @param box The text that should appear in the upper left box of the table.
#' @param any_header The column header carrying the variable names.
#' @param style A list containing the styling elements.
#' @param output Determines whether to style the output or to just quickly paste
#' the data.
#' @param by_info Text which contains the information which variable with which
#' expression is computed at the moment.Used for computation with by variables.
#' @param index Index of the current variable expression. Used for computation with
#' by variables.
#' @param monitor_df Data frame which stores the monitoring values.
#'
#' @return
#' Returns a list containing a formatted Excel workbook as well as the monitoring
#' data frame.
#'
#' @noRd
format_any_excel <- function(wb,
                             any_tab,
                             rows,
                             columns,
                             statistics,
                             by,
                             titles,
                             footnotes,
                             var_labels,
                             stat_labels,
                             box,
                             any_header,
                             style,
                             output,
                             by_info = NULL,
                             index   = NULL,
                             monitor_df){
    monitor_df <- monitor_df |> monitor_start("Excel prepare", "Format")

    # Cut down percentage names to just "pct"
    names(any_tab) <- gsub("pct_group_", "pct group ", names(any_tab))
    names(any_tab) <- gsub("pct_total", "pct total", names(any_tab))
    names(any_tab) <- gsub("pct_value", "pct value", names(any_tab))

    # Replace underscore in the following stats to preserve them
    names(any_tab) <- gsub("sum_wgt", "weight_sum.wgt", names(any_tab))
    names(any_tab) <- gsub("freq_g0", "freq.g0", names(any_tab))

    # Build header from variable names
    multi_header <- build_multi_header(names(any_tab), any_header, var_labels, style)

    # Remove empty statistics rows, but keep multi_header with statistics because the information
    # is needed below for applying the correct number formats.
    column_header <- multi_header |> set_statistic_labels(stat_labels)
    column_header <- column_header[rowSums(column_header == "") != ncol(column_header), , drop = FALSE]

    stats_row <- multi_header[nrow(multi_header), , drop = FALSE]

    # Get table ranges
    any_ranges <- get_any_tab_ranges(any_tab, column_header, stats_row,
                                     titles, footnotes, style)

    # Add empty columns to the header for the top left box at the beginning
    blank_columns <- matrix("", nrow = nrow(column_header), ncol = any_ranges[["cat_col.width"]])
    column_header <- cbind(blank_columns, column_header)

    # Add box text
    if (box != ""){
        column_header[1, 1] <- box
    }
    # If no box text provided put in variable names of row headers
    else{
        column_header[1, 1] <- gsub("!!!", "_", paste(rows, collapse = "\n"))
    }

    # If function is called with by variables the sheet names have to be differentiated
    # and by info has to be written above the table.
    if (!is.null(by_info)){
        if (style[["sheet_name"]] == "by"){
            wb$add_worksheet(by,
                             grid_lines = style[["grid_lines"]])
        }
        else{
            wb$add_worksheet(paste0(style[["sheet_name"]], index),
                             grid_lines = style[["grid_lines"]])
        }
    }
    else{
        wb$add_worksheet(style[["sheet_name"]], grid_lines = style[["grid_lines"]])
    }

    # Rename the following stats back to match number formats in style element
    names(any_tab) <- gsub("sum.wgt", "sum_wgt", names(any_tab))
    names(any_tab) <- gsub("freq.g0", "freq_g0", names(any_tab))

    # Add table data and format according to style options
    monitor_df <- monitor_df |> monitor_next("Excel data", "Format")

    wb$add_data(x          = any_tab,
                start_col  = style[["start_column"]],
                start_row  = any_ranges[["table.row"]],
                col_names  = FALSE,
                na.strings = style[["na_symbol"]])


    # Add column header above table.
    wb$add_data(x          = column_header,
                start_col  = style[["start_column"]],
                start_row  = any_ranges[["header.row"]],
                col_names  = FALSE)

    # Format titles and footnotes if there are any
    monitor_df <- monitor_df |> monitor_next("Excel titles/footnotes", "Format")
    wb <- wb |>
        format_titles_foot_excel(titles, footnotes, any_ranges, style, output)

    # Only do the formatting when user specified it. With the excel_nostyle
    # option this whole part gets omitted to get a very quick unformatted
    # excel output.
    if (output == "excel"){
        # Merge top left box
        wb$merge_cells(dims = any_ranges[["box_range"]])

        # Merge column and row headers
        monitor_df <- monitor_df |> monitor_next("Excel format col headers", "Format")
        wb <- wb |>
            handle_col_header_merge(column_header[, -c(1:any_ranges[["cat_col.width"]]), drop = FALSE], any_ranges)

        monitor_df <- monitor_df |> monitor_next("Excel format row headers", "Format")
        wb <- wb |>
            handle_row_header_merge(any_tab[, 1:any_ranges[["cat_col.width"]]], any_ranges)

        # Style table
        monitor_df <- monitor_df |> monitor_next("Excel cell styles", "Format")
        wb <- wb |> handle_cell_styles(any_ranges, style)

        monitor_df <- monitor_df |> monitor_next("Excel number formats", "Format")

        # Set up inner table number formats
        col_index <- 1
        chunks    <- sub("p[0-9]+$", "p", rle(stats_row)$values)

        for (type in chunks){
            wb$add_cell_style(dims                = any_ranges[[paste0("any_col_ranges", col_index)]],
                              apply_number_format = TRUE,
                              num_fmt_id          = wb$styles_mgr$get_numfmt_id(paste0(type, "_numfmt")))

            col_index <- col_index + 1
        }

        # Freeze headers. If both options are true they have to be set together, otherwise one
        # option would overwrite the other.
        if (style[["freeze_col_header"]] && style[["freeze_row_header"]]){
            wb$freeze_pane(first_active_col = any_ranges[["header.column"]] + any_ranges[["cat_col.width"]],
                           first_active_row = any_ranges[["table.row"]])
        }
        else if (style[["freeze_col_header"]]){
            wb$freeze_pane(first_active_col = any_ranges[["header.column"]] + any_ranges[["cat_col.width"]])
        }
        else if (style[["freeze_row_header"]]){
            wb$freeze_pane(first_active_row = any_ranges[["table.row"]])
        }

        # Adjust table dimensions
        monitor_df <- monitor_df |> monitor_next("Excel widths/heights", "Format")

        wb <- wb |> handle_col_row_dimensions(any_ranges,
                                              ncol(any_tab) + (style[["start_column"]] - 1),
                                              nrow(any_tab) + nrow(multi_header) + (style[["start_row"]] - 1),
                                              style) |>
            handle_any_auto_dimensions(any_ranges, style) |>
            handle_header_table_dim(any_ranges, style)

        wb$add_ignore_error(dims = any_ranges[["header_range"]],  number_stored_as_text = TRUE)
        wb$add_ignore_error(dims = any_ranges[["cat_col_range"]], number_stored_as_text = TRUE)

        wb$add_named_region(dims = any_ranges[["whole_tab_range"]], name = "table", local_sheet = TRUE)
        wb$add_named_region(dims = any_ranges[["table_range"]],     name = "data",  local_sheet = TRUE)
    }

    monitor_df <- monitor_df |> monitor_end()

    # Return workbook
    list(wb, monitor_df)
}


#' Insert Statistic Labels
#'
#' @description
#' Give the statistics in the column header a custom label.
#'
#' @param column_header The complete column multi header.
#' @param stat_labels A list in which is specified which label should be printed for
#' which statistic instead of the statistic name.
#'
#' @return
#' Returns a multi layered column header with replaced statistic texts.
#'
#' @noRd
set_statistic_labels <- function(column_header, stat_labels){
    if (length(stat_labels) == 0){
        return(column_header)
    }

    # Loop through all provided labels
    for (i in seq_along(stat_labels)){
        name  <- names(stat_labels)[i]
        label <- stat_labels[[i]]

        # Omit label with missing variable name
        if (is.null(name) || name == ""){
            next
        }

        # Replace stat texts with provided labels
        column_header[nrow(column_header), ] <- gsub(name, label, column_header[nrow(column_header), ])
    }

    column_header
}


#' Insert Variable Labels
#'
#' @description
#' Give the variables in the column header a custom label.
#'
#' @param column_header The complete column multi header.
#' @param var_labels A list in which is specified which label should be printed for
#' which variable instead of the variable name.
#'
#' @return
#' Returns a multi layered column header with replaced variable texts.
#'
#' @noRd
set_col_variable_labels <- function(column_header, var_labels){
    if (length(var_labels) == 0){
        return(column_header)
    }

    # Loop through all provided labels
    for (i in seq_along(var_labels)){
        name  <- names(var_labels)[i]
        label <- var_labels[[i]]

        # Omit label with missing variable name
        if (is.null(name) || name == ""){
            next
        }

        # Replace variable texts with provided labels
        column_header[,] <- gsub(name, label, as.matrix(column_header))
    }

    # If header only consists of one row it gets converted to a vector when using gsub above.
    # In case this happens, convert the vector back into a one row data frame.
    if (!is.matrix(column_header)) {
        column_header <- data.table::as.data.table(as.list(column_header), stringsAsFactors = FALSE)
    }

    column_header
}


#' Build a Multi Header from Variable Names
#'
#' @description
#' Build a multi layered header from variable names of any_tab data frame. Analysis
#' variable name, stat and variable expressions are layerd into rows.
#'
#' @param var_names Variable names from any_tab.
#' @param any_header The column header carrying the variable names.
#' @param var_labels A list in which is specified which label should be printed for
#' which variable instead of the variable name.
#' @param style A list containing the styling elements.
#'
#' @return
#' Returns a data table with a multi layered header.
#'
#' @noRd
build_multi_header <- function(var_names,
                               any_header,
                               var_labels,
                               style){
    # Replace variable texts with custom labels
    any_header      <- any_header |> set_col_variable_labels(var_labels)
    col_var_headers <- data.table::as.data.table(any_header, stringsAsFactors = FALSE)

    # Split up variable name into different parts
    header_parts <- strsplit(var_names, "_")
    max_parts    <- max(lengths(header_parts))

    # Remove row header variable names
    for (part in seq_along(header_parts)){
        if (length(header_parts[[part]]) > 1){
            break
        }

        header_parts[[part]] <- ""
    }

    # Loop through header parts
    header_matrix <- sapply(header_parts, function(parts){
        # Prepare empty header with max size
        multi_header <- rep("", max_parts)

        if (parts[1] != ""){
            # Identify different parts
            var_name    <- parts[1]
            stat        <- strsplit(parts[2], " ")[[1]][1]
            expressions <- parts[-c(1, 2)]

            # Put multi header together: variable name, expressions, stat
            multi_header[1] <- var_name
            multi_header[seq(2, length(expressions) + 1)] <- expressions
            multi_header[max_parts] <- stat
        }

        multi_header
    })

    # Revert back underscores
    header_matrix   <- gsub("!!!", "_", header_matrix)

    # Replace variable texts with custom labels
    header_matrix <- header_matrix |>
        set_col_variable_labels(var_labels)

    # Make sure header_matrix is treated as a matrix, even though there can be only one row
    if (is.null(dim(header_matrix))) {
        dim(header_matrix) <- c(1, length(header_matrix))
    }

    # Drop completely empty rows
    if (nrow(header_matrix) > 1){
        header_matrix <- data.table::as.data.table(
            header_matrix[rowSums(header_matrix != "" & header_matrix != " ") > 0,
                          colSums(header_matrix != "" & header_matrix != " ") > 0, drop = FALSE])
    }
    else{
        header_matrix <- data.table::as.data.table(
            header_matrix[, colSums(header_matrix != "" & header_matrix != " "), drop = FALSE])
    }

    # Replace NA values
    header_matrix[is.na(header_matrix) | header_matrix == "NA"] <- style[["na_symbol"]]

    # Inject column variable headers into the multi header
    merge_headers(header_matrix, col_var_headers)
}


#' Inject Variable Header into Multi Header
#'
#' @description
#' The multi layered headers based on variable values and variable names are generated
#' separately and need to be put together afterwards.
#'
#' @param value_header Header based on variable values.
#' @param variable_header Header based on variable names which will be injected.
#'
#' @return
#' Returns a data table with a complete multi layered header.
#'
#' @noRd
merge_headers <- function(value_header, variable_header){
    row_list <- list()

    # Loop over the value header rows to inject the variable headers into them
    for (i in seq_len(nrow(value_header))){
        # Take a row from the value header
        row_list[[length(row_list) + 1]] <- value_header[i, , drop = FALSE]

        # If there are still rows in the variable header check whether to inject them
        if (i <= nrow(variable_header)){
            variable_row <- variable_header[i, , drop = FALSE]

            # Only inject variable row, if the row isn't completely empty
            if (!all(variable_row == "")) {
                row_list[[length(row_list) + 1]] <- variable_row
            }
        }
    }

    # Output as data table
    data.table::rbindlist(row_list, use.names=FALSE)
}

###############################################################################
# Format grouped by tables for excel output
###############################################################################
#' Format Any Table Output with by Variables (Excel Based)
#'
#' @description
#' Format any table with the provided row and column variables. Statistics
#' can be anything available.
#'
#' @param wb An already created workbook to add more sheets to.
#' @param any_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param statistics The user requested statistics.
#' @param by Separate the cross table output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#' @param var_labels List which contains column variable names and their respective labels.
#' @param stat_labels List which contains statistic names and their respective labels.
#' @param box The text that should appear in the upper left box of the table.
#' @param any_header The column header carrying the variable names.
#' @param style A list containing the styling elements.
#' @param output Determines whether to style the output or to just quickly paste
#' the data.
#' @param na.rm If TRUE removes all NA values from the tabulation.
#' @param monitor_df Data frame which stores the monitoring values.
#'
#' @return
#' Returns a list containing a formatted Excel workbook as well as the monitoring
#' data frame.
#'
#' @noRd
format_any_by_excel <- function(wb,
                                any_tab,
                                rows,
                                columns,
                                statistics,
                                by,
                                titles,
                                footnotes,
                                var_labels,
                                stat_labels,
                                box,
                                any_header,
                                style,
                                output,
                                na.rm,
                                monitor_df){
    # Loop through all by variables
    index <- 1

    for (by_var in by){
        monitor_df <- monitor_df |> monitor_start(paste0("Excel prepare (", by_var, ")"), "Format by")

        # Select by variables one by one
        any_by <- any_tab |>
            collapse::fsubset(any_tab[["BY"]] == by_var)

        # Extract unique values
        if (anyNA(any_by[["by_vars"]])){
            values <- c(unique(stats::na.omit(any_by[["by_vars"]])), NA)
        }
        else{
            values <- unique(any_by[["by_vars"]])
        }

        monitor_df <- monitor_df |> monitor_end()

        # Loop through all unique values to generate frequency tables per expression
        for (value in values){
            # In case NAs are removed
            if (is.na(value) && na.rm){
                next
            }

            monitor_df <- monitor_df |> monitor_start(paste0("Excel (", by_var, "_", value, ")"), "Format by")
            message("   + ", paste0(by_var, " = ", value))

            # Put additional by info together with the information which by variable
            # and which value is currently filtered.
            by_info <- paste0(by_var, " = ", value)

            # Filter table by current by variable and value
            if (!is.na(value)){
                any_temp <- any_by |>
                    collapse::fsubset(any_by[["by_vars"]] == value)
            }
            else{
                any_temp <- any_by |>
                    collapse::fsubset(is.na(any_by[["by_vars"]]))
            }

            any_temp <- any_temp |> dropp("BY", "by_vars")

            # Add by info below the titles
            if (length(titles) > 0){
                titles_temp <- c(titles, "", by_info)
            }
            # Or on top if there are no titles
            else{
                titles_temp <- by_info
            }

            # Generate frequency tables as normal but base is filtered data frame
            wb_list <- format_any_excel(wb,
                                        any_temp,
                                        rows,
                                        columns,
                                        statistics,
                                        value,
                                        titles_temp,
                                        footnotes,
                                        var_labels,
                                        stat_labels,
                                        box,
                                        any_header,
                                        style,
                                        output,
                                        by_info,
                                        index,
                                        NULL)

            index <- index + 1

            wb <- wb_list[[1]]

            monitor_df <- monitor_df |> monitor_end()
        }

        monitor_df <- monitor_df |> monitor_end()
    }

    # Return workbook
    list(wb, monitor_df)
}


###############################################################################
# Combine multiple tables into one workbook
###############################################################################
#' Combine Multiple Tables Into One Workbook
#'
#' @description
#' Combines any number of tables created with [any_table()] into one workbook
#' and styles them according to their meta information.
#'
#' @param ... Provide any number of result lists output by [any_table()].
#' @param file If NULL, opens the output as temporary file. If a filename with path
#' is specified, saves the output to the specified path.
#' @param output The following output formats are available: excel and excel_nostyle.
#' @param print TRUE by default. If TRUE prints the output, if FALSE doesn't print anything. Can be used
#' if one only wants to catch the combined workbook.
#' @param monitor FALSE by default. If TRUE outputs two charts to visualize the functions time consumption.
#'
#' @return
#' A fully styled workbook containing the provided tables.
#'
#' @examples
#' # Example data frame
#' my_data <- dummy_data(1000)
#' my_data[["person"]] <- 1
#'
#' # Formats
#' age. <- discrete_format(
#'     "Total"          = 0:100,
#'     "under 18"       = 0:17,
#'     "18 to under 25" = 18:24,
#'     "25 to under 55" = 25:54,
#'     "55 to under 65" = 55:64,
#'     "65 and older"   = 65:100)
#'
#' sex. <- discrete_format(
#'     "Total"  = 1:2,
#'     "Male"   = 1,
#'     "Female" = 2)
#'
#' education. <- discrete_format(
#'     "Total"            = c("low", "middle", "high"),
#'     "low education"    = "low",
#'     "middle education" = "middle",
#'     "high education"   = "high")
#'
#' # Define style
#' my_style <- excel_output_style(column_widths = c(2, 15, 15, 15, 9))
#'
#' # Define titles and footnotes. If you want to add hyperlinks you can do so by
#' # adding "link:" followed by the hyperlink to the main text.
#' titles <- c("This is title number 1 link: https://cran.r-project.org/",
#'             "This is title number 2",
#'             "This is title number 3")
#' footnotes <- c("This is footnote number 1",
#'                "This is footnote number 2",
#'                "This is footnote number 3 link: https://cran.r-project.org/")
#'
#' # Catch the output and additionally use the options:
#' # pint = FALSE and output = "excel_nostyle".
#' # This skips the styling and output part, so that the function runs faster.
#' # The styling is done later on.
#' my_style <- my_style |> modify_output_style(sheet_name = "big table")
#'
#' tab1 <- my_data |> any_table(rows       = c("sex + age", "sex", "age"),
#'                              columns    = c("year", "education + year"),
#'                              values     = weight,
#'                              statistics = c("sum", "pct_group"),
#'                              pct_group  = c("sex", "age", "education", "year"),
#'                              formats    = list(sex = sex., age = age.,
#'                                                education = education.),
#'                              style      = my_style,
#'                              na.rm      = TRUE,
#'                              print      = FALSE,
#'                              output     = "excel_nostyle")
#'
#' my_style <- my_style |> modify_output_style(sheet_name = "age_sex")
#'
#' tab2 <- my_data |> any_table(rows       = c("age"),
#'                              columns    = c("sex"),
#'                              values     = weight,
#'                              statistics = c("sum"),
#'                              formats    = list(sex = sex., age = age.),
#'                              style      = my_style,
#'                              na.rm      = TRUE,
#'                              print      = FALSE,
#'                              output     = "excel_nostyle")
#'
#' my_style <- my_style |> modify_output_style(sheet_name = "edu_year")
#'
#' tab3 <- my_data |> any_table(rows       = c("education"),
#'                              columns    = c("year"),
#'                              values     = weight,
#'                              statistics = c("pct_group"),
#'                              formats    = list(education = education.),
#'                              style      = my_style,
#'                              na.rm      = TRUE,
#'                              print      = FALSE,
#'                              output     = "excel_nostyle")
#'
#' # Every of the above tabs is a list, which contains the data table, an unstyled
#' # workbook and the meta information needed for the individual styling. These
#' # tabs can be input into the following function, which reads the meta information,
#' # styles each table individually and combines them as separate sheets into a single workbook.
#' combine_into_workbook(tab1, tab2, tab3)
#'
#' @export
combine_into_workbook <- function(...,
                                  file    = NULL,
                                  output  = "excel",
                                  print   = TRUE,
                                  monitor = FALSE){
    monitor_df <- NULL |> monitor_start("Prepare combine", "Prepare")

    # Measure the time
    start_time <- Sys.time()

    tables    <- list(...)
    tab_names <- as.character(substitute(list(...)))[-1]
    wb        <- openxlsx2::wb_workbook()

    i <- 1

    message(" > Formatting tables")

    for (table in tables){
        monitor_df <- monitor_df |> monitor_next(paste0("Format ", tab_names[i]), "Format tables")
        message(paste0("   + ", tab_names[i]))

        meta <- table[["meta"]]

        wb <- wb |> prepare_styles(meta[["style"]])

        # In case no by variables are provided
        if (length(meta[["by"]]) == 0){
            wb_list <- suppressMessages(
                format_any_excel(wb, table[["table"]], meta[["rows"]], meta[["columns"]],
                                 meta[["statistics"]], meta[["by"]], meta[["titles"]],
                                 meta[["footnotes"]], meta[["var_labels"]], meta[["stat_labels"]],
                                 meta[["box"]], meta[["any_header"]],
                                 meta[["style"]], output, monitor_df = monitor_df))

            wb <- wb_list[[1]]
        }
        # In case there are  by variables are provided
        else{
            wb_list <- suppressMessages(
                format_any_by_excel(wb, table[["table"]], meta[["rows"]], meta[["columns"]],
                                    meta[["statistics"]], meta[["by"]], meta[["titles"]],
                                    meta[["footnotes"]], meta[["var_labels"]], meta[["stat_labels"]],
                                    meta[["box"]], meta[["any_header"]],
                                    meta[["style"]], output, meta[["na.rm"]], monitor_df))

            wb <- wb_list[[1]]
        }

        i <- i + 1
    }

    # Output formatted table into different formats
    if (print){
        monitor_df <- monitor_df |> monitor_next("Output tables", "Output tables")

        if (is.null(file)){
            wb$open()
        }
        else if (!dir.exists(dirname(file))){
            message(" ! WARNING: Directory '", dirname(file), "' does not exist. File won't be saved.")
            wb$open()
        }
        else{
            wb$save(file = file, overwrite = TRUE)
        }
    }

    end_time <- round(difftime(Sys.time(), start_time, units = "secs"), 3)
    message("\n- - - 'combine_into_workbook' execution time: ", end_time, " seconds\n")

    monitor_df <- monitor_df |> monitor_end()
    monitor_df |> monitor_plot(draw_plot = monitor)

    invisible(wb)
}

Try the qol package in your browser

Any scripts or data that you put into this service are public.

qol documentation built on Dec. 14, 2025, 1:06 a.m.