R/any_table.R
In qol: Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

Documented in any_table combine_into_workbook

#' Compute Any Possible Table
#'
#' @description
#' [any_table()] produces any possible descriptive table in 'Excel' format. Any number
#' of variables can be nested and crossed. The output is an individually styled
#' 'Excel' table, which also receives named ranges, making it easier to read the data back in.
#'
#' @param data_frame A data frame in which are the variables to tabulate.
#' @param rows A vector that provides single variables or variable combinations that
#' should appear in the table rows. To nest variables use the form:
#' "var1 + var2 + var3 + ...".
#' @param columns A vector that provides single variables or variable combinations that
#' should appear in the table rows. To nest variables use the form:
#' "var1 + var2 + var3 + ...".
#' @param values A vector containing all variables that should be summarised.
#' @param statistics Available functions:
#' - "sum"       -> Weighted and unweighted sum
#' - "sum_wgt"   -> Sum of all weights
#' - "freq"      -> Unweighted frequency
#' - "freq_g0"   -> Unweighted frequency of all values greater than zero
#' - "pct_group" -> Weighted and unweighted percentages within the respective group
#' - "pct_value" -> Weighted and unweighted percentages between value variables
#' - "pct_total" -> Weighted and unweighted percentages compared to the grand total
#' - "mean"      -> Weighted and unweighted mean
#' - "median"    -> Weighted and unweighted median
#' - "mode"      -> Weighted and unweighted mode
#' - "min"       -> Minimum
#' - "max"       -> Maximum
#' - "sd"        -> Weighted and unweighted standard deviation
#' - "variance"  -> Weighted and unweighted standard variance
#' - "first"     -> First value
#' - "last"      -> Last value
#' - "pn"        -> Weighted and unweighted percentiles (any p1, p2, p3, ... possible)
#' - "missing"   -> Missings generated by the value variables
#' @param pct_group If pct_group is specified in the statistics, this option is used to
#' determine which variable of the row and column variables should add up to 100 %.
#' Multiple variables can be specified in a vector to generate multiple group percentages.
#' @param pct_value If pct_value is specified in the statistics, you can pass a list here
#' which contains the information for a new variable name and between which of the value
#' variables percentages should be computed.
#' @param formats A list in which is specified which formats should be applied to which variables.
#' @param by Compute tables stratified by the expressions of the provided variables.
#' @param weight Put in a weight variable to compute weighted results.
#' @param order_by Determine how the columns will be ordered. "values" orders the results by the
#' order you provide the variables in values. "stats" orders them by the order under statistics.
#' "values_stats" is a combination of both. "columns" keeps the order as given in columns
#' and "interleaved" alternates the stats.
#' @param titles Specify one or more table titles.
#' @param footnotes Specify one or more table footnotes.
#' @param var_labels A list in which is specified which label should be printed for
#' which variable instead of the variable name.
#' @param stat_labels A list in which is specified which label should be printed for
#' which statistic instead of the statistic name.
#' @param box Provide a text for the upper left box of the table.
#' @param workbook Insert a previously created workbook to expand the sheets instead of
#' creating a new file.
#' @param style A list of options can be passed to control the appearance of excel outputs.
#' Styles can be created with [excel_output_style()].
#' @param output The following output formats are available: excel and excel_nostyle.
#' @param pre_summed FALSE by default. If TRUE this function works with pre summarised data. This can be
#' used, if not all the needed results can be calculated by [any_table()] and need to be prepared in
#' advance. Enabling you to still make use of the styled tabulation. For this to work, the values have to
#' carry the statistic extension (e.g. "_sum", "_pct") in the variable name.
#' @param na.rm FALSE by default. If TRUE removes all NA values from the variables.
#' @param print TRUE by default. If TRUE prints the output, if FALSE doesn't print anything. Can be used
#' if one only wants to catch the output data frame and workbook with meta information.
#' @param monitor FALSE by default. If TRUE, outputs two charts to visualize the functions time consumption.
#'
#' @details
#' [any_table()] is based on the 'SAS' procedure Proc Tabulate, which provides
#' efficient and readable ways to perform complex tabulations.
#'
#' With this function you can combine any number of variables in any possible way, all
#' at once. You just define which variables or variable combinations should end up in
#' the table rows and columns with a simple syntax. Listing variables in a vector like
#' c("var1", "var2", "var3",...) means to put variables below (in case of the
#' row variables) or besides (in case of the column variables) each other. Nesting variables
#' is as easy as putting a plus sign between them, e.g. c("var1 + var2", "var2" + "var3" + "var4", etc.).
#' And of course you can combine both versions.
#'
#' The real highlight is, that this function not only creates all the desired variable
#' combinations and exports them to an 'Excel' file, it prints a fully custom styled
#' table to a workbook. Setting up a custom, reusable style is as easy as setting up
#' options like: provide a color for the table header, set the font size for the row header,
#' should borders be drawn for the table cells yes/no, and so on. Merging doubled header texts,
#' happens automatically.
#'
#' With this function you basically can fully concentrate on designing a table, instead of
#' thinking hard about how to calculate where to put a border or to even manually prepare
#' a designed workbook.
#'
#' @return
#' Returns a list with the data table containing the results for the table, the formatted
#' 'Excel' workbook and the meta information needed for styling the final table.
#'
#' @seealso
#' Creating a custom table style: [excel_output_style()], [modify_output_style()],
#' [number_format_style()], [modify_number_formats()].
#'
#' Creating formats: [discrete_format()] and [interval_format()].
#'
#' Functions that can handle formats and styles: [frequencies()], [crosstabs()].
#'
#' Additional functions that can handle styles: [export_with_style()]
#'
#' Additional functions that can handle formats: [summarise_plus()], [recode()],
#' [recode_multi()]
#'
#' @examples
#' # Example data frame
#' my_data <- dummy_data(1000)
#' my_data[["person"]] <- 1
#'
#' # Formats
#' age. <- discrete_format(
#'     "Total"          = 0:100,
#'     "under 18"       = 0:17,
#'     "18 to under 25" = 18:24,
#'     "25 to under 55" = 25:54,
#'     "55 to under 65" = 55:64,
#'     "65 and older"   = 65:100)
#'
#' sex. <- discrete_format(
#'     "Total"  = 1:2,
#'     "Male"   = 1,
#'     "Female" = 2)
#'
#' education. <- discrete_format(
#'     "Total"            = c("low", "middle", "high"),
#'     "low education"    = "low",
#'     "middle education" = "middle",
#'     "high education"   = "high")
#'
#' # Define style
#' my_style <- excel_output_style(column_widths = c(2, 15, 15, 15, 9))
#'
#' # Define titles and footnotes. If you want to add hyperlinks you can do so by
#' # adding "link:" followed by the hyperlink to the main text.
#' titles <- c("This is title number 1 link: https://cran.r-project.org/",
#'             "This is title number 2",
#'             "This is title number 3")
#' footnotes <- c("This is footnote number 1",
#'                "This is footnote number 2",
#'                "This is footnote number 3 link: https://cran.r-project.org/")
#'
#' # Output complex tables with different percentages
#' my_data |> any_table(rows       = c("sex + age", "sex", "age"),
#'                      columns    = c("year", "education + year"),
#'                      values     = weight,
#'                      statistics = c("sum", "pct_group"),
#'                      pct_group  = c("sex", "age", "education", "year"),
#'                      formats    = list(sex = sex., age = age.,
#'                                        education = education.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # If you want to get a clearer vision of what the result table looks like, in terms
#' # of the row and column categories, you can write the code like this, to make out
#' # the variable crossings and see the order.
#' my_data |> any_table(columns = c(            "year", "education + year"),
#'                      rows    = c("sex + age",
#'                                  "sex",
#'                                  "age"),
#'                      values     = weight,
#'                      statistics = c("sum", "pct_group"),
#'                      pct_group  = c("sex", "age", "education", "year"),
#'                      formats    = list(sex = sex., age = age.,
#'                                        education = education.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # Percentages based on value variables instead of categories
#' my_data |> any_table(rows       = c("age + year"),
#'                      columns    = c("sex"),
#'                      values     = c(probability, person),
#'                      statistics = c("pct_value", "sum", "freq"),
#'                      pct_value  = list(rate = "probability / person"),
#'                      weight     = weight,
#'                      formats    = list(sex = sex., age = age.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # Customize the visual appearance by adding titles, footnotes and variable
#' # and statistic labels.
#' # Note: You don't have to describe every element. Sometimes a table can be more
#' # readable with less text. To completely remove a variable label just put in an
#' # empty text "" as label.
#' my_data |> any_table(rows        = c("age + year"),
#'                      columns     = c("sex"),
#'                      values      = weight,
#'                      statistics  = c("sum", "pct_group"),
#'                      order_by    = "interleaved",
#'                      formats     = list(sex = sex., age = age.),
#'                      titles      = titles,
#'                      footnotes   = footnotes,
#'                      var_labels  = list(age = "Age categories",
#'                                        sex = "", weight = ""),
#'                      stat_labels = list(pct = "%"),
#'                      style       = my_style,
#'                      na.rm       = TRUE)
#'
#' # With individual styling
#' my_style <- my_style |> modify_output_style(header_back_color = "0077B6",
#'                                             font              = "Times New Roman")
#'
#' my_data |> any_table(rows       = c("age + year"),
#'                      columns    = c("sex"),
#'                      values     = c(probability, person),
#'                      statistics = c("pct_value", "sum", "freq"),
#'                      pct_value  = list(rate = "probability / person"),
#'                      weight     = weight,
#'                      formats    = list(sex = sex., age = age.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # Pass on workbook to create more sheets in the same file
#' my_style <- my_style |> modify_output_style(sheet_name = "age_sex")
#'
#' result_list <- my_data |>
#'            any_table(rows       = c("age"),
#'                      columns    = c("sex"),
#'                      values     = weight,
#'                      statistics = c("sum"),
#'                      formats    = list(sex = sex., age = age.),
#'                      style      = my_style,
#'                      na.rm      = TRUE,
#'                      print      = FALSE)
#'
#' my_style <- my_style |> modify_output_style(sheet_name = "edu_year")
#'
#' my_data |> any_table(workbook   = result_list[["workbook"]],
#'                      rows       = c("education"),
#'                      columns    = c("year"),
#'                      values     = weight,
#'                      statistics = c("pct_group"),
#'                      formats    = list(education = education.),
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' # Output multiple complex tables by expressions of another variable.
#' # If you specify the sheet name as "by" in the output style, the sheet
#' # names are named by the variable expressions of the by-variable. Otherwise
#' # the given sheet named gets a running number.
#' my_style <- my_style |> modify_output_style(sheet_name = "by")
#'
#' my_data |> any_table(rows       = c("sex", "age"),
#'                      columns    = c("education + year"),
#'                      values     = weight,
#'                      by         = state,
#'                      statistics = c("sum", "pct_group"),
#'                      pct_group  = c("education"),
#'                      formats    = list(sex = sex., age = age., state = state.,
#'                                        education = education.),
#'                      titles     = titles,
#'                      footnotes  = footnotes,
#'                      style      = my_style,
#'                      na.rm      = TRUE)
#'
#' @export
any_table <- function(data_frame,
                      rows,
                      columns        = "",
                      values,
                      statistics     = c("sum"),
                      pct_group      = c(),
                      pct_value      = list(),
                      formats        = list(),
                      by             = c(),
                      weight         = NULL,
                      order_by       = "stats",
                      titles         = c(),
                      footnotes      = c(),
                      var_labels     = list(),
                      stat_labels    = list(),
                      box            = "",
                      workbook       = NULL,
                      style          = excel_output_style(),
                      output         = "excel",
                      pre_summed     = FALSE,
                      na.rm          = FALSE,
                      print          = TRUE,
                      monitor        = FALSE){

    # Measure the time
    start_time <- Sys.time()

    monitor_df <- NULL |> monitor_start("Error handling", "Preparation")

    # First convert data frame to data table
    if (!data.table::is.data.table(data_frame)){
        data_frame <- data.table::as.data.table(data_frame)
    }

    # Evaluate formats early, otherwise apply formats can't evaluate them in unit
    # test situation.
    formats_list <- as.list(substitute(formats))[-1]

    formats <- stats::setNames(
        lapply(formats_list, function(expression){
            # Catch expression if passed as string
            if (is.character(expression)) {
                tryCatch(get(expression, envir = parent.frame()),
                         error = function(e) NULL)
            }
            # Catch expression if passed as symbol
            else{
                tryCatch(eval(expression, envir = parent.frame()),
                         error = function(e) NULL)
            }
        }),
        names(formats_list))

    # Look up variable names in format data frame to check whether there is an
    # interval or discrete format
    flag_interval <- FALSE

    for (current_var in names(formats)){
        format_df          <- formats[[current_var]]
        interval_variables <- c("from", "to")
        actual_variables   <- names(format_df)[1:2]

        if (identical(interval_variables, actual_variables)){
            flag_interval <- TRUE
            break
        }
    }

    ###########################################################################
    # Error handling
    ###########################################################################

    # Get row variables from provided combinations
    row_vars <- unique(trimws(unlist(strsplit(rows, "\\+"))))

    invalid_rows <- row_vars[!row_vars %in% names(data_frame)]
    row_vars     <- row_vars[row_vars %in% names(data_frame)]

    if (length(invalid_rows) > 0){
        message(" X ERROR: The provided row variable '", paste(invalid_rows, collapse = ", "), "' is not part of\n",
                "          the data frame. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(rows) == 0){
        message(" X ERROR: No valid row variables provided. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(rows) == 1){
        if (rows == ""){
            message(" X ERROR: No valid row variables provided. Any table will be aborted.")
            return(invisible(NULL))
        }
    }

    # Get row variables from provided combinations
    col_vars <- unique(trimws(unlist(strsplit(columns, "\\+"))))

    invalid_columns <- col_vars[!col_vars %in% names(data_frame)]
    col_vars        <- col_vars[col_vars %in% names(data_frame)]

    if (length(invalid_columns) > 0){
        message(" X ERROR: The provided column variable '", paste(invalid_columns, collapse = ", "), "' is not part of\n",
                "          the data frame. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(columns) == 0){
        message(" X ERROR: No valid column variables provided. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(columns) == 1){
        if (columns == ""){
            # Create empty pseudo variable to let the rest of the program run as normal
            data_frame[[".temp.var"]] <- 1
            columns    <- ".temp.var"
            col_vars   <- ".temp.var"
            var_labels <- c(var_labels, ".temp.var" = "")
            formats[[".temp.var"]] <- suppressMessages(discrete_format(" " = 1))
        }
    }

    # Convert to character vectors
    by_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(by), width.cutoff = 500L)))

    if (substr(by_temp, 1, 2) == "c("){
        by <- as.character(substitute(by))
    }
    else if (!is_error(by)){
        # Do nothing. In this case variables already contains the substituted variable names
        # while variables_temp is evaluated to the symbol passed into the function.
    }
    else{
        by <- by_temp
    }

    # Remove extra first character created with substitution
    by <- by[by != "c"]

    provided_by <- by
    invalid_by  <- by[!by %in% names(data_frame)]
    by          <- by[by %in% names(data_frame)]

    if (length(invalid_by) > 0){
        message(" ! WARNING: The provided by variable '", paste(invalid_by, collapse = ", "), "' is not part of\n",
                "            the data frame. This variable will be omitted during computation.")
    }

    variables  <- c(row_vars, col_vars)
    invalid_by <- by[by %in% variables]

    if (length(invalid_by) > 0){
        message(" X ERROR: The provided by variable '", paste(invalid_by, collapse = ", "), "' is also part of\n",
                "          the row and column variables which is not allowed. Any table will be aborted.")
        return(invisible(NULL))
    }

    if (length(by) == 1){
        if (by == ""){
            message(" X ERROR: No valid by variables provided. Any table will be aborted.")
            return(invisible(NULL))
        }
    }

    # Create temporary weight column if none is provided.
    # Also get the name of the weight variable as string.
    weight_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(weight), width.cutoff = 500L)))

    if (weight_temp == "NULL" || substr(weight_temp, 1, 2) == "c("){
        weight_var <- ".temp_weight"
        data_frame[[".temp_weight"]] <- 1

        if (substr(weight_temp, 1, 2) == "c("){
            message(" ! WARNING: Only one variable for weight allowed. Evaluations will be unweighted.")
        }
    }
    else if (!is_numeric(data_frame[[weight_temp]])){
        weight_var <- ".temp_weight"
        data_frame[[".temp_weight"]] <- 1

        message(" ! WARNING: Provided weight variable is not numeric. Unweighted results will be computed.")
    }
    else{
        weight_var <- weight_temp

        # NA values in weight lead to errors therefor convert them to 0
        if (anyNA(data_frame[[weight_temp]])){
            message(" ~ NOTE: Missing values in weight variable '", weight_temp, "' will be converted to 0.")
        }
        data_frame[[weight_temp]] <- data.table::fifelse(is.na(data_frame[[weight_temp]]), 0, data_frame[[weight_temp]])

        # @Hack: so I don't have to check if .temp_weight exists later on
        data_frame[[".temp_weight"]] <- 1
    }

    # Convert to character vectors
    values_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(values), width.cutoff = 500L)))

    if (substr(values_temp, 1, 2) == "c("){
        values <- as.character(substitute(values))
    }
    else if (!is_error(values)){
        # Do nothing. In this case values already contains the substituted variable names
        # while values_temp is evaluated to the symbol passed into the function.
    }
    else{
        values <- values_temp
    }

    # Remove extra first character created with substitution
    values <- values[values != "c"]

    # If no value variables are provided abort
    if (length(values) == 0){
        message(" X ERROR: No values provided.")
        return(invisible(NULL))
    }
    else if (length(values) == 1){
        if (values == ""){
            message(" X ERROR: No values provided.")
            return(invisible(NULL))
        }
    }

    # Make sure there is no class variable that is also a value variable.
    invalid_class <- values[values %in% c(row_vars, col_vars)]
    values        <- values[!values %in% c(row_vars, col_vars)]

    if (length(invalid_class) > 0){
        message(" x ERROR: The provided row/column variable '", paste(invalid_class, collapse = ", "), "' is also part of\n",
                "          the analysis variables. Any table will be aborted.")
        return(invisible(NULL))
    }

    invalid_values <- values[!values %in% names(data_frame)]
    values         <- values[values %in% names(data_frame)]

    if (length(invalid_values) > 0){
        message(" ! WARNING: The provided analysis variable '", paste(invalid_values, collapse = ", "), "' is not part of\n",
                "            the data frame. This variable will be omitted during computation.")
    }

    if (length(values) == 0){
        message(" X ERROR: No valid analysis variables provided. Any table will be aborted.")
        return(invisible(NULL))
    }

    provided_values <- values
    values          <- unique(values)

    if (length(provided_values) > length(values)){
        message(" ! WARNING: Some analysis variables are provided more than once. The doubled entries will be omitted.")
    }

    # Check for invalid output option
    if (!tolower(output) %in% c("excel", "excel_nostyle")){
        message(" ! WARNING: Output format '", output, "' not available. Using 'excel' instead.")

        output <- "excel"
    }
    else{
        output <- tolower(output)
    }

    # Correct nesting option if not set right
    if (!tolower(order_by) %in% c("values", "stats", "values_stats", "columns", "interleaved")){
        message(" ! WARNING: Order by option '", order_by, "' doesn't exist. Options 'values', 'stats', 'values_stats', 'columns'\n",
                "            and 'interleaved' are available. Order by will be set to 'stats'.")
        order_by <- "stats"
    }

    # Remove missing variables from pct_group
    if ("pct_group" %in% tolower(statistics)){
        invalid_pct <- pct_group[!pct_group %in% c(row_vars, col_vars)]

        if (length(invalid_pct) > 0){
            message(" ! WARNING: The variable '", paste(invalid_pct, collapse = ", "), "' provided as pct_group is not part of the row and column variables.\n",
                    "            The variable will be omitted.")

            pct_group <- pct_group[pct_group %in% c(row_vars, col_vars)]
        }

        rm(invalid_pct)
    }

    # In case of using a pre summarised data frame, underscores are only allowed if they carry
    # the statistics extension afterwards.
    if (pre_summed){
        if (!"TYPE" %in% names(data_frame)){
            message(" X ERROR: The pre summarised data needs the TYPE variable generated by summarise_plus. Any table will be aborted.")
            return(invisible(NULL))
        }

        # Check if value variables have statistics extension
        extensions <- c("_sum", "_pct_group", "_pct_total", "_pct_value", "_pct", "_freq_g0",
                        "_freq", "_mean", "_median", "_mode", "_min", "_max", "_first",
                        "_last", "_sum_wgt", "_p[0-9]+$", "_sd", "_variance", "_missing")
        pattern    <- paste0("(", paste(extensions, collapse = "|"), ")$")


        # If one of the value variables hasn't got any of the above extension abort
        if (!all(grepl(pattern, values))){
            message(" X ERROR: All value variables need to have the statistic extensions in their variable names.\n",
                    "          Execution will be aborted.")
            return(invisible(NULL))
        }

        # Set up options to make sure nothing errors below. Statistics is set to "mean"
        # because then summarise_plus takes a route where factor variables are kept in order.
        # With "sum" the order would messed up.
        statistics <- "mean"
        pct_group  <- ""
        pct_value  <- ""
        weight     <- NULL
        formats    <- list()

        rm(extensions, pattern)
    }

    rm(invalid_by, invalid_class, invalid_columns, invalid_rows, invalid_values,
       provided_by, provided_values, weight_temp, values_temp, by_temp)

    ###########################################################################
    # Any tabulation starts
    ###########################################################################

    monitor_df <- monitor_df |> monitor_next("Summary", "Summary")
    message("\n > Computing stats.")

    # Put together vector of grouping variables
    group_vars <- c(by, variables)

    # In case of group percentages order group variable to the last position
    if ("pct_group" %in% tolower(statistics) && length(pct_group) > 0){
        group_vars <- c(setdiff(group_vars, pct_group[1]), pct_group[1])
    }
    # If pct_group is specified in statistics but no group is provided set
    # last variable of group_vars as standard.
    else if ("pct_group" %in% tolower(statistics) && length(pct_group) == 0){
        pct_group <- group_vars[length(group_vars)]
    }

    # Put combinations in a single vector
    combinations <- as.vector(outer(rows, columns, paste, sep = " + "))

    # In case by variables are specified, add by to group variables and build
    # additional combinations
    if (length(by) > 0){
        combinations <- as.vector(outer(by, combinations, paste, sep = "+"))
    }

    # Compute statistics
    if (!pre_summed){
        any_tab <- suppressMessages(data_frame |>
              summarise_plus(class      = group_vars,
                             values     = values,
                             statistics = statistics,
                             formats    = formats,
                             weight     = weight_var,
                             nesting    = "all",
                             types      = combinations,
                             notes      = FALSE,
                             na.rm      = na.rm)) |>
            rename_pattern("pct_group", paste0("pct_group_", pct_group[1])) |>
            collapse::fsubset(TYPE != "total")
    }
    else{
        # With pre summarised data just take the input data frame
        any_tab <- data_frame
    }

    if (is.null(any_tab)){
        message(" X ERROR: Any table could not be computed. Execution will be aborted.")
        return(invisible(NULL))
    }

    # Underscores are not allowed in column variables because when constructing the
    # table header later, the underscore is the sign by which the column names are split.
    # Having additional underscores would mess up this part and lead to errors.
    if (any(grepl("_", unlist(any_tab[col_vars])))){
        message(" X ERROR: No underscores allowed in column variable values. Execution will be aborted.")
        return(invisible(NULL))
    }

    # In case of pre summarised data frame remove the temporary weight variable
    if (pre_summed){
        any_tab <- any_tab |> dropp(".temp_weight")
    }

    # In case of by variables fuse them into one
    if (length(by) > 0){
        any_tab <- any_tab |> fuse_variables("by_vars", by)

        # Normally extract the first variable from the TYPE combination into a
        # separate BY variable.
        if (!pre_summed){
            any_tab[["BY"]]   <- sub("\\+.*", "", any_tab[["TYPE"]])
            any_tab[["TYPE"]] <- sub("^[^+]*\\+\\s*", "", any_tab[["TYPE"]])
        }
        # In case of pre summarised data frame it is not clear, at which position
        # the by variable is in the TYPE.
        else{
            # Get TYPE variable as list of variable combinations
            type_split <- strsplit(any_tab[["TYPE"]], "\\+")

            # If by variable was found in a variable combination, put it into
            # a separate variable.
            any_tab[["BY"]] <- vapply(type_split, function(obs){
                by_found <- obs[obs %in% by]

                if (length(by_found) > 0){
                    by_found[1]
                }
                else{
                    ""
                }
            }, character(1L))

            # The TYPE variable itself is put back together as all variables which
            # are no by-Variables.
            any_tab[["TYPE"]] <- vapply(type_split, function(obs){
                paste(obs[!obs %in% by], collapse = "+")
            }, character(1L))
        }
    }

    # In case multiple group percentages should be computed, evaluate them in a loop
    # and join them to the main data frame.
    if (length(pct_group) > 1){
        monitor_df <- monitor_df |> monitor_next("Additional group pct", "Summary")

        for (group in seq_along(pct_group)){
            # First group was computed before so omit it here
            if (group == 1){
                next
            }

            # In case of group percentages order group variable to the last position
            group_vars <- c(setdiff(group_vars, pct_group[group]), pct_group[group])

            # Compute group percentages
            group_tab <- suppressMessages(data_frame |>
                summarise_plus(class      = group_vars,
                               values     = values,
                               statistics = "pct_group",
                               formats    = formats,
                               weight     = weight_var,
                               nesting    = "all",
                               types      = combinations,
                               notes      = FALSE,
                               na.rm      = na.rm)) |>
                rename_pattern("pct_group", paste0("pct_group_", pct_group[group])) |>
                drop_type_vars()

            merge_vars <- variables

            if (length(by) > 0){
                group_tab <- group_tab |> fuse_variables("by_vars", by)
                merge_vars <- c("by_vars", variables)
            }

            # Join percentages to the main data frame
            any_tab <- any_tab |>
                collapse::join(group_tab, on = merge_vars, how = "left",
                               verbose = FALSE, overid = 2)
        }
        rm(group, group_tab)
    }

    # In case only pct_value was selected as statistic
    if ("pct_value" %in% tolower(statistics) && length(statistics) == 1){
        message(" X ERROR: pct_value can only be computed in combination with statistic\n",
                "          'sum'. Since no other statistic is provided any table will be aborted.")
        return(invisible(NULL))
    }
    # In case percentages based on value variables should be computed
    else if ("pct_value" %in% tolower(statistics) && length(pct_value) > 0){
        for (i in seq_along(pct_value)){
            value <- pct_value[[i]]
            name  <- names(pct_value)[i]

            # Separate provided variables first
            eval_vars <- trimws(strsplit(value, split = "/")[[1]])

            # Compute percentages
            if (paste0(eval_vars[1], "_sum") %in% names(any_tab) &&
                paste0(eval_vars[2], "_sum") %in% names(any_tab)){
                any_tab[[paste0(name, "_pct_value")]] <-
                    any_tab[[paste0(eval_vars[1], "_sum")]] * 100 /
                    any_tab[[paste0(eval_vars[2], "_sum")]]
            }
            # Without sum percentages can't be computed
            else{
                message(" ! WARNING: pct_value can only be computed in combination with statistic\n",
                        "            'sum'. Percentages for '", name, "' could not be evaluated.")

                # Additional warnings for missing variables
                if (!eval_vars[1] %in% names(data_frame)){
                    message(" ! WARNING: Variable '", eval_vars[1], "' not found in the data frame.")
                }
                if (!eval_vars[2] %in% names(data_frame)){
                    message(" ! WARNING: Variable '", eval_vars[2], "' not found in the data frame.")
                }
            }
        }

        rm(eval_vars, i, value)
    }

    rm(data_frame)

    # Reorder variables according to statistics. This is necessary because pct_value
    # can only be computed after summarise_plus and therefor isn't ordered.
    any_tab <- any_tab |> setcolorder_by_pattern(statistics)

    # Get value variable names
    if (length(by) == 0){
        value_vars <- any_tab |> inverse(c(variables, "TYPE", "TYPE_NR", "DEPTH"))
    }
    else{
        value_vars <- any_tab |> inverse(c(variables, "TYPE", "TYPE_NR", "DEPTH", "by_vars", "BY"))
    }

    # Round values according to number formats
    for (var_name in names(any_tab)){
        if (!var_name %in% value_vars){
            next
        }

        # Get stat from variable name
        stat <- strsplit(var_name, split = "_")[[1]]
        stat <- stat[length(stat)]

        # Round values to the decimals places specified in the style
        if (tolower(stat) %in% c("sum", "freq", "freq", "mean", "median", "mode",
                                 "min", "max")){
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][[paste0(stat, "_decimals")]])
        }
        else if(stat == "g0"){
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][["freq_decimals"]])
        }
        else if(stat == "wgt"){
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][["sum_decimals"]])
        }
        else if(grepl("^[0-9]$", substr(stat, 2, 2))){
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][["p_decimals"]])
        }
        else{
            any_tab[[var_name]] <- round(any_tab[[var_name]],
                                         style[["number_formats"]][["pct_decimals"]])
        }
    }

    # Tear apart the the summarised data frame by row and column combinations and
    # transpose columns to generate user defined combination. At the end put all
    # the pieces back together to form a fully printable result data frame.
    monitor_df <- monitor_df |> monitor_next("Transform table", "Transform")

    part_combi_list       <- list()
    header_combi_list     <- list()
    col_header_df         <- list()
    row_header_dimensions <- list()
    last_number_of_rows   <- 0
    by_division           <- 1

    if (length(by) > 0){
        by_division <- length(unique(any_tab[["by_vars"]]))
    }

    # Underscores are not allowed in column and values variables, because when constructing the
    # table header later, the underscore is the symbol by which the variable names are split.
    # Having additional underscores would mess up this part and lead to errors. Therefor the
    # additional underscores will be temporarily replaced.
    extensions <- c("_sum", "_pct_group_", "_pct_total", "_pct_value", "_pct", "_freq_g0",
                    "_freq", "_mean", "_median", "_mode", "_min", "_max", "_first",
                    "_last", "_p1", "_p2", "_p3", "_p4", "_p5", "_p6", "_p7", "_p8", "_p9",
                    "sum_wgt", "_sd", "_variance", "_missing", "by_vars", "TYPE_NR")

    rows       <- replace_except(rows,       "_", "!!!", extensions)
    columns    <- replace_except(columns,    "_", "!!!", extensions)
    value_vars <- replace_except(value_vars, "_", "!!!", extensions)
    value_sort <- replace_except(values,     "_", "!!!", extensions)
    names(any_tab)    <- replace_except(names(any_tab),    "_", "!!!", extensions)
    any_tab[["TYPE"]] <- replace_except(any_tab[["TYPE"]], "_", "!!!", extensions)

    # Sort type alphabetically to make finding the right combination easier
    any_tab[["TYPE"]] <- reorder_combination(any_tab[["TYPE"]])

    # Get number of column header variables by getting the maximum number of + signs in the
    # column variables.
    max_plus <- max(sapply(gregexpr("\\+", columns), function(var_to_test) {
        if (var_to_test[1] == -1){
            1
        }
        else{
            length(var_to_test) + 1
        }}))

    index <- 1
    any_header <- NULL

    for (row_combi in rows){
        combined_col_df <- NULL

        # Get current single variables from row combination
        row_combi_vars <- unique(trimws(unlist(strsplit(row_combi, "\\+"))))

        for (col_combi in columns){
            # Get current single variables from column combination
            col_combi_vars <- unique(trimws(unlist(strsplit(col_combi, "\\+"))))

            current_combi <- c(row_combi_vars, col_combi_vars)

            # Sort combination alphabetically
            sorted_combi <- paste(sort(current_combi), collapse = "+")

            # Keep only necessary variables
            if (length(by) > 0){
                current_combi <- c("BY", "by_vars", current_combi)
            }

            combi_df <- any_tab |> keep(current_combi, "TYPE", value_vars)

            # Combine variables to match TYPE variable and subset data frame by
            # current combination
            subset_type <- paste(sorted_combi, collapse = "+")

            # Convert column variables to factor if necessary to retain value order
            # after sorting
            for (col_var in col_combi_vars){
                if (is.character(combi_df[[col_var]])){
                    # Extract the number of labels from variable
                    label_levels <- combi_df[[col_var]] |>
                        unlist(use.names = FALSE) |>
                        unique() |>
                        stats::na.omit()

                    # Convert variable to factor
                    combi_df[[col_var]] <- factor(
                        combi_df[[col_var]],
                        levels  = label_levels,
                        ordered = TRUE)
                }
            }

            # Sort to have the correct order after pivoting
            combi_df <- combi_df |>
                collapse::fsubset(TYPE == subset_type) |>
                data.table::setcolorder(current_combi, before = 1) |>
                data.table::setorderv(col_combi_vars, na.last = TRUE) |>
                data.table::setorderv(row_combi_vars)

            # Rename the row variables to something neutral. In the next steps the
            # data frame is puzzled back together, but the thing is, that the row
            # variables can be individually in any order in the result table. If
            # this step would be omitted the variables would all be in a fixed order
            # in the result table.
            new_row_names <- paste0("var", seq_along(row_combi_vars))

            if (length(by) == 0){
                names(combi_df)[seq_along(row_combi_vars)] <- new_row_names

                id_vars <- new_row_names
            }
            else{
                end_col <- length(row_combi_vars) + 2
                names(combi_df)[3:end_col] <- new_row_names

                id_vars <- new_row_names
                id_vars <- c("BY", "by_vars", new_row_names)
            }

            # If there is only one value provided and only one statistic selected
            # the pivoted variable names below only receive the column expressions
            # as names. The needed format is "value_stat_expression". In the mentioned
            # case this format is pre computed.
            if (length(values) == 1 && length(statistics) == 1){
                value_stat <- names(combi_df[ncol(combi_df)])

                combi_df[[col_combi_vars[1]]] <-
                    paste0(value_stat, "_", combi_df[[col_combi_vars[1]]])
            }
            # Fallback check if the user happens to input only wrong statistics but one.
            # In this case statistics is longer than 1 even though only one statistic was
            # computed. This leads to the condition above being omitted.
            else if (ncol(combi_df) - length(c(id_vars, col_combi_vars, "TYPE")) == 1){
                value_stat <- names(combi_df[ncol(combi_df)])

                combi_df[[col_combi_vars[1]]] <-
                    paste0(value_stat, "_", combi_df[[col_combi_vars[1]]])
            }

            # Pivot to wider format, which basically is the final format to print the data
            combi_df <- combi_df |>
                collapse::pivot(id     = id_vars,
                                names  = col_combi_vars,
                                values = value_vars,
                                how    = "wider")

            combi_df[id_vars] <- lapply(combi_df[id_vars], as.character)

            # Replace NA values with text so that they can be differentiated from empty
            # row header columns later on.
            row_var_cols <- seq_along(id_vars)
            combi_df[, row_var_cols][is.na(combi_df[, row_var_cols])] <- style[["na_symbol"]]

            # Sort interleaved
            if (tolower(order_by) == "interleaved"){
                combi_df <- combi_df |> order_interleaved(statistics)
            }

            # Join different column results together
            if (is.null(combined_col_df)){
                row_labels <- c()

                # Loop through all provided labels
                for (variable in row_combi_vars){
                    # Revert back underscores
                    variable <- gsub("!!!", "_", variable)

                    # If there are no labels specified add the variable names as labels
                    if (!variable %in% names(var_labels)){
                        row_labels <- c(row_labels, variable)
                        next
                    }

                    # Replace stat texts with provided labels
                    row_labels <- c(row_labels, var_labels[[variable]])
                }

                # Insert labels as new variable and sort it to the front
                if (!all(row_labels == "")){
                    combi_df[["row.label"]] <- paste(row_labels, collapse = " / ")
                }
                # In case only empty labels were provided, just add empty character
                # so that the column can be identified as empty later on.
                else{
                    combi_df[["row.label"]] <- " "
                }

                # Order row label to the front
                if (length(by) == 0){
                    combi_df <- combi_df |>
                        data.table::setcolorder("row.label", before = 1)
                }
                else{
                    combi_df <- combi_df |>
                        data.table::setcolorder(c("BY", "by_vars", "row.label"), before = 1)
                }

                new_row_names <- c("row.label", new_row_names)

                # First iteration
                combined_col_df <- combi_df
            }
            # Following iterations
            else{
                combi_df <- suppressMessages(combi_df |>
                    dropp(new_row_names, "row.label", "BY", "by_vars"))

                # Add row header variable
                new_row_names <- c("row.label", new_row_names)

                # Check for duplicate variable names. If any duplicate is found abort.
                duplicates <- intersect(names(combined_col_df), names(combi_df))

                if (length(duplicates) > 0) {
                    message(" X ERROR: Duplicate column names found: ", paste(duplicates, collapse = ", "), ".\n",
                            "          If you are working with original values, consider making them unique by using formats.")
                    return(invisible(NULL))
                }

                # cbind current data frame to the iterations before
                combined_col_df <- cbind(combined_col_df, combi_df)
            }

            # Build variable name table header for later use during formatting.
            # Column header only needs to be built once because other iterations
            # would just produce the same header.
            if (index == 1){
                # Get data frame variable names
                col_header_df <- combi_df[0, ]

                # Fill header rows with column variable names
                for (var_index in seq_along(col_combi_vars)){
                    col_header_df[var_index, ] <- gsub("!!!", "_", col_combi_vars[[var_index]])
                }

                # If the header has fewer rows than the maximum header rows, fill up the header
                # with additional empty rows.
                header_diff <- max_plus - nrow(col_header_df)
                if (header_diff > 0){
                    col_header_df <- rbind(col_header_df,
                                           stats::setNames(data.table::as.data.table(
                                               matrix("", header_diff, ncol(col_header_df))),
                                               names(col_header_df)))
                }

                # First iteration
                if (is.null(any_header)){
                    # Remove row header columns from column header data frame
                    row_header_var_count <- length(id_vars) + 1
                    any_header <- col_header_df[, -(1:row_header_var_count), drop = FALSE]
                }
                # Following iterations
                else{
                    # cbind current header data frame to the iterations before
                    any_header <- cbind(any_header, col_header_df)
                }
            }
        }

        # Store combined column data frame in a list for later rbind
        part_combi_list[[row_combi]] <- combined_col_df

        # Get table dimensions for later use during formatting.
        name <- paste0("header_end_", paste(row_combi_vars, collapse = "%%%"))
        last_number_of_rows           <- (nrow(combined_col_df) / by_division) + last_number_of_rows
        row_header_dimensions[[name]] <- last_number_of_rows

        name <- paste0("header_size_", paste(row_combi_vars, collapse = "%%%"))
        row_header_dimensions[[name]] <- length(row_combi_vars)

        index <- index + 1
    }

    # Put all computed data frames below each other to form a final result data frame
    any_tab <- data.table::rbindlist(part_combi_list,   fill = TRUE)

    # Reorder variables according to statistics
    if (tolower(order_by) == "stats" || tolower(order_by) == "values_stats"){
        any_tab    <- any_tab    |> setcolorder_by_pattern(statistics)
        any_header <- any_header |> setcolorder_by_pattern(statistics)
    }

    # Reorder variables by provided values
    if (tolower(order_by) == "values" || tolower(order_by) == "values_stats"){
        any_tab    <- any_tab    |> setcolorder_by_pattern(value_sort)
        any_header <- any_header |> setcolorder_by_pattern(value_sort)
    }

    # After binding together the data frames it can happen, that some of the new var
    # variables end up at the end of the data frame instead of the front. Therefor
    # these columns need to be ordered to the front for safety.
    ordered_cols <- grep("^var[0-9]+$", names(any_tab), value = TRUE)
    any_tab <- any_tab |> data.table::setcolorder(c("row.label", ordered_cols), before = 1)

    # If all row labels are empty, delete the row header column
    if (all(any_tab[["row.label"]] == " ")){
        any_tab <- any_tab |> dropp("row.label")
    }
    # If only some labels are empty the row label column is printed. Problem: the cell
    # merging omits empty cells. Therefor convert empty cells into cells with a space
    # to merge the empty parts correctly.
    else{
        any_tab["row.label" == ""] <- " "
    }

    # Get number of row header variables by getting the maximum number of + signs in the
    # row variables.
    max_plus <- max(sapply(gregexpr("\\+", rows), function(var_to_test) {
        if (var_to_test[1] == -1){
            1
        }
        else{
            length(var_to_test) + 1
        }}))

    length_row_header <- max_plus + 1
    if (length(by) != 0){
        length_row_header <- length_row_header + 2
    }

    # Mark empty row header cells
    row_var_cols <- 1:length_row_header
    any_tab[, row_var_cols][is.na(any_tab[, row_var_cols])] <- ""

    # In between clean up to get a better overview
    rm(combi_df, combined_col_df, part_combi_list, col_combi, col_combi_vars,
       combinations, current_combi, current_var, flag_interval, index,
       last_number_of_rows, name, new_row_names, row_combi, row_combi_vars, sorted_combi,
       subset_type, group_vars, length_row_header, col_header_df, header_diff,
       row_header_var_count)

    # Grab all information, which is necessary to format the workbook. This list will be
    # returned at the end and can be grabbed by the workbook combine function.
    meta <- mget(c("rows", "columns", "statistics",
                   "by", "titles", "footnotes", "var_labels", "stat_labels",
                   "box", "any_header", "row_header_dimensions",
                   "style", "na.rm"))

    # Prepare table format for output
    monitor_df <- monitor_df |> monitor_next("Excel prepare", "Format")
    message(" > Formatting tables.")

    # Setup styling in new workbook if no other is provided
    if (is.null(workbook)){
        workbook <- openxlsx2::wb_workbook() |>
            prepare_styles(style)
    }
    # Update style options in provided workbook
    else{
        workbook <- workbook |>
            prepare_styles(style)
    }

    monitor_df <- monitor_df |> monitor_end()

    # In case no by variables are provided
    if (length(by) == 0){
        wb_list <- format_any_excel(workbook, any_tab, rows, columns, statistics,
                                    by, titles, footnotes, var_labels, stat_labels,
                                    box, any_header,
                                    style, output, monitor_df = monitor_df)

        wb         <- wb_list[[1]]
        monitor_df <- wb_list[[2]]
    }
    # In case there are  by variables are provided
    else{
        wb_list <- format_any_by_excel(workbook, any_tab, rows, columns, statistics,
                                       by, titles, footnotes, var_labels, stat_labels,
                                       box, any_header,
                                       style, output, na.rm, monitor_df)

        wb         <- wb_list[[1]]
        monitor_df <- wb_list[[2]]
    }

    # Output formatted table into different formats
    if (print){
        monitor_df <- monitor_df |> monitor_next("Output tables", "Output tables")

        if (is.null(style[["file"]])){
            if(interactive()){
                wb$open()
            }
        }
        else{
            wb$save(file = style[["file"]], overwrite = TRUE)
        }
    }

    end_time <- round(difftime(Sys.time(), start_time, units = "secs"), 3)
    message("\n- - - 'any_table' execution time: ", end_time, " seconds\n")

    monitor_df <- monitor_df |> monitor_end()
    monitor_df |> monitor_plot(draw_plot = monitor)

    invisible(list("table"    = any_tab,
                   "workbook" = wb,
                   "meta"     = meta))
}


###############################################################################
# Format any table for excel output
###############################################################################
#' Format Any Table Output (Excel Based)
#'
#' @description
#' Format any table with the provided row and column variables. Statistics
#' can be anything available.
#'
#' @param wb An already created workbook to add more sheets to.
#' @param any_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param statistics The user requested statistics.
#' @param by Separate the cross table output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#' @param var_labels List which contains column variable names and their respective labels.
#' @param stat_labels List which contains statistic names and their respective labels.
#' @param box The text that should appear in the upper left box of the table.
#' @param any_header The column header carrying the variable names.
#' @param style A list containing the styling elements.
#' @param output Determines whether to style the output or to just quickly paste
#' the data.
#' @param by_info Text which contains the information which variable with which
#' expression is computed at the moment.Used for computation with by variables.
#' @param index Index of the current variable expression. Used for computation with
#' by variables.
#' @param monitor_df Data frame which stores the monitoring values.
#'
#' @return
#' Returns a list containing a formatted Excel workbook as well as the monitoring
#' data frame.
#'
#' @noRd
format_any_excel <- function(wb,
                             any_tab,
                             rows,
                             columns,
                             statistics,
                             by,
                             titles,
                             footnotes,
                             var_labels,
                             stat_labels,
                             box,
                             any_header,
                             style,
                             output,
                             by_info = NULL,
                             index   = NULL,
                             monitor_df){
    monitor_df <- monitor_df |> monitor_start("Excel prepare", "Format")

    # Cut down percentage names to just "pct"
    names(any_tab) <- gsub("pct_group_", "pct group ", names(any_tab))
    names(any_tab) <- gsub("pct_total", "pct total", names(any_tab))
    names(any_tab) <- gsub("pct_value", "pct value", names(any_tab))

    # Replace underscore in the following stats to preserve them
    names(any_tab) <- gsub("sum_wgt", "weight_sum.wgt", names(any_tab))
    names(any_tab) <- gsub("freq_g0", "freq.g0", names(any_tab))

    # Build header from variable names
    multi_header <- build_multi_header(names(any_tab), any_header, var_labels, style)

    # Remove empty statistics rows, but keep multi_header with statistics because the information
    # is needed below for applying the correct number formats.
    column_header <- multi_header |> set_statistic_labels(stat_labels)
    column_header <- column_header[rowSums(column_header == "") != ncol(column_header), , drop = FALSE]

    stats_row <- multi_header[nrow(multi_header), , drop = FALSE]

    # Get table ranges
    any_ranges <- get_any_tab_ranges(any_tab, column_header, stats_row,
                                     titles, footnotes, style)

    # Add empty columns to the header for the top left box at the beginning
    blank_columns <- matrix("", nrow = nrow(column_header), ncol = any_ranges[["cat_col.width"]])
    column_header <- cbind(blank_columns, column_header)

    # Add box text
    if (box != ""){
        column_header[1, 1] <- box
    }
    # If no box text provided put in variable names of row headers
    else{
        column_header[1, 1] <- gsub("!!!", "_", paste(rows, collapse = "\n"))
    }

    # If function is called with by variables the sheet names have to be differentiated
    # and by info has to be written above the table.
    if (!is.null(by_info)){
        if (style[["sheet_name"]] == "by"){
            wb$add_worksheet(by,
                             grid_lines = style[["grid_lines"]])
        }
        else{
            wb$add_worksheet(paste0(style[["sheet_name"]], index),
                             grid_lines = style[["grid_lines"]])
        }
    }
    else{
        wb$add_worksheet(style[["sheet_name"]], grid_lines = style[["grid_lines"]])
    }

    # Rename the following stats back to match number formats in style element
    names(any_tab) <- gsub("sum.wgt", "sum_wgt", names(any_tab))
    names(any_tab) <- gsub("freq.g0", "freq_g0", names(any_tab))

    # Add table data and format according to style options
    monitor_df <- monitor_df |> monitor_next("Excel data", "Format")

    wb$add_data(x          = any_tab,
                start_col  = style[["start_column"]],
                start_row  = any_ranges[["table.row"]],
                col_names  = FALSE,
                na.strings = style[["na_symbol"]])


    # Add column header above table.
    wb$add_data(x          = column_header,
                start_col  = style[["start_column"]],
                start_row  = any_ranges[["header.row"]],
                col_names  = FALSE)

    # Format titles and footnotes if there are any
    monitor_df <- monitor_df |> monitor_next("Excel titles/footnotes", "Format")
    wb <- wb |>
        format_titles_foot_excel(titles, footnotes, any_ranges, style, output)

    # Only do the formatting when user specified it. With the excel_nostyle
    # option this whole part gets omitted to get a very quick unformatted
    # excel output.
    if (output == "excel"){
        # Merge top left box
        wb$merge_cells(dims = any_ranges[["box_range"]])

        # Merge column and row headers
        monitor_df <- monitor_df |> monitor_next("Excel format col headers", "Format")
        wb <- wb |>
            handle_col_header_merge(column_header[, -c(1:any_ranges[["cat_col.width"]]), drop = FALSE], any_ranges)

        monitor_df <- monitor_df |> monitor_next("Excel format row headers", "Format")
        wb <- wb |>
            handle_row_header_merge(any_tab[, 1:any_ranges[["cat_col.width"]]], any_ranges)

        # Style table
        monitor_df <- monitor_df |> monitor_next("Excel cell styles", "Format")
        wb <- wb |> handle_cell_styles(any_ranges, style)

        monitor_df <- monitor_df |> monitor_next("Excel number formats", "Format")

        # Set up inner table number formats
        col_index <- 1
        chunks    <- sub("p[0-9]+$", "p", rle(stats_row)$values)

        for (type in chunks){
            wb$add_cell_style(dims                = any_ranges[[paste0("any_col_ranges", col_index)]],
                              apply_number_format = TRUE,
                              num_fmt_id          = wb$styles_mgr$get_numfmt_id(paste0(type, "_numfmt")))

            col_index <- col_index + 1
        }

        # Freeze headers. If both options are true they have to be set together, otherwise one
        # option would overwrite the other.
        if (style[["freeze_col_header"]] && style[["freeze_row_header"]]){
            wb$freeze_pane(first_active_col = any_ranges[["header.column"]] + any_ranges[["cat_col.width"]],
                           first_active_row = any_ranges[["table.row"]])
        }
        else if (style[["freeze_col_header"]]){
            wb$freeze_pane(first_active_col = any_ranges[["header.column"]] + any_ranges[["cat_col.width"]])
        }
        else if (style[["freeze_row_header"]]){
            wb$freeze_pane(first_active_row = any_ranges[["table.row"]])
        }

        # Adjust table dimensions
        monitor_df <- monitor_df |> monitor_next("Excel widths/heights", "Format")

        wb <- wb |> handle_col_row_dimensions(any_ranges,
                                              ncol(any_tab) + (style[["start_column"]] - 1),
                                              nrow(any_tab) + nrow(multi_header) + (style[["start_row"]] - 1),
                                              style) |>
            handle_any_auto_dimensions(any_ranges, style) |>
            handle_header_table_dim(any_ranges, style)

        wb$add_ignore_error(dims = any_ranges[["header_range"]],  number_stored_as_text = TRUE)
        wb$add_ignore_error(dims = any_ranges[["cat_col_range"]], number_stored_as_text = TRUE)

        wb$add_named_region(dims = any_ranges[["whole_tab_range"]], name = "table", local_sheet = TRUE)
        wb$add_named_region(dims = any_ranges[["table_range"]],     name = "data",  local_sheet = TRUE)
    }

    monitor_df <- monitor_df |> monitor_end()

    # Return workbook
    list(wb, monitor_df)
}


#' Insert Statistic Labels
#'
#' @description
#' Give the statistics in the column header a custom label.
#'
#' @param column_header The complete column multi header.
#' @param stat_labels A list in which is specified which label should be printed for
#' which statistic instead of the statistic name.
#'
#' @return
#' Returns a multi layered column header with replaced statistic texts.
#'
#' @noRd
set_statistic_labels <- function(column_header, stat_labels){
    if (length(stat_labels) == 0){
        return(column_header)
    }

    # Loop through all provided labels
    for (i in seq_along(stat_labels)){
        name  <- names(stat_labels)[i]
        label <- stat_labels[[i]]

        # Omit label with missing variable name
        if (is.null(name) || name == ""){
            next
        }

        # Replace stat texts with provided labels
        column_header[nrow(column_header), ] <- gsub(name, label, column_header[nrow(column_header), ])
    }

    column_header
}


#' Insert Variable Labels
#'
#' @description
#' Give the variables in the column header a custom label.
#'
#' @param column_header The complete column multi header.
#' @param var_labels A list in which is specified which label should be printed for
#' which variable instead of the variable name.
#'
#' @return
#' Returns a multi layered column header with replaced variable texts.
#'
#' @noRd
set_col_variable_labels <- function(column_header, var_labels){
    if (length(var_labels) == 0){
        return(column_header)
    }

    # Loop through all provided labels
    for (i in seq_along(var_labels)){
        name  <- names(var_labels)[i]
        label <- var_labels[[i]]

        # Omit label with missing variable name
        if (is.null(name) || name == ""){
            next
        }

        # Replace variable texts with provided labels
        column_header[,] <- gsub(name, label, as.matrix(column_header))
    }

    # If header only consists of one row it gets converted to a vector when using gsub above.
    # In case this happens, convert the vector back into a one row data frame.
    if (!is.matrix(column_header)) {
        column_header <- data.table::as.data.table(as.list(column_header), stringsAsFactors = FALSE)
    }

    column_header
}


#' Build a Multi Header from Variable Names
#'
#' @description
#' Build a multi layered header from variable names of any_tab data frame. Analysis
#' variable name, stat and variable expressions are layerd into rows.
#'
#' @param var_names Variable names from any_tab.
#' @param any_header The column header carrying the variable names.
#' @param var_labels A list in which is specified which label should be printed for
#' which variable instead of the variable name.
#' @param style A list containing the styling elements.
#'
#' @return
#' Returns a data table with a multi layered header.
#'
#' @noRd
build_multi_header <- function(var_names,
                               any_header,
                               var_labels,
                               style){
    # Replace variable texts with custom labels
    any_header      <- any_header |> set_col_variable_labels(var_labels)
    col_var_headers <- data.table::as.data.table(any_header, stringsAsFactors = FALSE)

    # Split up variable name into different parts
    header_parts <- strsplit(var_names, "_")
    max_parts    <- max(lengths(header_parts))

    # Remove row header variable names
    for (part in seq_along(header_parts)){
        if (length(header_parts[[part]]) > 1){
            break
        }

        header_parts[[part]] <- ""
    }

    # Loop through header parts
    header_matrix <- sapply(header_parts, function(parts){
        # Prepare empty header with max size
        multi_header <- rep("", max_parts)

        if (parts[1] != ""){
            # Identify different parts
            var_name    <- parts[1]
            stat        <- strsplit(parts[2], " ")[[1]][1]
            expressions <- parts[-c(1, 2)]

            # Put multi header together: variable name, expressions, stat
            multi_header[1] <- var_name
            multi_header[seq(2, length(expressions) + 1)] <- expressions
            multi_header[max_parts] <- stat
        }

        multi_header
    })

    # Revert back underscores
    header_matrix   <- gsub("!!!", "_", header_matrix)

    # Replace variable texts with custom labels
    header_matrix <- header_matrix |>
        set_col_variable_labels(var_labels)

    # Make sure header_matrix is treated as a matrix, even though there can be only one row
    if (is.null(dim(header_matrix))) {
        dim(header_matrix) <- c(1, length(header_matrix))
    }

    # Drop completely empty rows
    if (nrow(header_matrix) > 1){
        header_matrix <- data.table::as.data.table(
            header_matrix[rowSums(header_matrix != "" & header_matrix != " ") > 0,
                          colSums(header_matrix != "" & header_matrix != " ") > 0, drop = FALSE])
    }
    else{
        header_matrix <- data.table::as.data.table(
            header_matrix[, colSums(header_matrix != "" & header_matrix != " "), drop = FALSE])
    }

    # Replace NA values
    header_matrix[is.na(header_matrix) | header_matrix == "NA"] <- style[["na_symbol"]]

    # Inject column variable headers into the multi header
    merge_headers(header_matrix, col_var_headers)
}


#' Inject Variable Header into Multi Header
#'
#' @description
#' The multi layered headers based on variable values and variable names are generated
#' separately and need to be put together afterwards.
#'
#' @param value_header Header based on variable values.
#' @param variable_header Header based on variable names which will be injected.
#'
#' @return
#' Returns a data table with a complete multi layered header.
#'
#' @noRd
merge_headers <- function(value_header, variable_header){
    row_list <- list()

    # Loop over the value header rows to inject the variable headers into them
    for (i in seq_len(nrow(value_header))){
        # Take a row from the value header
        row_list[[length(row_list) + 1]] <- value_header[i, , drop = FALSE]

        # If there are still rows in the variable header check whether to inject them
        if (i <= nrow(variable_header)){
            variable_row <- variable_header[i, , drop = FALSE]

            # Only inject variable row, if the row isn't completely empty
            if (!all(variable_row == "")) {
                row_list[[length(row_list) + 1]] <- variable_row
            }
        }
    }

    # Output as data table
    data.table::rbindlist(row_list, use.names=FALSE)
}

###############################################################################
# Format grouped by tables for excel output
###############################################################################
#' Format Any Table Output with by Variables (Excel Based)
#'
#' @description
#' Format any table with the provided row and column variables. Statistics
#' can be anything available.
#'
#' @param wb An already created workbook to add more sheets to.
#' @param any_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param statistics The user requested statistics.
#' @param by Separate the cross table output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#' @param var_labels List which contains column variable names and their respective labels.
#' @param stat_labels List which contains statistic names and their respective labels.
#' @param box The text that should appear in the upper left box of the table.
#' @param any_header The column header carrying the variable names.
#' @param style A list containing the styling elements.
#' @param output Determines whether to style the output or to just quickly paste
#' the data.
#' @param na.rm If TRUE removes all NA values from the tabulation.
#' @param monitor_df Data frame which stores the monitoring values.
#'
#' @return
#' Returns a list containing a formatted Excel workbook as well as the monitoring
#' data frame.
#'
#' @noRd
format_any_by_excel <- function(wb,
                                any_tab,
                                rows,
                                columns,
                                statistics,
                                by,
                                titles,
                                footnotes,
                                var_labels,
                                stat_labels,
                                box,
                                any_header,
                                style,
                                output,
                                na.rm,
                                monitor_df){
    # Loop through all by variables
    index <- 1

    for (by_var in by){
        monitor_df <- monitor_df |> monitor_start(paste0("Excel prepare (", by_var, ")"), "Format by")

        # Select by variables one by one
        any_by <- any_tab |>
            collapse::fsubset(any_tab[["BY"]] == by_var)

        # Extract unique values
        if (anyNA(any_by[["by_vars"]])){
            values <- c(unique(stats::na.omit(any_by[["by_vars"]])), NA)
        }
        else{
            values <- unique(any_by[["by_vars"]])
        }

        monitor_df <- monitor_df |> monitor_end()

        # Loop through all unique values to generate frequency tables per expression
        for (value in values){
            # In case NAs are removed
            if (is.na(value) && na.rm){
                next
            }

            monitor_df <- monitor_df |> monitor_start(paste0("Excel (", by_var, "_", value, ")"), "Format by")
            message("   + ", paste0(by_var, " = ", value))

            # Put additional by info together with the information which by variable
            # and which value is currently filtered.
            by_info <- paste0(by_var, " = ", value)

            # Filter table by current by variable and value
            if (!is.na(value)){
                any_temp <- any_by |>
                    collapse::fsubset(any_by[["by_vars"]] == value)
            }
            else{
                any_temp <- any_by |>
                    collapse::fsubset(is.na(any_by[["by_vars"]]))
            }

            any_temp <- any_temp |> dropp("BY", "by_vars")

            # Add by info below the titles
            if (length(titles) > 0){
                titles_temp <- c(titles, "", by_info)
            }
            # Or on top if there are no titles
            else{
                titles_temp <- by_info
            }

            # Generate frequency tables as normal but base is filtered data frame
            wb_list <- format_any_excel(wb,
                                        any_temp,
                                        rows,
                                        columns,
                                        statistics,
                                        value,
                                        titles_temp,
                                        footnotes,
                                        var_labels,
                                        stat_labels,
                                        box,
                                        any_header,
                                        style,
                                        output,
                                        by_info,
                                        index,
                                        NULL)

            index <- index + 1

            wb <- wb_list[[1]]

            monitor_df <- monitor_df |> monitor_end()
        }

        monitor_df <- monitor_df |> monitor_end()
    }

    # Return workbook
    list(wb, monitor_df)
}


###############################################################################
# Combine multiple tables into one workbook
###############################################################################
#' Combine Multiple Tables Into One Workbook
#'
#' @description
#' Combines any number of tables created with [any_table()] into one workbook
#' and styles them according to their meta information.
#'
#' @param ... Provide any number of result lists output by [any_table()].
#' @param file If NULL, opens the output as temporary file. If a filename with path
#' is specified, saves the output to the specified path.
#' @param output The following output formats are available: excel and excel_nostyle.
#' @param print TRUE by default. If TRUE prints the output, if FALSE doesn't print anything. Can be used
#' if one only wants to catch the combined workbook.
#' @param monitor FALSE by default. If TRUE outputs two charts to visualize the functions time consumption.
#'
#' @return
#' A fully styled workbook containing the provided tables.
#'
#' @examples
#' # Example data frame
#' my_data <- dummy_data(1000)
#' my_data[["person"]] <- 1
#'
#' # Formats
#' age. <- discrete_format(
#'     "Total"          = 0:100,
#'     "under 18"       = 0:17,
#'     "18 to under 25" = 18:24,
#'     "25 to under 55" = 25:54,
#'     "55 to under 65" = 55:64,
#'     "65 and older"   = 65:100)
#'
#' sex. <- discrete_format(
#'     "Total"  = 1:2,
#'     "Male"   = 1,
#'     "Female" = 2)
#'
#' education. <- discrete_format(
#'     "Total"            = c("low", "middle", "high"),
#'     "low education"    = "low",
#'     "middle education" = "middle",
#'     "high education"   = "high")
#'
#' # Define style
#' my_style <- excel_output_style(column_widths = c(2, 15, 15, 15, 9))
#'
#' # Define titles and footnotes. If you want to add hyperlinks you can do so by
#' # adding "link:" followed by the hyperlink to the main text.
#' titles <- c("This is title number 1 link: https://cran.r-project.org/",
#'             "This is title number 2",
#'             "This is title number 3")
#' footnotes <- c("This is footnote number 1",
#'                "This is footnote number 2",
#'                "This is footnote number 3 link: https://cran.r-project.org/")
#'
#' # Catch the output and additionally use the options:
#' # pint = FALSE and output = "excel_nostyle".
#' # This skips the styling and output part, so that the function runs faster.
#' # The styling is done later on.
#' my_style <- my_style |> modify_output_style(sheet_name = "big table")
#'
#' tab1 <- my_data |> any_table(rows       = c("sex + age", "sex", "age"),
#'                              columns    = c("year", "education + year"),
#'                              values     = weight,
#'                              statistics = c("sum", "pct_group"),
#'                              pct_group  = c("sex", "age", "education", "year"),
#'                              formats    = list(sex = sex., age = age.,
#'                                                education = education.),
#'                              style      = my_style,
#'                              na.rm      = TRUE,
#'                              print      = FALSE,
#'                              output     = "excel_nostyle")
#'
#' my_style <- my_style |> modify_output_style(sheet_name = "age_sex")
#'
#' tab2 <- my_data |> any_table(rows       = c("age"),
#'                              columns    = c("sex"),
#'                              values     = weight,
#'                              statistics = c("sum"),
#'                              formats    = list(sex = sex., age = age.),
#'                              style      = my_style,
#'                              na.rm      = TRUE,
#'                              print      = FALSE,
#'                              output     = "excel_nostyle")
#'
#' my_style <- my_style |> modify_output_style(sheet_name = "edu_year")
#'
#' tab3 <- my_data |> any_table(rows       = c("education"),
#'                              columns    = c("year"),
#'                              values     = weight,
#'                              statistics = c("pct_group"),
#'                              formats    = list(education = education.),
#'                              style      = my_style,
#'                              na.rm      = TRUE,
#'                              print      = FALSE,
#'                              output     = "excel_nostyle")
#'
#' # Every of the above tabs is a list, which contains the data table, an unstyled
#' # workbook and the meta information needed for the individual styling. These
#' # tabs can be input into the following function, which reads the meta information,
#' # styles each table individually and combines them as separate sheets into a single workbook.
#' combine_into_workbook(tab1, tab2, tab3)
#'
#' @export
combine_into_workbook <- function(...,
                                  file    = NULL,
                                  output  = "excel",
                                  print   = TRUE,
                                  monitor = FALSE){
    monitor_df <- NULL |> monitor_start("Prepare combine", "Prepare")

    # Measure the time
    start_time <- Sys.time()

    tables    <- list(...)
    tab_names <- as.character(substitute(list(...)))[-1]
    wb        <- openxlsx2::wb_workbook()

    i <- 1

    message(" > Formatting tables")

    for (table in tables){
        monitor_df <- monitor_df |> monitor_next(paste0("Format ", tab_names[i]), "Format tables")
        message(paste0("   + ", tab_names[i]))

        meta <- table[["meta"]]

        wb <- wb |> prepare_styles(meta[["style"]])

        # In case no by variables are provided
        if (length(meta[["by"]]) == 0){
            wb_list <- suppressMessages(
                format_any_excel(wb, table[["table"]], meta[["rows"]], meta[["columns"]],
                                 meta[["statistics"]], meta[["by"]], meta[["titles"]],
                                 meta[["footnotes"]], meta[["var_labels"]], meta[["stat_labels"]],
                                 meta[["box"]], meta[["any_header"]],
                                 meta[["style"]], output, monitor_df = monitor_df))

            wb <- wb_list[[1]]
        }
        # In case there are  by variables are provided
        else{
            wb_list <- suppressMessages(
                format_any_by_excel(wb, table[["table"]], meta[["rows"]], meta[["columns"]],
                                    meta[["statistics"]], meta[["by"]], meta[["titles"]],
                                    meta[["footnotes"]], meta[["var_labels"]], meta[["stat_labels"]],
                                    meta[["box"]], meta[["any_header"]],
                                    meta[["style"]], output, meta[["na.rm"]], monitor_df))

            wb <- wb_list[[1]]
        }

        i <- i + 1
    }

    # Output formatted table into different formats
    if (print){
        monitor_df <- monitor_df |> monitor_next("Output tables", "Output tables")

        if (is.null(file)){
            wb$open()
        }
        else if (!dir.exists(dirname(file))){
            message(" ! WARNING: Directory '", dirname(file), "' does not exist. File won't be saved.")
            wb$open()
        }
        else{
            wb$save(file = file, overwrite = TRUE)
        }
    }

    end_time <- round(difftime(Sys.time(), start_time, units = "secs"), 3)
    message("\n- - - 'combine_into_workbook' execution time: ", end_time, " seconds\n")

    monitor_df <- monitor_df |> monitor_end()
    monitor_df |> monitor_plot(draw_plot = monitor)

    invisible(wb)
}
Any scripts or data that you put into this service are public.
qol documentation built on Dec. 14, 2025, 1:06 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
qol
Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

R/any_table.R
In qol: Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

Defines functions combine_into_workbook format_any_by_excel merge_headers build_multi_header set_col_variable_labels set_statistic_labels format_any_excel any_table

Documented in any_table combine_into_workbook

Try the qol package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

qol Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

R/any_table.R In qol: Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

Defines functions combine_into_workbook format_any_by_excel merge_headers build_multi_header set_col_variable_labels set_statistic_labels format_any_excel any_table

Documented in any_table combine_into_workbook

Try the qol package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

qol
Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

R/any_table.R
In qol: Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs