R/crosstabs.R
In qol: Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

Documented in crosstabs

#' Display Cross Table of Two Variables
#'
#' @description
#' [crosstabs()] produces a cross table of two variables. Statistics can be
#' weighted sums, unweighted frequencies or different percentages.
#'
#' @param data_frame A data frame in which are the variables to tabulate.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param statistics The user requested statistics.Available functions:
#' - "sum"        -> Weighted and unweighted sum
#' - "freq"       -> Unweighted frequency
#' - "pct_row"    -> Weighted and unweighted row percentages
#' - "pct_column" -> Weighted and unweighted column percentages
#' - "pct_total"  -> Weighted and unweighted percentages compared to the grand total
#' @param formats A list in which is specified which formats should be applied to which variables.
#' @param by Compute tables stratified by the expressions of the provided variables.
#' @param weight Put in a weight variable to compute weighted results.
#' @param titles Specify one or more table titles.
#' @param footnotes Specify one or more table footnotes.
#' @param style A list of options can be passed to control the appearance of excel outputs.
#' Styles can be created with [excel_output_style()].
#' @param output The following output formats are available: console (default), text,
#' excel and excel_nostyle.
#' @param na.rm FALSE by default. If TRUE removes all NA values from the variables.
#' @param print TRUE by default. If TRUE prints the output, if FALSE doesn't print anything. Can be used
#' if one only wants to catch the output data frame.
#' @param monitor FALSE by default. If TRUE, outputs two charts to visualize the functions time consumption.
#'
#' @details
#' [crosstabs()] is based on the 'SAS' procedure Proc Freq, which provides
#' efficient and readable ways to perform cross tabulations.
#'
#' To create a cross table you only need to provide a variable for the rows and columns.
#' Nothing special about this. The real power comes into play, when you output your
#' tables as a fully styled 'Excel' workbook. Setting up a custom, reusable style is
#' as easy as setting up options like: provide a color for the table header, set the
#' font size for the row header, should borders be drawn for the table cells yes/no,
#' and so on.
#'
#' You can not only output sums and frequencies, but also different percentages, all
#' set up in separate, evenly designed tables. For just a quick overview, rather than
#' fully designed tables, you can also just output the tables in ASCII style format.
#'
#' @return
#' Returns a data tables containing the results for the cross table.
#'
#' @seealso
#' Creating a custom table style: [excel_output_style()], [modify_output_style()],
#' [number_format_style()], [modify_number_formats()].
#'
#' Creating formats: [discrete_format()] and [interval_format()].
#'
#' Functions that can handle formats and styles: [frequencies()], [any_table()].
#'
#' Additional functions that can handle styles: [export_with_style()]
#'
#' Additional functions that can handle formats: [summarise_plus()], [recode()],
#' [recode_multi()]
#'
#' @examples
#' # Example data frame
#' my_data <- dummy_data(1000)
#'
#' # Define titles and footnotes. If you want to add hyperlinks you can do so by
#' # adding "link:" followed by the hyperlink to the main text.
#' titles <- c("This is title number 1 link: https://cran.r-project.org/",
#'             "This is title number 2",
#'             "This is title number 3")
#' footnotes <- c("This is footnote number 1",
#'                "This is footnote number 2",
#'                "This is footnote number 3 link: https://cran.r-project.org/")
#'
#' # Output cross tables
#' my_data |> crosstabs(age, sex)
#' my_data |> crosstabs(age, sex,
#'                      weight = "weight")
#'
#' # Also works with characters
#' my_data |> crosstabs("age", "sex")
#' my_data |> crosstabs("age", "sex",
#'                      weight = "weight")
#'
#' # Applying formats and titles
#' age. <- discrete_format(
#'     "Total"          = 0:100,
#'     "under 18"       = 0:17,
#'     "18 to under 25" = 18:24,
#'     "25 to under 55" = 25:54,
#'     "55 to under 65" = 55:64,
#'     "65 and older"   = 65:100)
#'
#' sex. <- discrete_format(
#'     "Total"  = 1:2,
#'     "Male"   = 1,
#'     "Female" = 2)
#'
#' my_data |> crosstabs(age, sex,
#'                      formats   = list(age = age., sex = sex.),
#'                      titles    = titles,
#'                      footnotes = footnotes)
#'
#' # Split cross table by expressions of another variable
#' my_data |> crosstabs(age, sex, by = education)
#'
#' # Compute different stats
#' my_data |> crosstabs(age, sex,
#'                      statistics = c("sum", "freq", "pct_row", "pct_column", "pct_total"))
#'
#' # Get a list with two data tables for further usage
#' result_list <- my_data |> crosstabs(age, sex,
#'                                     formats = list(age = age., sex = sex.))
#'
#' # Output in text file
#' my_data |> crosstabs(age, sex, output = "text")
#'
#' # Output to Excel
#' my_data |> crosstabs(age, sex, output = "excel")
#'
#' # With individual styling
#' my_style <- excel_output_style(header_back_color = "0077B6",
#'                                font              = "Times New Roman")
#'
#' my_data |> crosstabs(age, sex, output = "excel", style = my_style)
#'
#' @export
crosstabs <- function(data_frame,
                      rows,
                      columns,
                      statistics = c("sum"),
                      formats    = c(),
                      by         = c(),
                      weight     = NULL,
                      titles     = c(),
                      footnotes  = c(),
                      style      = excel_output_style(),
                      output     = "console",
                      na.rm      = FALSE,
                      print      = TRUE,
                      monitor    = FALSE){

    # Measure the time
    start_time <- Sys.time()

    monitor_df <- NULL |> monitor_start("Error handling", "Preparation")

    # First convert data frame to data table
    if (!data.table::is.data.table(data_frame)){
        data_frame <- data.table::as.data.table(data_frame)
    }

    # Evaluate formats early, otherwise apply formats can't evaluate them in unit
    # test situation.
    formats_list <- as.list(substitute(formats))[-1]

    formats <- stats::setNames(
        lapply(formats_list, function(expression){
            # Catch expression if passed as string
            if (is.character(expression)) {
                tryCatch(get(expression, envir = parent.frame()),
                         error = function(e) NULL)
            }
            # Catch expression if passed as symbol
            else{
                tryCatch(eval(expression, envir = parent.frame()),
                         error = function(e) NULL)
            }
        }),
        names(formats_list))

    # Look up variable names in format data frame to check whether there is an
    # interval or discrete format
    flag_interval <- FALSE

    for (current_var in names(formats)){
        format_df          <- formats[[current_var]]
        interval_variables <- c("from", "to")
        actual_variables   <- names(format_df)[1:2]

        if (identical(interval_variables, actual_variables)){
            flag_interval <- TRUE
            break
        }
    }

    ###########################################################################
    # Error handling
    ###########################################################################

    # Convert to character vectors
    rows_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(rows))))

    if (substr(rows_temp, 1, 2) == "c("){
        rows <- as.character(substitute(rows))
    }
    else if (!is_error(rows)){
        # Do nothing. In this case variables already contains the substituted variable names
        # while rows_temp is evaluated to the symbol passed into the function.
    }
    else{
        rows <- rows_temp
    }

    # Remove extra first character created with substitution
    rows <- rows[rows != "c"]

    provided_rows <- rows
    invalid_rows  <- rows[!rows %in% names(data_frame)]
    rows          <- rows[rows %in% names(data_frame)]

    if (length(invalid_rows) > 0){
        message(" ! WARNING: The provided row variable '", paste(invalid_rows, collapse = ", "), "' is not part of\n",
                "            the data frame. This variable will be omitted during computation.")
    }

    if (length(rows) == 0){
        message(" X ERROR: No valid row variable provided. Crosstabs will be aborted.")
        return(invisible(NULL))
    }

    if (length(rows) > 1){
        message(" X ERROR: Only one variable for rows allowed. Crosstabs will be aborted.")
        return(invisible(NULL))
    }

    # Convert to character vectors
    columns_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(columns))))

    if (substr(columns_temp, 1, 2) == "c("){
        columns <- as.character(substitute(columns))
    }
    else if (!is_error(columns)){
        # Do nothing. In this case variables already contains the substituted variable names
        # while columns_temp is evaluated to the symbol passed into the function.
    }
    else{
        columns <- columns_temp
    }

    # Remove extra first character created with substitution
    columns <- columns[columns != "c"]

    provided_columns <- columns
    invalid_columns  <- columns[!columns %in% names(data_frame)]
    columns          <- columns[columns %in% names(data_frame)]

    if (length(invalid_columns) > 0){
        message(" ! WARNING: The provided column variable '", paste(invalid_columns, collapse = ", "), "' is not part of\n",
                "            the data frame. This variable will be omitted during computation.")
    }

    if (length(columns) == 0){
        message(" X ERROR: No valid column variable provided. Crosstabs will be aborted.")
        return(invisible(NULL))
    }

    if (length(columns) > 1){
        message(" X ERROR: Only one variable for columns allowed. Crosstabs will be aborted.")
        return(invisible(NULL))
    }

    # Convert to character vectors
    by_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(by))))

    if (substr(by_temp, 1, 2) == "c("){
        by <- as.character(substitute(by))
    }
    else if (!is_error(by)){
        # Do nothing. In this case variables already contains the substituted variable names
        # while variables_temp is evaluated to the symbol passed into the function.
    }
    else{
        by <- by_temp
    }

    # Remove extra first character created with substitution
    by <- by[by != "c"]

    provided_by <- by
    invalid_by  <- by[!by %in% names(data_frame)]
    by          <- by[by %in% names(data_frame)]

    if (length(invalid_by) > 0){
        message(" ! WARNING: The provided by variable '", paste(invalid_by, collapse = ", "), "' is not part of\n",
                "            the data frame. This variable will be omitted during computation.")
    }

    variables  <- c(rows, columns)
    invalid_by <- by[by %in% variables]

    if (length(invalid_by) > 0){
        message(" X ERROR: The provided by variable '", paste(invalid_by, collapse = ", "), "' is also part of\n",
                "          the row and column variables which is not allowed. Crosstabs will be aborted.")
        return(invisible(NULL))
    }

    # Create temporary weight column if none is provided.
    # Also get the name of the weight variable as string.
    weight_temp <- sub("^list\\(", "c(", gsub("\"", "", deparse(substitute(weight))))

    if (weight_temp == "NULL" || substr(weight_temp, 1, 2) == "c("){
        weight_var <- ".temp_weight"
        data_frame[[".temp_weight"]] <- 1

        if (substr(weight_temp, 1, 2) == "c("){
            message(" ! WARNING: Only one variable for weight allowed. Evaluations will be unweighted.")
        }
    }
    else if (!is_numeric(data_frame[[weight_temp]])){
        weight_var <- ".temp_weight"
        data_frame[[".temp_weight"]] <- 1

        message(" ! WARNING: Provided weight variable is not numeric. Unweighted results will be computed.")
    }
    else{
        weight_var <- weight_temp

        # NA values in weight lead to errors therefor convert them to 0
        if (anyNA(data_frame[[weight_temp]])){
            message(" ~ NOTE: Missing values in weight variable '", weight_temp, "' will be converted to 0.")
        }
        data_frame[[weight_temp]] <- data.table::fifelse(is.na(data_frame[[weight_temp]]), 0, data_frame[[weight_temp]])

        # @Hack: so I don't have to check if .temp_weight exists later on
        data_frame[[".temp_weight"]] <- 1
    }

    # Check for invalid output option
    if (!tolower(output) %in% c("console", "text", "excel", "excel_nostyle")){
        message(" ! WARNING: Output format '", output, "' not available. Using 'console' instead.")

        output <- "console"
    }
    else{
        output <- tolower(output)
    }

    # Get the intersection of the requested statistics to make sure
    # only valid actions are passed down
    statistics <- statistics[unique(statistics) %in%
                             c("sum", "freq", "pct_row", "pct_column", "pct_total")]

    # If no valid statistics selected, default to sum
    if (length(statistics) == 0){
        statistics <- "sum"
    }

    ###########################################################################
    # Cross tabulation starts
    ###########################################################################

    monitor_df <- monitor_df |> monitor_next("Summary", "Summary")
    message("\n > Computing stats.")

    # Summarise results for a crosstabs output
    data_frame[["var"]] <- 1

    # Put together vector of grouping variables
    group_vars <- c(by, rows, columns)

    # In case no by variables are specified
    if (length(by) == 0){
        cross_tab <- suppressMessages(data_frame |>
             summarise_plus(class      = group_vars,
                            values     = "var",
                            statistics = c("sum", "pct_group", "pct_total", "freq"),
                            formats    = formats,
                            weight     = weight_var,
                            nesting    = "deepest",
                            notes      = FALSE,
                            na.rm      = na.rm)) |>
            drop_type_vars() |>
            collapse::frename(var_pct_group = var_pct_row)

        # Get the expressions of the column variable to use them later as column names
        column_names <- unique(as.character(cross_tab[[columns]]))

        # Add unique ids to column variables so that the pivoted variable names receive
        # running numbers instead of the values or labels.
        run_nr_df        <- data.table::data.table(column_names)
        names(run_nr_df) <- columns
        run_nr_df        <- run_nr_df |> running_number()

        cross_tab <- cross_tab |>
            collapse::join(run_nr_df,
                           on      = columns,
                           how     = "left",
                           verbose = FALSE)

        # Pivot to wider format, which basically is the final format to print the data
        cross_tab <- cross_tab |>
            collapse::pivot(id     = rows,
                            names  = "run_nr",
                            values = c("var_sum", "var_pct_row", "var_pct_total", "var_freq"),
                            how    = "wider")
    }
    # In case by variables are specified
    else{
        combinations <- as.vector(outer(by, paste0(rows, "+", columns), paste, sep = "+"))

        cross_tab <- suppressMessages(data_frame |>
            summarise_plus(class      = group_vars,
                           values     = "var",
                           statistics = c("sum", "pct_group", "pct_total", "freq"),
                           formats    = formats,
                           weight     = weight_var,
                           nesting    = "all",
                           types      = combinations,
                           notes      = FALSE,
                           na.rm      = na.rm)) |>
            collapse::frename(var_pct_group = var_pct_row) |>
            fuse_variables("by_vars", by)

        cross_tab[["BY"]] <- sub("\\+.*", "", cross_tab[["TYPE"]])

        # Get the expressions of the column variable to use them later as column names
        column_names <- unique(as.character(cross_tab[[columns]]))

        # Add unique ids to column variables so that the pivoted variable names receive
        # running numbers instead of the values or labels.
        run_nr_df        <- data.table::data.table(column_names)
        names(run_nr_df) <- columns
        run_nr_df        <- run_nr_df |> running_number()

        cross_tab <- cross_tab |>
            collapse::join(run_nr_df,
                           on      = columns,
                           how     = "left",
                           verbose = FALSE)

        # Pivot to wider format, which basically is the final format to print the data
        cross_tab <- cross_tab |>
            collapse::pivot(id     = c("BY", "by_vars", rows),
                            names  = "run_nr",
                            values = c("var_sum", "var_pct_row", "var_pct_total", "var_freq"),
                            how    = "wider")
    }

    if (is.null(cross_tab)){
        message(" X ERROR: Crosstab could not be computed.")
        return(invisible(NULL))
    }

    # Prepare table format for output
    message(" > Formatting tables.")

    if (output %in% c("console", "text")){
        monitor_df <- monitor_df |> monitor_next("Format tables", "Format tables")

        # In case no by variables are provided
        if (length(by) == 0){
            complete_table  <- format_cross_text(cross_tab, rows, columns, column_names,
                                                 statistics, formats, by, titles, footnotes)
        }
        # In case there are  by variables are provided
        else{
            complete_table <- format_cross_by_text(cross_tab, rows, columns, column_names,
                                                   statistics, formats, by, titles, footnotes, na.rm)
        }
    }
    else if (output == "excel" || output == "excel_nostyle"){
        wb <- openxlsx2::wb_workbook() |>
            prepare_styles(style)

        monitor_df <- monitor_df |> monitor_end()

        # In case no by variables are provided
        if (length(by) == 0){
            wb_list <- format_cross_excel(wb, cross_tab, rows, columns, column_names,
                                          statistics, formats, by, titles, footnotes, style, output,
                                          monitor_df = monitor_df)

            wb         <- wb_list[[1]]
            monitor_df <- wb_list[[2]]
        }
        # In case there are  by variables are provided
        else{
            wb_list <- format_cross_by_excel(cross_tab, rows, columns, column_names,
                                             statistics, formats, by, titles, footnotes,
                                             style, output, na.rm, wb, monitor_df)

            wb         <- wb_list[[1]]
            monitor_df <- wb_list[[2]]
        }
    }

    # Output formatted table into different formats
    if (print){
        monitor_df <- monitor_df |> monitor_next("Output tables", "Output tables")

        if (output %in% c("console")){
            cat(paste(complete_table, collapse = "\n"), "\n\n")
        }
        else if (output == "text"){
            temp_file <- tempfile(fileext = ".txt")
            writeLines(complete_table, temp_file)
            file.show(temp_file)
        }
        else if (output == "excel" || output == "excel_nostyle"){
            if (is.null(style[["file"]])){
                if(interactive()){
                    wb$open()
                }
            }
            else{
                wb$save(file = style[["file"]], overwrite = TRUE)
            }
        }
    }

    end_time <- round(difftime(Sys.time(), start_time, units = "secs"), 3)
    message("\n- - - 'crosstabs' execution time: ", end_time, " seconds\n")

    monitor_df <- monitor_df |> monitor_end()
    monitor_df |> monitor_plot(draw_plot = monitor)

    invisible(cross_tab)
}

###############################################################################
# Format cross table for console output
###############################################################################
#' Format Cross Table Output (Text Based)
#'
#' @description
#' Format a cross table with the provided row and column variables. Statistics
#' can be weighted sums, unweighted frequencies or different percentages.
#'
#' @param cross_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param column_names The real column names that appear in the table.
#' @param statistics The user requested statistics.
#' @param formats A list in which is specified which formats should be applied to
#' which variable.
#' @param by Separate the frequency output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#'
#' @return
#' Returns a character vector with all formatted rows for the tables.
#'
#' @noRd
format_cross_text <- function(cross_tab,
                              rows,
                              columns,
                              column_names,
                              statistics,
                              formats,
                              by,
                              titles,
                              footnotes){
    complete_tabs <- c()

    # Set equal number of maximum column width
    sum_tab          <- setup_print_table(cross_tab, rows, "sum")
    longest_value    <- nchar(format(sum_tab[nrow(sum_tab), "total"], scientific = FALSE))
    max_column_width <- max(8, longest_value + 7)

    # Compute individual cross tables for each operation
    for (stat in statistics){
        # Setup cross table for formatting
        var_tab <- setup_print_table(cross_tab, rows, stat)

        # Get the maximum width of the provided variable names to determine the width
        # of the first column.
        first_column_width <- nchar(paste0(rows, " / ", columns))

        first_column_width <- max(first_column_width,
                                  nchar(stats::na.omit(as.character(var_tab[[rows]]))))

        # Use the max column width to evenly spread the columns.
        # Wrap the column headers according to this width.
        if (!is_multilabel(formats, columns)){
            multi_header   <- wrap_header(c(column_names, "total"), max_column_width)

            formatted_cols <- vector("list", length(c(column_names, "total")))
        }
        # In case of multilabels don't add total column
        else{
            multi_header   <- wrap_header(column_names, max_column_width)

            var_tab        <- var_tab |> dropp("total")
            formatted_cols <- vector("list", length(column_names))
        }

        # Format the first column and connect it with the rest of the header
        first_column <- c(paste0(format(paste0(rows, " / ", columns),
                                        width   = first_column_width,
                                        justify = "left"), " |"),
                          rep(paste0(format("",
                                            width   = first_column_width,
                                            justify = "left"), " |"),
                              nrow(multi_header) - 1))
        multi_header <- cbind(first_column, multi_header)

        # Visually separate total column if there is one
        total_col <- FALSE

        if (trimws(multi_header[1, ncol(multi_header)]) == "total"){
            total_col <- TRUE

            multi_header[1, ncol(multi_header)] <-
                paste0(" | ", multi_header[1, ncol(multi_header)])
        }

        # Convert header matrix into character vector with single lines
        complete_header <- apply(multi_header, 1, paste, collapse = " ")
        complete_header <- c(complete_header, strrep("-", nchar(complete_header[1])))

        # If row multilabel, remove automatically computed total observation
        if (is_multilabel(formats, rows)){
            var_tab <- var_tab |>
                collapse::fsubset(var_tab[[rows]] != "total")

            if (stat == "pct_row"){
                if (length(by) == 0){
                    message(" ~ NOTE: The format for variable '", rows, "' is a multilabel.\n",
                            "         In this case row percentages aren't computed properly.")
                }
            }
        }

        # Format table column by column. Basically concatenate first column text and
        # corresponding values together while keeping the individual maximum column
        # width in mind.
        first_column <- paste0(format(var_tab[[rows]],
                                      width   = first_column_width,
                                      justify = "left"), " | ")

        # Loop through columns and format each number in a readable way
        for (column in seq_len(length(formatted_cols))){
            # Percentages and Unweighted freq
            if (stat %in% c("pct_row", "pct_column", "pct_total")){
                formatted_cols[[column]] <- format_number(var_tab[column + 1],
                                                          width = max_column_width,
                                                          stat  = names(var_tab[column + 1]))
            }
            # Unweighted frequencies
            else if (stat %in% c("freq") ||
                     identical(as.numeric(cross_tab[["var_sum_1"]]),
                               as.numeric(cross_tab[["var_freq_1"]]))){
                formatted_cols[[column]] <- format_number(var_tab[column + 1],
                                                          width    = max_column_width,
                                                          stat     = names(var_tab[column + 1]),
                                                          decimals = 0)
            }
            # Weighted sums
            else if (stat %in% c("sum")){
                formatted_cols[[column]] <- format_number(var_tab[column + 1],
                                                          width    = max_column_width,
                                                          stat     = names(var_tab[column + 1]),
                                                          decimals = 3)
            }

            # Visually separate total column if there is one
            if (column == length(formatted_cols) && total_col){
                formatted_cols[[column]] <- paste0(" | ", formatted_cols[[column]])
            }
        }

        # Convert list to matrix
        formatted_matrix <- do.call(cbind, formatted_cols)

        # Convert matrix to single formatted rows
        all_rows <- paste0(first_column,
                           apply(formatted_matrix, 1, paste, collapse = " "))

        # If no multilabel formats are applied separate total row at the bottom from the rest
        if (!is_multilabel(formats, rows)){
            rows_temp <- length(all_rows)

            all_rows <- c(all_rows[1:(rows_temp - 1)],
                          strrep("-", nchar(complete_header[1])),
                          all_rows[rows_temp])
        }

        # Insert titles and footnotes
        complete_table <- c(complete_header, all_rows)

        if (length(titles) > 0){
            complete_table <- c(titles, "", complete_table)
        }
        if (length(footnotes) > 0){
            complete_table <- c(complete_table, "____________", footnotes)
        }

        # Output formatted result
        complete_tabs <- c(complete_tabs, "\n", complete_table, "\n")
    }

    # Output formatted result
    complete_tabs
}


#' Wrap Text Based Table Header
#'
#' @description
#' The width for each column gets a maximum value. After that maximum value the
#' header wraps to the next line to ensure the table doesn't gets too wide.
#'
#' @param cross_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variables that appear in the table rows.
#' @param stat The currently computed stat.
#'
#' @return
#' Returns a data table trimmed down to the necessary variables and observations.
#'
#' @noRd
wrap_header <- function(column_names, width = 10){
    # Split each column name into single letters
    letter_list <- strsplit(column_names, "")

    wrapped <- lapply(letter_list, function(word){
        # How many chunks are needed for this label
        n_chunks <- ceiling(length(word) / (width - 1))

        # Extract substrings of max 'width' characters
        parts <- character(n_chunks)

        for (i in seq_len(n_chunks)){
            start    <- (i - 1) * (width - 1) + 1
            end      <- min(i * (width - 1), length(word))
            parts[i] <- paste(word[start:end], collapse = "")
        }

        format(parts, width = width, justify = "right")
    })

    # Find the maximum number of lines across all labels
    max_lines <- max(sapply(wrapped, length))

    # Pad shorter labels with empty strings so they all align
    wrapped_padded <- lapply(wrapped, function(parts){
        c(parts, rep(format("", width = width, justify = "right"), max_lines - length(parts)))
    })

    # Return as a matrix: rows = lines, cols = labels
    do.call(cbind, wrapped_padded)
}


#' Trim Cross Table to the Needed Parts
#'
#' @description
#' Cross tables are computed by stats. Here the stats get reduced the the current
#' stat in the loop. Column percentages get newly computed because they aren't
#' computed beforehand.
#'
#' @param cross_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variables that appear in the table rows.
#' @param stat The currently computed stat.
#'
#' @return
#' Returns a data table trimmed down to the necessary variables and observations.
#'
#' @noRd
setup_print_table <- function(cross_tab, rows, stat){
    # Compute column percentages
    if (stat == "pct_column"){
        var_tab <- cross_tab |>
            collapse::get_vars(c(rows, "sum"), regex = TRUE)

        # Compute column percentages and rename variables
        for (column in names(var_tab[-1])){
            new_var <- sub("sum", "pct_column", column)
            var_tab[[new_var]] <- var_tab[[column]] * 100 / collapse::fsum(var_tab[[column]])
        }

        # Get only the needed variables for this iteration
        var_tab <- var_tab |>
            collapse::get_vars(c(rows, stat), regex = TRUE)
    }
    else{
        # Get only the needed variables for this iteration
        var_tab <- cross_tab |>
            collapse::get_vars(c(rows, stat), regex = TRUE)
    }

    # Compute row and column sums
    var_tab[["total"]]    <- rowSums(var_tab[, -1], na.rm = TRUE)

    column_totals         <- var_tab[1, ]
    column_totals[[rows]] <- "total"
    column_totals[-1]     <- colSums(var_tab[, -1], na.rm = TRUE)
    var_tab               <- rbind(var_tab, column_totals)

    # Set unneeded totals to NA
    if (stat == "pct_column"){
        var_tab[["total"]] <- NA
    }
    else if (stat == "pct_row"){
        var_tab[nrow(var_tab), -1] <- NA
    }

    var_tab
}

###############################################################################
# Format cross table for excel output
###############################################################################
#' Format Cross Table Output (Excel Based)
#'
#' @description
#' Format a cross table with the provided row and column variables. Statistics
#' can be weighted sums, unweighted frequencies or different percentages.
#'
#' @param wb An already created workbook to add more sheets to.
#' @param cross_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param column_names The real column names that appear in the table.
#' @param statistics The user requested statistics.
#' @param formats A list in which is specified which formats should be applied to
#' which variable.
#' @param by Separate the cross table output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#' @param style A list containing the styling elements.
#' @param output Determines whether to style the output or to just quickly paste
#' the data.
#' @param by_info Text which contains the information which variable with which
#' expression is computed at the moment.Used for computation with by variables.
#' @param index Index of the current variable expression. Used for computation with
#' by variables.
#' @param monitor_df Data frame which stores the monitoring values.
#'
#' @return
#' Returns a list containing a formatted Excel workbook as well as the monitoring
#' data frame.
#'
#' @noRd
format_cross_excel <- function(wb,
                               cross_tab,
                               rows,
                               columns,
                               column_names,
                               statistics,
                               formats,
                               by,
                               titles,
                               footnotes,
                               style,
                               output,
                               by_info = NULL,
                               index   = NULL,
                               monitor_df){
    monitor_df <- monitor_df |> monitor_start("Excel prepare", "Format")

    # Compute individual cross tables for each operation
    for (stat in statistics){
        monitor_df <- monitor_df |> monitor_next("Excel prepare", "Format")

        # Setup cross table for formatting
        var_tab <- setup_print_table(cross_tab, rows, stat)

        # Round values according to style options
        if (stat %in% c("sum", "freq")){
            var_tab <- var_tab |>
                collapse::fmutate(across(is.numeric, round,
                                         digits = style[["number_formats"]][["sum_decimals"]]))
        }
        else{
            var_tab <- var_tab |>
                collapse::fmutate(across(is.numeric, round,
                                         digits = style[["number_formats"]][["pct_decimals"]]))
        }

        # If row multilabel, remove automatically computed total observation
        if (is_multilabel(formats, rows)){
            var_tab <- var_tab |>
                collapse::fsubset(var_tab[[rows]] != "total")

            if (stat == "pct_row"){
                if (length(by) == 0){
                    message(" ~ NOTE: The format for variable '", rows, "' is a multilabel.\n",
                            "         In this case row percentages aren't computed properly.")
                }
            }
        }

        column_names[is.na(column_names)] <- style[["na_symbol"]]

        # If column multilabel, remove automatically computed total column.
        # Set the real column names according to number of columns.
        if (is_multilabel(formats, columns)){
            var_tab <- var_tab |> dropp("total")

            names(var_tab) <- c(paste0(rows, " / ", columns), column_names)
        }
        else{
            names(var_tab) <- c(paste0(rows, " / ", columns), column_names, "total")
        }

        # Get table ranges
        cross_ranges <- get_table_ranges(var_tab, titles, footnotes, style)

        # If function is called with by variables the sheet names have to be differentiated
        # and by info has to be written besides the table.
        if (!is.null(by_info)){
            by_info_col <- style[["start_column"]] + length(var_tab) + 1

            wb$add_worksheet(paste0(stat, index),
                             grid_lines = style[["grid_lines"]])
            wb$add_data(x         = by_info,
                        start_col = by_info_col,
                        start_row = style[["start_row"]])
        }
        else{
            wb$add_worksheet(stat, grid_lines = style[["grid_lines"]])
        }

        # Format titles and footnotes if there are any
        monitor_df <- monitor_df |> monitor_next("Excel titles/footnotes", "Format")
        wb <- wb |>
            format_titles_foot_excel(titles, footnotes, cross_ranges, style, output)

        # Add table data and format according to style options
        monitor_df <- monitor_df |> monitor_next("Excel data", "Format")

        wb$add_data(x          = var_tab,
                    start_col  = style[["start_column"]],
                    start_row  = cross_ranges[["header.row"]],
                    na.strings = style[["na_symbol"]])

        # Only do the formatting when user specified it. With the excel_nostyle
        # option this whole part gets omitted to get a very quick unformatted
        # excel output.
        if (output == "excel"){
            monitor_df <- monitor_df |> monitor_next("Excel cell styles", "Format")
            wb <- wb |> handle_cell_styles(cross_ranges, style)

            if (stat == "sum" || stat== "freq"){
                wb$add_cell_style(dims                = cross_ranges[["table_range"]],
                                  apply_number_format = TRUE,
                                  num_fmt_id          = wb$styles_mgr$get_numfmt_id(paste0(stat, "_numfmt")))
            }
            else{
                wb$add_cell_style(dims                = cross_ranges[["table_range"]],
                                  apply_number_format = TRUE,
                                  num_fmt_id          = wb$styles_mgr$get_numfmt_id("pct_numfmt"))
            }

            # Adjust table dimensions
            monitor_df <- monitor_df |> monitor_next("Excel widths/heights", "Format")
            wb <- wb |> handle_col_row_dimensions(cross_ranges,
                                                  ncol(var_tab) + (style[["start_column"]] - 1),
                                                  nrow(var_tab) + (style[["start_row"]] - 1),
                                                  style)

            wb <- wb |> handle_auto_dimensions(cross_ranges,
                                               style)

            wb <- wb |> handle_header_table_dim(cross_ranges,
                                                style)

            wb$add_ignore_error(dims = cross_ranges[["header_range"]],  number_stored_as_text = TRUE)
            wb$add_ignore_error(dims = cross_ranges[["cat_col_range"]], number_stored_as_text = TRUE)

            wb$add_named_region(dims = cross_ranges[["whole_tab_range"]], name = "table", local_sheet = TRUE)
            wb$add_named_region(dims = cross_ranges[["table_range"]],     name = "data",  local_sheet = TRUE)
        }
    }

    monitor_df <- monitor_df |> monitor_end()

    # Return workbook
    list(wb, monitor_df)
}

###############################################################################
# Format grouped by tables for console output
###############################################################################
#' Format Cross Table Output with by Variables (Text Based)
#'
#' @description
#' Format a cross table with the provided row and column variables. Statistics
#' can be weighted sums, unweighted frequencies or different percentages.
#'
#' @param cross_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param column_names The real column names that appear in the table.
#' @param statistics The user requested statistics.
#' @param formats A list in which is specified which formats should be applied to
#' which variable.
#' @param by Separate the output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#' @param na.rm If TRUE removes all NA values from the tabulation.
#'
#' @return
#' Returns a character vector with all formatted rows for the tables.
#'
#' @noRd
format_cross_by_text <- function(cross_tab,
                                 rows,
                                 columns,
                                 column_names,
                                 statistics,
                                 formats,
                                 by,
                                 titles,
                                 footnotes,
                                 na.rm){
    # Print message if multilabels are applied
    if (is_multilabel(formats, rows)){
        if ("pct_row" %in% statistics){
            message(" ~ NOTE: The format for variable '", rows, "' is a multilabel.\n",
                    "         In this case row percentages aren't computed properly.")
        }
    }

    complete_tabs <- c()

    # Loop through all by variables
    for (by_var in by){
        # Select by variables one by one
        cross_by <- cross_tab |>
            collapse::fsubset(cross_tab[["BY"]] == by_var)

        # Extract unique values
        if (anyNA(cross_by[["by_vars"]])){
            values <- c(unique(stats::na.omit(cross_by[["by_vars"]])), NA)
        }
        else{
            values <- unique(cross_by[["by_vars"]])
        }

        # Loop through all unique values to generate frequency tables per expression
        for (value in values){
            # In case NAs are removed
            if (is.na(value) && na.rm){
                next
            }

            # Put additional header together with the information which by variable
            # and which value is currently filtered.
            header <- paste0("| ", by_var, " = ", value, " |")

            complete_header <- c("\n", strrep("-", nchar(header)),
                                 header,
                                 strrep("-", nchar(header)))

            # Filter table by current by variable and value
            if (!is.na(value)){
                cross_temp <- cross_by |>
                    collapse::fsubset(cross_by[["by_vars"]] == value)
            }
            else{
                cross_temp <- cross_by |>
                    collapse::fsubset(is.na(cross_by[["by_vars"]]))
            }

            # Generate frequency tables as normal but base is filtered data frame
            current_cross <- format_cross_text(cross_temp,
                                               rows,
                                               columns,
                                               column_names,
                                               statistics,
                                               formats,
                                               by,
                                               titles,
                                               footnotes)

            # Output formatted result
            complete_tabs <- c(complete_tabs, complete_header, current_cross)
        }
    }

    # Output formatted result
    complete_tabs
}

###############################################################################
# Format grouped by tables for excel output
###############################################################################
#' Format Cross Table Output with by Variables (Excel Based)
#'
#' @description
#' Format a cross table with the provided row and column variables. Statistics
#' can be weighted sums, unweighted frequencies or different percentages.
#'
#' @param cross_tab The data frame which contains the information for this cross
#' table.
#' @param rows The variable that appears in the table rows.
#' @param columns The variable that appears in the table columns.
#' @param column_names The real column names that appear in the table.
#' @param statistics The user requested statistics.
#' @param formats A list in which is specified which formats should be applied to
#' which variable.
#' @param by Separate the output by the expressions of the provided variables.
#' @param titles Character vector of titles to display above the table.
#' @param footnotes Character vector of footnotes to display under the table.
#' @param style A list containing the styling elements.
#' @param output Determines whether to style the output or to just quickly paste
#' the data.
#' @param na.rm If TRUE removes all NA values from the tabulation.
#' @param monitor_df Data frame which stores the monitoring values.
#'
#' @return
#' Returns a list containing a formatted Excel workbook as well as the monitoring
#' data frame.
#'
#' @noRd
format_cross_by_excel <- function(cross_tab,
                                  rows,
                                  columns,
                                  column_names,
                                  statistics,
                                  formats,
                                  by,
                                  titles,
                                  footnotes,
                                  style,
                                  output,
                                  na.rm,
                                  wb,
                                  monitor_df){
    monitor_df <- monitor_df |> monitor_start("Excel prepare (by)", "Format by")

    # Print message if multilabels are applied
    if (is_multilabel(formats, rows)){
        if ("pct_row" %in% statistics){
            message(" ~ NOTE: The format for variable '", rows, "' is a multilabel.\n",
                    "         In this case row percentages aren't computed properly.")
        }
    }

    index <- 1

    # Loop through all by variables
    monitor_df <- monitor_df |> monitor_end()

    for (by_var in by){
        monitor_df <- monitor_df |> monitor_start(paste0("Excel prepare (", by_var, ")"), "Format by")

        # Select by variables one by one
        cross_by <- cross_tab |>
            collapse::fsubset(cross_tab[["BY"]] == by_var)

        # Extract unique values
        if (anyNA(cross_by[["by_vars"]])){
            values <- c(unique(stats::na.omit(cross_by[["by_vars"]])), NA)
        }
        else{
            values <- unique(cross_by[["by_vars"]])
        }

        monitor_df <- monitor_df |> monitor_end()

        # Loop through all unique values to generate frequency tables per expression
        for (value in values){
            # In case NAs are removed
            if (is.na(value) && na.rm){
                next
            }

            monitor_df <- monitor_df |> monitor_start(paste0("Excel (", by_var, "_", value, ")"), "Format by")
            message("   + ", paste0(by_var, " = ", value))

            # Put additional by info together with the information which by variable
            # and which value is currently filtered.
            by_info <- paste0(by_var, " = ", value)

            # Filter table by current by variable and value
            if (!is.na(value)){
                cross_temp <- cross_by |>
                    collapse::fsubset(cross_by[["by_vars"]] == value)
            }
            else{
                cross_temp <- cross_by |>
                    collapse::fsubset(is.na(cross_by[["by_vars"]]))
            }

            # Generate frequency tables as normal but base is filtered data frame
            wb_list <- format_cross_excel(wb,
                                          cross_temp,
                                          rows,
                                          columns,
                                          column_names,
                                          statistics,
                                          formats,
                                          by,
                                          titles,
                                          footnotes,
                                          style,
                                          output,
                                          by_info,
                                          index,
                                          NULL)

            index <- index + 1

            wb <- wb_list[[1]]

            monitor_df <- monitor_df |> monitor_end()
        }

        monitor_df <- monitor_df |> monitor_end()
    }

    # Return workbook
    list(wb, monitor_df)
}
Any scripts or data that you put into this service are public.
qol documentation built on Dec. 14, 2025, 1:06 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
qol
Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

R/crosstabs.R
In qol: Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

Defines functions format_cross_by_excel format_cross_by_text format_cross_excel setup_print_table wrap_header format_cross_text crosstabs

Documented in crosstabs

Try the qol package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

qol Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

R/crosstabs.R In qol: Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

Defines functions format_cross_by_excel format_cross_by_text format_cross_excel setup_print_table wrap_header format_cross_text crosstabs

Documented in crosstabs

Try the qol package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

qol
Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs

R/crosstabs.R
In qol: Powerful 'SAS' Inspired Concepts for more Efficient Bigger Outputs