### Formatting
#' Create a `f_str` object
#'
#' `f_str` objects are intended to be used within the function
#' `set_format_strings`. The `f_str` object carries information that powers a
#' significant amount of layer processing. The `format_string` parameter is
#' capable of controlling the display of a data point and decimal precision. The
#' variables provided in `...` control which data points are used to populate
#' the string formatted output.
#'
#' @details Format strings are one of the most powerful components of 'Tplyr'.
#' Traditionally, converting numeric values into strings for presentation can
#' consume a good deal of time. Values and decimals need to align between
#' rows, rounding before trimming is sometimes forgotten - it can become a
#' tedious mess that is realistically not an important part of the analysis
#' being performed. 'Tplyr' makes this process as simple as we can, while
#' still allowing flexibility to the user.
#'
#' Tplyr provides both manual and automatic decimal precision formatting. The
#' display of the numbers in the resulting data frame is controlled by the
#' `format_string` parameter. For manual precision, just like dummy values may
#' be presented on your mocks, integer and decimal precision is specified by
#' the user providing a string of 'x's for how you'd like your numbers
#' formatted. If you'd like 2 integers with 3 decimal places, you specify your
#' string as 'xx.xxx'. 'Tplyr' does the work to get the numbers in the right
#' place.
#'
#' To take this a step further, automatic decimal precision can also be
#' obtained based on the collected precision within the data. When creating
#' tables where results vary by some parameter, different results may call for
#' different degrees of precision. To use automatic precision, use a single
#' 'a' on either the integer and decimal side. If you'd like to use increased
#' precision (i.e. you'd like mean to be collected precision +1), use 'a+1'.
#' So if you'd like both integer and and decimal precision to be based on the
#' data as collected, you can use a format like 'a.a' - or for collected+1
#' decimal precision, 'a.a+1'. You can mix and match this with manual formats
#' as well, making format strings such as 'xx.a+1'.
#'
#' If you want two numbers on the same line, you provide two sets of x's. For
#' example, if you're presenting a value like "mean (sd)" - you could provide
#' the string 'xx.xx (xx.xxx)', or perhaps 'a.a+1 (a.a+2). Note that you're
#' able to provide different integer lengths and different decimal precision
#' for the two values. Each format string is independent and relates only to
#' the format specified.
#'
#' As described above, when using 'x' or 'a', any other character within the
#' format string will stay stationary. So for example, if your format string
#' is 'xx (xxx.x)', your number may format as '12 ( 34.5)'. So the left side
#' parenthesis stays fixed. In some displays, you may want the parenthesis to
#' 'hug' your number. Following this example, when allotting 3 spaces for the
#' integer within parentheses, the parentehsis should shift to the right,
#' making the numbers appear '12 (34.5)'. Using `f_str()` you can achieve
#' this by using a capital 'X' or 'A'. For this example, the format string
#' would be 'xx (XXX.x)'.
#'
#' There are a two rules when using 'parenthesis hugging':
#'
#' - Capital letters should only be used on the integer side of a number
#' - A character must precede the capital letter, otherwise there's no
#' character to 'hug'
#'
#' The other parameters of the `f_str` call specify what values should fill
#' the x's. `f_str` objects are used slightly differently between different
#' layers. When declaring a format string within a count layer, `f_str()`
#' expects to see the values `n` or `distinct_n` for event or distinct counts,
#' `pct` or `distinct_pct` for event or distinct percentages, or `total` or
#' `distinct_total` for denominator calculations. Note that in an `f_str()`
#' for a count layer 'A' or 'a' are based on n counts, and therefore don't
#' make sense to use in percentages. But in descriptive statistic layers,
#' `f_str` parameters refer to the names of the summaries being performed,
#' either by built in defaults, or custom summaries declared using
#' [set_custom_summaries()]. See [set_format_strings()] for some more notes
#' about layers specific implementation.
#'
#' An `f_str()` may also be used outside of a Tplyr table. The function
#' [apply_formats()] allows you to apply an `f_str` within the context of
#' [dplyr::mutate()] or more generally a vectorized function.
#'
#' @section Valid `f_str()` Variables by Layer Type:
#'
#' Valid variables allowed within the `...` parameter of `f_str()` differ by
#' layer type.
#'
#' - Count layers
#' - `n`
#' - `pct`
#' - `total`
#' - `distinct_n`
#' - `distinct_pct`
#' - `distinct_total`
#' - Shift layers
#' - `n`
#' - `pct`
#' - `total`
#' - Desc layers
#' - `n`
#' - `mean`
#' - `sd`
#' - `median`
#' - `var`
#' - `min`
#' - `max`
#' - `iqr`
#' - `q1`
#' - `q3`
#' - `missing`
#' - Custom summaries created by [set_custom_summaries()]
#'
#' @param format_string The desired display format. X's indicate digits. On the
#' left, the number of x's indicates the integer length. On the right, the
#' number of x's controls decimal precision and rounding. Variables are
#' inferred by any separation of the 'x' values other than a decimal.
#' @param ... The variables to be formatted using the format specified in
#' \code{format_string}.
#' @param empty The string to display when the numeric data is not available.
#' For desc layers, an unnamed character vector will populate within the
#' provided format string, set to the same width as the fitted numbers. Use a
#' single element character vector, with the element named '.overall' to
#' instead replace the whole string.
#'
#' @return A `f_str` object
#' @export
#' @md
#'
#' @examples
#'
#' f_str("xx.x (xx.x)", mean, sd)
#'
#' f_str("a.a+1 (a.a+2)", mean, sd)
#'
#' f_str("xx.a (xx.a+1)", mean, sd)
#'
#' f_str("xx.x, xx.x, xx.x", q1, median, q3)
#'
#' f_str("xx (XXX.x%)", n, pct)
#'
#' f_str("a.a+1 (A.a+2)", mean, sd)
#'
f_str <- function(format_string, ..., empty=c(.overall='')) {
# Capture the variables off of the ellipsis
vars <- enexprs(...)
# Null out the names for vars
names(vars) <- NULL
# Check format string class
assert_has_class(format_string, "character")
# Do a pre-check of the format string to catch invalid auto specifications
if (str_detect(format_string, "AA|aa")) {
stop(paste0("In f_str(), only use a single 'A' or 'a' on the integer or",
" decimal side to trigger auto precision."), call.=TRUE)
}
# Parse out the format string sections
rx <- get_format_string_regex()
formats <- str_extract_all(format_string, rx)[[1]]
# Duplicate any '%' to escape them
format_string_1 <- str_replace_all(format_string, "%", "%%")
# Make the sprintf ready string
repl_str <- str_replace_all(format_string_1, rx, "%s")
# Make sure that if two formats were found, two variables exist
assert_that(length(formats) == length(vars),
msg = paste0("In `f_str` ", length(formats), " formats were entered in the format string ",
format_string, "but ", length(vars), " variables were assigned."))
# Pull out the integer and decimal
settings <- map(formats, gather_settings)
# A value in settings will be <0 if it's an auto format
auto_precision <- any(map_lgl(settings, ~ any(as.logical(.[c('auto_int', 'auto_dec')]))))
hug_formatting <- any(map_lgl(settings, ~ !is.na(.['hug_char'])))
# All ellipsis variables are names
assert_that(all(map_lgl(vars, function(x) class(x) == "name")),
msg = "In `f_str` all values submitted via `...` must be variable names.")
structure(
list(format_string = format_string,
vars = vars,
formats = formats,
settings = settings,
size = nchar(format_string),
repl_str = repl_str,
auto_precision = auto_precision,
hug_formatting = hug_formatting,
empty=empty
),
class="f_str"
)
}
#' Gather the settings for a specific format string section
#'
#' This function will collect specific settings about a format string section,
#' including integer and decimal length, whether autos were turned on, and hug
#' character settings/
#'
#' @param x A character string representing a format string section
#'
#' @return A named list of settings
#' @noRd
gather_settings <- function(x) {
settings <- list(
int = 0,
dec = 0,
auto_int = FALSE,
auto_dec = FALSE,
hug_char = NA_character_
)
settings <- parse_hug_char(x, settings)
settings <- separate_int_dig(x, settings)
settings
}
#' Find if a hug character exists and attach to settings
#'
#' @param x Format string section
#' @param settings A list of settings for a format string section
#'
#' @return List of settings
#' @noRd
parse_hug_char <- function(x, settings) {
# Find hugging
if (str_detect(x, "X|A")) {
# Look for characters preceding X or A that aren't X or A
hug_char_rx <- regex("([^XA]+)[XA]")
# Search the hug character and pull out all matches
# x is guaranteed to be a single element vector so pull out first
# element of the list
hug_char_match <- str_match_all(x, hug_char_rx)[[1]]
# If no rows, then X or A was used with no specified hug character
if (nrow(hug_char_match) == 0) {
stop(
paste0("In f_str(), an 'X' or 'A' was used but no hug character ",
"was specified, such as a parenthesis. Use 'X' or 'A' to bind ",
"a character within a format string."),
call.=FALSE
)
}
# The match matrix can't be more than one row. If it is, it was probably
# because X or A were placed before and after a decimal, so show the user
if (nrow(hug_char_match) > 1) {
err_msg <- paste0(
"In f_str(), invalid format string specification. The following section",
" failed to parse:\n\t'", x,
"'\nThe issue is present with a hug character. Was 'X' or 'A' used after",
" a decimal?"
)
stop(err_msg, call.=FALSE)
}
# If X or A was used after the decimal at all, that's also invalid so error
# out as well
if (str_detect(hug_char_match[1,1], fixed("."))) {
stop(
paste0("In f_str(), 'X' or 'A' can only be used on the left side of a",
" decimal within a format string."),
call.=FALSE
)
}
# The hug char is in a capture group, so we pull it out of the match
settings$hug_char <- hug_char_match[1,2]
}
settings
}
#' Parse a portion of a string format
#'
#' After the string is split by the decimal, parse what remains
#' Auto formats will start at -1 and decrement by the + value
#'
#' @param x Portioned string format
#'
#' @return A numeric value. >0 is literal length, <0 is auto format
#' @noRd
parse_fmt <- function(x) {
# If it's an auto format, grab the output value
if (grepl('a|A', x)) {
# Pick out the digit
add <- replace_na(as.double(str_extract(x, '\\d+')), 0)
# Auto formats will be -1 - the specified precision
val <- 0 + add
# Give an attribute that there's an auto format
attr(val, 'auto') <- TRUE
} else {
val <- nchar(x)
# Not auto format
attr(val, 'auto') <- FALSE
}
val
}
#' Evaluate a portion of a format string to check the integer and digit lengths
#'
#' @param x Format string section
#' @param settings A list of settings for a format string section
#'
#' @return List of settings
#'
#' @noRd
separate_int_dig <- function(x, settings){
# Count the characters on each side of the decimal
fields <- str_split(x, "\\.")[[1]]
# Label the split segments
names(fields) <- c('int', 'dec')[1:length(fields)]
# Parse out length and auto info from each field and apply to settings
num_chars <- map(fields, parse_fmt)
auto <- map_lgl(num_chars, ~attr(.x, 'auto'))
settings[names(num_chars)] <- as.numeric(num_chars)
settings[paste0("auto_", names(auto))] <- auto
# If a hug character is specified,subtract if from the integer length
if (!is.na(settings$hug_char) && settings$auto_int) {
settings$int <- settings$int + (nchar(settings$hug_char) - 1)
}
settings
}
#' Helper for changing values on count f_str
#'
#' @param ... The object passed to `set_format_strings` or `set_count_layer_formats`
#'
#' @noRd
count_f_str_check <- function(...) {
# Catch the arguments from the function call so useful errors can be thrown
check <- enquos(...)
# Make sure that all of the attachments were `f_str` objects
for (i in seq_along(check)) {
if (is_named(check)) {
msg = paste0("In `set_format_string` entry `",names(check)[[i]],"` is not an `f_str` object. All assignmentes made within",
" `set_format_string` must be made using the function `f_str`. See the `f_str` documentation.")
} else {
msg = paste0("In `set_format_string` entry ",i," is not an `f_str` object. All assignmentes made within",
" `set_format_string` must be made using the function `f_str`. See the `f_str` documentation.")
}
assert_that(class(quo_get_expr(check[[i]])) == "f_str" || (is_call(quo_get_expr(check[[i]])) && call_name(check[[i]]) == "f_str"),
msg = msg)
}
# Grab the named parameters
params <- list2(...)
# Currently supported format names
valid_names <- c("n_counts", "riskdiff")
# Raise error if names were invalid
if (is_named(params)) {
assert_that(all(names(params) %in% valid_names),
msg = paste('Invalid format names supplied. Count layers only accept the following format names:',
paste(valid_names, collapse = ", "))
)
} else {
# If unnamed, then only one argument should have been supplied
assert_that(length(params) == 1, msg = "If names are not supplied, count layers can only have on format supplied.")
# Force the name in of n_counts
names(params) <- "n_counts"
}
# Check content of each f_str based on their supplied name
for (name in names(params)) {
if (name == "n_counts") {
assert_that(all(params[['n_counts']]$vars %in% c("n", "pct", "distinct", "distinct_n", "distinct_pct", "total", "distinct_total")),
msg = "f_str for n_counts in a count_layer can only be n, pct, distinct, distinct_pct, total, or distinct_total")
# Check to make sure both disintct(old), and distinct_n(new) aren't passed
assert_that(!all(c("distinct", "distinct_n") %in% params[["n_counts"]]$vars),
msg = "You can't pass both distinct and distinct_n, just use distinct_n")
# Check to make sure duplicated parameters aren't passed
assert_that(length(params[["n_counts"]]$vars) == length(unique(params[["n_counts"]]$vars)),
msg = "You've passed duplicate parameters to `set_format_strings`")
# Replace the disinct with distinct_n
if (any(params[["n_counts"]]$vars %in% "distinct")) {
warning("The use of 'distinct' in count f_strs is discouraged. It was replaced with 'distinct_n' for consistancy.")
}
params[["n_counts"]]$vars[params[["n_counts"]]$vars %in% "distinct"] <- "distinct_n"
} else if (name == "riskdiff") {
assert_that(all(params[['riskdiff']]$vars %in% c('comp', 'ref', 'dif', 'low', 'high')),
msg = "f_str for riskdiff in a count_layer can only be comp, ref, dif, low, or high")
}
}
params
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.