R/utils.R

Defines functions extract_digits format_cells

#' Format Numeric Values with Thousands Separator
#'
#' Applies comma-based thousands separators to a vector of numeric values, omitting
#' scientific notation and treating `NA` values appropriately. This function is useful
#' for improving readability of large numbers in tables and data frames.
#'
#' @param values A numeric vector of values to be formatted. `NA` values in the vector
#'        are handled and returned as `NA_character_` to maintain compatibility with
#'        formatted output.
#'
#' @return A character vector of formatted values. Each value is represented with commas
#'         as thousands separators (e.g., \code{"1,000"}), and no scientific notation is used.
#'         `NA` values are returned as `NA_character_`.
#'
#' @details This function processes each element in the `values` vector individually.
#'          Non-`NA` values are formatted as strings with comma separators for thousands,
#'          and spaces are removed to prevent unintended whitespace. `NA` values are
#'          preserved and returned as `NA_character_` to ensure compatibility with
#'          tabular data that may require formatted numeric outputs alongside missing data.
#' @noMd
#' @noRd
#'
format_cells <- function(values) {

  # Apply formatting to each element in the vector.
  # vapply() guarantees a character vector, even for zero-length input,
  # where sapply() would return an empty list.
  formatted_values <- vapply(values, function(value) {
    if (is.na(value)) {
      return(NA_character_)  # Return NA_character_ for NA elements
    }

    # Format value with comma as thousands separator, without scientific notation
    formatted_value <- format(value, big.mark = ",", scientific = FALSE, nsmall = 0)

    # Remove any extraneous spaces
    formatted_value <- gsub(" ", "", formatted_value)

    formatted_value
  }, character(1))

  formatted_values
}


#' Extract Numeric Values from Formatted Strings
#'
#' This function extracts numeric values from character strings that may contain optional
#' symbols (`<` or `>`), commas, or decimal points. It removes these symbols and formats
#' the strings to ensure proper numeric conversion. The function checks for disallowed
#' formats and throws an error if any values do not meet the specified pattern.
#'
#' @param values A character vector of numeric-like values.
#'   Values may contain optional `<` or `>` symbols at the start, commas as thousand
#'   separators, and decimal points.
#'
#' @return A numeric vector containing the extracted numeric values, with `NA` inputs
#'   returned as `NA`. Numeric input is returned unchanged. If any non-`NA` value is
#'   incorrectly formatted, the function stops with an error message.
#'
#' @details
#' The function first checks for valid formats using a regular expression. Only values
#' matching the following pattern are accepted:
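#' - Optional `<` or `>` symbol at the start, then an optional single space.
#' - Optional negative sign (`-`) before the number.
#' - One to three leading digits, followed by zero or more comma-separated groups of
#'   exactly three digits. Unseparated runs of more than three digits (e.g. `1000`)
#'   are rejected.
#' - Optional decimal point with digits following.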
#'
#' If the format is valid, the function removes the `<` or `>` symbols and commas,
#' and converts the cleaned string to a numeric value.
#' @noMd
#' @noRd
#'
extract_digits <- function(values) {

  # Numeric input (is.numeric() is also TRUE for integers) needs no parsing
  if (is.numeric(values)) {
    return(values)
  }

  # Identify which values are NA
  is_na <- is.na(values)

  # For non-NA values, apply the pattern matching
  non_na_values <- values[!is_na]

  # Define the allowed pattern: optional '<' or '>', optional space, optional '-',
  # 1-3 digits, optional comma-separated groups of 3 digits, optional '.' and digits
  allowed_pattern <- "^([<>]?) ?-?[0-9]{1,3}(,[0-9]{3})*(\\.[0-9]+)?$"

  # Check for disallowed characters or invalid formats in non-NA values
  if (any(!grepl(allowed_pattern, non_na_values))) {
    stop("Error: Values contain disallowed characters or invalid format.")
  }

  # Remove the '<' or '>' symbol if present at the start in non-NA values
  numeric_strings <- rep(NA_character_, length(values))
  numeric_strings[!is_na] <- gsub("^[<>] ?", "", non_na_values)

  # Remove commas from the cleaned strings
  numeric_strings <- gsub(",", "", numeric_strings)

  # Convert the cleaned strings to numeric
  numeric_values <- as.numeric(numeric_strings)

  # Check if any non-NA values could not be converted to numeric
  if (any(is.na(numeric_values[!is_na]))) {
    stop("Error: Some values could not be converted to numeric.")
  }

  return(numeric_values)
}
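
The validation and cleaning steps in `extract_digits` can be tried on their own. The snippet below is an illustrative sketch, not part of the package: the pattern and the `gsub` calls are copied from the function above, while the `candidates` vector is hypothetical sample input chosen to show which strings pass.

```r
# Pattern copied from extract_digits() above
allowed_pattern <- "^([<>]?) ?-?[0-9]{1,3}(,[0-9]{3})*(\\.[0-9]+)?$"

# Hypothetical sample input: the first four follow the documented format,
# the last three do not (no separator, wrong group size, non-numeric)
candidates <- c("1,234", "<10", "> 2,500.75", "-3", "1000", "12,34", "abc")

# Which candidates would pass the format check
accepted <- grepl(allowed_pattern, candidates)

# Strip the leading '<' or '>' (and a following space), then the commas
cleaned <- gsub(",", "", gsub("^[<>] ?", "", candidates[accepted]))
parsed <- as.numeric(cleaned)

# The formatting side, using the same format() call as format_cells()
formatted <- format(1234567, big.mark = ",", scientific = FALSE, nsmall = 0)

print(data.frame(candidate = candidates, accepted = accepted))
print(parsed)
print(formatted)
```

Because the pattern requires comma grouping once a number has more than three integer digits, a string such as `"1000"` is rejected while `"1,000"` is accepted, which matches the output that `format_cells` produces.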


countmaskr documentation built on April 10, 2026, 5:07 p.m.