Format numbers

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  tidy = TRUE
)

# to knit "child" Rmd files
knitr::opts_knit$set(root.dir = "../")

library(formatdown)
library(data.table)
library(knitr)

options(
  datatable.print.nrows = 15,
  datatable.print.topn = 3,
  datatable.print.class = TRUE
)
#| echo: false

formatdown_options(size = "small")

prefix_1 <- c("peta", "tera", "giga", "mega", "kilo")

symbol_1 <- c("P", "T", "G", "M", "k")
symbol_1 <- format_text(symbol_1, face = "italic")

x <- 10^seq(from = 15, to = 3, by = -3)
value_1 <- format_numbers(x, digits = 1, format = "engr", omit_power = NULL)
value_1 <- sub("1 \\\\times ", "", value_1)

prefix_2 <- c("milli", "micro", "nano", "pico", "femto")

symbol_2 <- c("m", "\\mu", "n", "p", "f")
symbol_2 <- format_text(symbol_2, face = "italic")

x <- 10^seq(from = -3, to = -15, by = -3)
value_2 <- format_numbers(x, 1, omit_power = NULL)
value_2 <- sub("1 \\\\times ", "", value_2)

DT <- data.table(prefix_1, symbol_1, value_1, prefix_2, symbol_2, value_2)
knitr::kable(DT,
  align = "lcllcl",
  col.names = rep(c("Prefix", "Symbol", "Value"), 2)
)

The rationale for the formatdown package is formatting numbers in power-of-ten notation in inline R code or tabulated columns of data frames. Other features of the package provide tools for typesetting non-power-of-ten columns to match. In this vignette, we discuss the primary formatting function format_numbers() and its convenience wrappers for scientific, engineering, and decimal notation.

Types of notation

Notation to represent large and small numbers depends on the mode of communication. In a computer script, for example, we might encode the Avogadro constant as N_A = 6.0221*10^23. The asterisk (*) and caret (^) in this expression, however, communicate instructions to a computer, not syntactical mathematics. And while scientific E-notation (6.0221E+23) has currency in some discourse communities, power-of-ten notation, e.g., $\small N_A = 6.0221 \times 10^{23}$, is the conventional format for professional technical communication.

Power-of-ten notation is expressed as

$$ \small a \times 10^n, $$

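where $\small a$ is the coefficient in decimal form and the exponent $\small n$ is an integer. Two formats are in common use [@Chase:2021, 63--67]: scientific, in which $\small n$ is any integer and $\small 1 \leq |a| < 10$; and engineering, in which $\small n$ is an integer multiple of 3 and $\small 1 \leq |a| < 1000$.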

The utility of the engineering form follows from the SI prefixes for physical units such as "mega-", "kilo-", "milli-", etc., corresponding to powers of 10 that are integer multiples of three.
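As a minimal sketch of the two forms, the base R helper below splits a nonzero number into a coefficient and an exponent. The function name power_of_ten() is hypothetical and is not part of formatdown; it only illustrates the arithmetic and makes no claim about how the package computes or typesets its output.

```r
# Hypothetical helper (not a formatdown function): decompose x into a
# coefficient and an exponent such that x == coeff * 10^exponent.
# Assumes x is finite and nonzero.
power_of_ten <- function(x, format = c("sci", "engr")) {
  format <- match.arg(format)
  n <- floor(log10(abs(x)))   # scientific exponent: 1 <= |coeff| < 10
  if (format == "engr") {
    n <- 3 * floor(n / 3)     # round down to a multiple of 3: 1 <= |coeff| < 1000
  }
  data.frame(x = x, coeff = x / 10^n, exponent = n)
}

avo_sci  <- power_of_ten(6.0221e23, "sci")
avo_engr <- power_of_ten(6.0221e23, "engr")
avo_sci
avo_engr
```

For the Avogadro constant, the scientific form has exponent 23, while the engineering form has exponent 21 and a coefficient of about 602.21.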


Notes on syntax.   Programming symbols are not necessarily mathematical symbols:


Decimal subsets.   In a vector of numbers formatted in power-of-ten form, the decimal form may be preferred for any subset of values with exponents near zero, e.g., $\small n \in \{-1, 0, 1, 2\}$.

#| echo: false

x <- 3.12 * 10^seq(-3, 4)

sci <- format_sci(x, 3, omit_power = NULL)
sci_omit <- format_sci(x, 3)
DT <- data.table(sci, sci_omit)

knitr::kable(DT,
  align = "r",
  caption = "Decimal form may be preferred for a subset",
  col.names = c(
    "scientific notation",
    "subset in decimal form"
  )
)


Decimal columns.   A table of numeric information can include columns formatted in both power-of-ten notation and decimal notation. For example, a table of atmospheric properties shown below has altitude in integer form, temperature in decimal form, and density in power-of-ten engineering notation (except for those values with exponents near zero).

#| echo: false

DT <- atmos[alt < 150, .(alt, temp, dens)]

DT$alt <- format_dcml(DT$alt, 2)
DT$temp <- format_dcml(DT$temp, 5)
DT$dens <- format_engr(DT$dens, 3)
knitr::kable(DT,
  align = "r",
  caption = "Properties of the atmosphere",
  col.names = c(
    "Altitude (km)",
    "Temperature (K)",
    "Density (kg/m$^3$)"
  )
)

formatdown_options(reset = TRUE)

The purpose of the decimal format in formatdown is to match the font face and size of decimal columns to those of the power-of-ten columns. If no power-of-ten columns are used, of course, decimal columns can be displayed as-is or formatted using other R tools.


Packages.   To follow along in your own script, load the packages listed below. Data frame operations are performed with data.table syntax. Some users may wish to translate the examples to use base R or dplyr syntax.

#| echo: true
#| eval: false

library("formatdown")
library("data.table")
library("knitr")

Markup {#markup}

We format numbers as inline math expressions delimited by $ ... $ or the optional \( ... \). For example, the Avogadro constant is marked up as

    $6.0221 \times 10^{23}$,

where the \times macro creates the multiplication symbol ($\small\times$). This math markup, as an inline equation in an R markdown document, renders as: $\small 6.0221 \times 10^{23}$. To program the markup, however, we enclose it in quote marks as a character string, that is,

    "$6.0221 \\times 10^{23}$",

which requires us to "escape" the backslash in \times by adding an extra backslash. When the optional font size argument is assigned, formatdown adds a LaTeX-style sizing macro such as \small or \large, for example,

    "$\\small 6.0221 \\times 10^{23}$",

where again the markup includes an extra backslash.
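The escaping rule can be checked in base R. The sketch below builds the markup string by hand (no formatdown call is involved) and compares how print() and cat() display it. The raw-string literal in the last step assumes R 4.0.0 or later.

```r
# The string literal doubles the backslash; the stored string has only one.
markup <- "$6.0221 \\times 10^{23}$"

print(markup)       # displays the escaped form, with two backslashes
cat(markup, "\n")   # displays the stored characters, with one backslash

# "\\times" is six characters: one backslash plus the five letters of "times"
n_chars <- nchar("\\times")

# A raw string (R >= 4.0.0) needs no escaping and yields the same value
same <- identical(r"($6.0221 \times 10^{23}$)", markup)
```

Either literal form produces the same character string, so the choice is a matter of readability in the script.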

format_sci() {#format_sci}

Converts numbers to character strings in power-of-ten form,

    "$a \\times 10^{n}$"

where $\small a$ is the coefficient and $\small n$ is the exponent. format_sci() is a wrapper for the more general function format_numbers(). For a subset of values with exponents near zero, e.g., $\small n \in \{-1, 0, 1, 2\}$, the output is in decimal form,

    "$a$"


Usage.  

format_sci(x,
           digits = 4,
           ...,
           omit_power = c(-1, 2),
           set_power = NULL,
           delim          = formatdown_options("delim"),
           size           = formatdown_options("size"),
           decimal_mark   = formatdown_options("decimal_mark"),
           small_mark     = formatdown_options("small_mark"),
           small_interval = formatdown_options("small_interval"), 
           whitespace     = formatdown_options("whitespace"))


Examples.   These early examples are shown with default arguments. Arguments are explored more fully starting with the Numeric input section.

# 1. Avogadro constant
L <- 6.0221E+23
format_sci(L)

# 2. Elementary charge
e <- 1.602176634e-19
format_sci(e)

Examples 1 and 2 (in inline code chunks) render as,

  1. The Avogadro constant is $\small L =$ `r format_sci(L, size = "small")` $\small \mathit{mol}^{-1}$.
  2. The elementary charge constant is $\small e =$ `r format_sci(e, size = "small")` $\small C$.

format_engr() {#format_engr}

Similar to format_sci() except using engineering notation, i.e., exponents are multiples of 3.


Usage.

format_engr(x,
            digits = 4,
            ...,
            omit_power = c(-1, 2),
            set_power = NULL,
            delim          = formatdown_options("delim"),
            size           = formatdown_options("size"),
            decimal_mark   = formatdown_options("decimal_mark"),
            small_mark     = formatdown_options("small_mark"),
            small_interval = formatdown_options("small_interval"), 
            whitespace     = formatdown_options("whitespace"))


Examples.   (with default arguments)

# 3. Avogadro constant
format_engr(L)

# 4. Elementary charge
format_engr(e)

Examples 3 and 4 render as,

  1. The Avogadro constant is $\small L =$ `r format_engr(L, size = "small")` $\small \mathit{mol}^{-1}$.
  2. The elementary charge constant is $\small e =$ `r format_engr(e, size = "small")` $\small C$.

format_dcml() {#format_dcml}

A wrapper for the more general function format_numbers(); converts numbers to character strings in decimal form,

    "$a$"

where $\small a$ is the decimal value.


Usage.

format_dcml(x,
            digits = 4,
            ...,
            size           = formatdown_options("size"),
            delim          = formatdown_options("delim"),
            decimal_mark   = formatdown_options("decimal_mark"),
            big_mark       = formatdown_options("big_mark"),
            big_interval   = formatdown_options("big_interval"),
            small_mark     = formatdown_options("small_mark"),
            small_interval = formatdown_options("small_interval"), 
            whitespace     = formatdown_options("whitespace"))


Examples.   (with default arguments)

# 5. Speed of light in a vacuum
c <- 299792458
format_dcml(c)

# 6. Molar gas constant
R <- 8.31446261815324
format_dcml(R)

Examples 5 and 6 render as,

  1. The speed of light in a vacuum is $\small c =$ `r format_dcml(c, size = "small")` $\small\mathit{m/s}$.
  2. The molar gas constant is $\small R =$ `r format_dcml(R, size = "small")` $\small\mathit{J}\cdot\mathit{K}^{-1}\mathit{mol}^{-1}$.

format_numbers() {#format_numbers}

format_numbers() is the general-purpose formatting function called by format_sci(), format_engr(), and format_dcml(). The general function can be used instead of the convenience functions simply by setting its format argument to "sci", "engr" (default), or "dcml".


Usage.

format_numbers(x,
               digits = 4,
               format = "engr",
               ...,
               omit_power = c(-1, 2),
               set_power = NULL,
               delim          = formatdown_options("delim"),
               size           = formatdown_options("size"),
               decimal_mark   = formatdown_options("decimal_mark"),
               big_mark       = formatdown_options("big_mark"),
               small_mark     = formatdown_options("small_mark"),
               big_interval   = formatdown_options("big_interval"),
               small_interval = formatdown_options("small_interval"), 
               whitespace     = formatdown_options("whitespace"))

Examples.   Reproducing some of the earlier examples using format_numbers().

# 7. Scientific
format_numbers(L, format = "sci")

# 8. Engineering
format_numbers(e, format = "engr")

# 9. Decimal
format_numbers(R, format = "dcml")

Examples 7--9 render as,

  1. The Avogadro constant is $\small L =$ `r format_numbers(L, format = "sci", size = "small")` $\small \mathit{mol}^{-1}$.
  2. The elementary charge constant is $\small e =$ `r format_numbers(e, format = "engr", size = "small")` $\small C$.
  3. The molar gas constant is $\small R =$ `r format_numbers(R, format = "dcml", size = "small")` $\small\mathit{J}\cdot\mathit{K}^{-1}\mathit{mol}^{-1}$.

Numeric input {#numeric-input}

This section begins our detailed discussion of arguments.

Scalar input.   A scalar value is typically formatted with inline R code. For example, the following R markdown sentence, which includes some math markup and some inline R code,

    The Avogadro constant is $L =$ `r format_sci(L)` $\mathit{mol}^{-1}$.

renders as: The Avogadro constant is $\small L =$ `r format_sci(L, size = "small")` $\small\mathit{mol}^{-1}$.


Vector.   A vector of numbers (or a data frame column) is marked up as follows,

# 10. Sample vector
x <- c(2.3333e-5, 3.4444e-4, 5.2222e-2, 6.3333e-1, 8.1111e+1, 9.2222e+2, 2.4444e+4, 3.1111e+5, 4.2222e+6)
format_engr(x)

In a table, the output renders as,

#| echo: false
formatdown_options(size = "small")
DT <- data.table(x, format_engr(x))
knitr::kable(DT,
  align = "r",
  col.names = c("Unformatted", "Engr notation"),
  caption = "Example 10."
)
#| echo: false
formatdown_options(reset = TRUE)

For values with exponents $\small n \in \{-1, 0, 1, 2\}$, the default format is decimal; see Excluding a range of exponents.

Units input

The units R package (website: Measurement Units for R) provides measurement units for R vectors, converting vectors of class "numeric" to class "units" [@Pebesma+Mailund+Hiebert:2016:units]. For example,

# Number
x <- 10320
class(x)

# Convert to units class
units(x) <- "m"
x
class(x)

# Operations are reflected in the value and its units
y <- x^2
y

# Unit conversion is supported
z <- y
z
units(z) <- "ft^2"
z

If an input argument to format_numbers() (or its convenience functions) is of class "units", formatdown attempts to extract the units character string, format the number in the expected way, and append a units character string to the result. For example,

# 11. Units-class inputs
format_sci(x)
format_sci(y)
format_sci(z)

Example 11 renders as,

More complicated units can be managed. For example, the Newtonian gravitational constant could be formatted as follows, where the exponents in the units definition are given in "implicit" form, that is, where $\small m^3 kg^{-1} s^{-2}$ is represented by "m3 kg-1 s-2".

    G        <- 6.6743e-11
    units(G) <- "m3 kg-1 s-2"
    format_sci(G)

Applying a similar procedure to several physical constants and collecting the results in a data frame yields,

#| echo: false
c <- 299792458
units(c) <- "m/s"
format_c <- format_sci(c, size = "small")

h <- 6.62607015e-34
units(h) <- "J/Hz"
format_h <- format_sci(h, size = "small")

mu <- 1.25663706212e-6
units(mu) <- "N A-2"
format_mu <- format_sci(mu, size = "small")

G <- 6.67430e-11
units(G) <- "m3 kg-1 s-2"
format_G <- format_sci(G, size = "small")

ke <- 8.9875517923e+9
units(ke) <- "N m2 C-2"
format_ke <- format_sci(ke, size = "small")

sigma <- 5.67037442e-8
units(sigma) <- "W m-2 K-4"
format_sigma <- format_sci(sigma, size = "small")

symbol <- c(
  "$\\small c$",
  "$\\small h$",
  "$\\small \\mu_0$",
  "$\\small G$",
  "$\\small k_e$",
  "$\\small \\sigma$"
)

quantity <- format_text(
  c(
    "speed of light in a vacuum",
    "Planck constant",
    "vacuum magnetic permeability",
    "Newtonian gravitational constant",
    "Coulomb constant",
    "Stefan-Boltzmann constant"
  ),
  size = "small"
)

formatted_value <- c(
  format_c,
  format_h,
  format_mu,
  format_G,
  format_ke,
  format_sigma
)

DT <- data.table(symbol, quantity, formatted_value)
knitr::kable(DT)

This table is constructed simply to illustrate how formatdown returns a variety of units-class values with units appended to the formatted number.

In a typical application, however, the numbers in a column have the same physical units and are formatted as a vector. For example,

#| echo: false
formatdown_options(size = "small")
# Example 12
DT <- air_meas[, .(temp, pres, sp_gas, dens)]

# Examine data
DT[]

# Assign units
units(DT$temp) <- "K"
units(DT$pres) <- "Pa"
units(DT$sp_gas) <- "J kg-1 K-1"
units(DT$dens) <- "kg m-3"

# Format one column at a time
DT$temp <- format_dcml(DT$temp)
DT$pres <- format_engr(DT$pres)

# Or format multiple columns in one pass
cols <- c("sp_gas", "dens")
DT[, (cols) := lapply(.SD, format_dcml), .SDcols = cols]

knitr::kable(DT, align = "r", caption = "Example 12.")
#| echo: false
formatdown_options(reset = TRUE)

Significant digits {#significant-digits}

Significant digits are applied to the input argument using the base R function signif() before additional formatting is applied. For example,

# 13. Significant digits
format_sci(e, digits = 5)
format_sci(e, digits = 4)
format_sci(e, digits = 3)
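The rounding step on its own can be inspected with base R. This sketch calls only signif() and formatC(); it illustrates the rounding described above and does not reproduce how formatdown builds its final markup.

```r
e <- 1.602176634e-19

# Rounding to a given number of significant digits
e5 <- signif(e, 5)
e4 <- signif(e, 4)
e3 <- signif(e, 3)

# A numeric value carries no trailing zeros, so the 3-digit result
# 1.60e-19 prints as 1.6e-19 unless the zero is restored when the
# value is converted to text.
txt_plain  <- format(e3)
txt_padded <- formatC(e3, digits = 2, format = "e")
c(txt_plain, txt_padded)
```

The comparison of the two text forms shows why a formatter has to handle trailing zeros explicitly if the displayed digits are to match the requested significance.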

Example 13 renders as,

Formats {#formats}

The format argument appears in format_numbers() only. The default is "engr". The format is preset in the format_dcml(), format_engr(), and format_sci() convenience functions.

To compare the effects across many orders of magnitude, we display the same vector in different formats.

#| echo: false
formatdown_options(size = "small")
# 14. Comparing formats
x <- c(2.3333e-5, 3.4444e-4, 5.2222e-2, 6.3333e-1, 8.1111e+1, 9.2222e+2, 2.4444e+4, 3.1111e+5, 4.2222e+6)
dcml <- format_numbers(x, 3, format = "dcml")
sci <- format_numbers(x, 3, format = "sci")
engr <- format_numbers(x, 3, format = "engr")
DT <- data.table(dcml, sci, engr)
knitr::kable(DT,
  align = "r",
  col.names = c("decimal", "scientific", "engineering"),
  caption = "Example 14."
)
#| echo: false
formatdown_options(reset = TRUE)

The values displayed without power-of-ten notation in the scientific and engineering columns are determined by the omit_power argument discussed next.

Excluding a range of exponents {#excluding-exponents}

When power-of-ten notation is specified, numbers with exponents lying within the range of the omit_power argument are typeset in decimal form. In engineering notation, the exponent is checked against the range both before and after the conversion to multiple-of-3 exponents.

To illustrate, we compare two omit_power settings in both scientific and engineering formats. In some columns, we set omit_power = NULL, which imposes power-of-ten notation on the entire vector.

#| echo: false
formatdown_options(size = "small")
# 15. Effects of omit_power
DT <- atmos[3:12, .(pres)]
DT[, sci_all := format_sci(pres, 3, omit_power = NULL)]
DT[, sci_omit := format_sci(pres, 3, omit_power = c(-1, 0))]
DT[, engr_all := format_engr(pres, 3, omit_power = NULL)]
DT[, engr_omit := format_engr(pres, 3, omit_power = c(-1, 0))]
knitr::kable(DT,
  align = "r",
  col.names = c(
    "Unformatted",
    "all scientific",
    "scientific w/ omit",
    "all engineering",
    "engineering w/ omit"
  ),
  caption = "Example 15."
)

Comments:


If a single value is assigned, e.g., omit_power = 0, the argument is interpreted as c(0, 0).

# 16. Omit power used for a single value of exponent
DT <- atmos[3:12, .(pres)]
DT[, sci_all := format_sci(pres, 3, omit_power = NULL)]
DT[, sci_omit := format_sci(pres, 3, omit_power = 0)]
DT[, engr_all := format_engr(pres, 3, omit_power = NULL)]
DT[, engr_omit := format_engr(pres, 3, omit_power = 0)]
knitr::kable(DT,
  align = "r",
  col.names = c(
    "Unformatted",
    "all scientific",
    "scientific w/ omit",
    "all engineering",
    "engineering w/ omit"
  ),
  caption = "Example 16."
)
#| echo: false
formatdown_options(reset = TRUE)
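The range test for the scientific case can be sketched in base R. The helper in_omit_range() is hypothetical, not a formatdown function; it flags which values have a scientific exponent inside omit_power, treats a single value p as c(p, p), and treats NULL as "omit nothing". The additional check that the text describes for engineering notation is not reproduced here.

```r
# Hypothetical helper (not part of formatdown): TRUE where the scientific
# exponent of x lies within the omit_power range, i.e., where decimal
# form would be used. Assumes x is finite and nonzero.
in_omit_range <- function(x, omit_power = c(-1, 2)) {
  if (is.null(omit_power)) {
    return(rep(FALSE, length(x)))   # power-of-ten notation for every value
  }
  omit_power <- range(omit_power)   # a single value p becomes c(p, p)
  n <- floor(log10(abs(x)))         # scientific exponent
  n >= omit_power[1] & n <= omit_power[2]
}

x <- 3.12 * 10^seq(-3, 4)
flags <- data.frame(
  x,
  default = in_omit_range(x),
  single  = in_omit_range(x, omit_power = 0),
  none    = in_omit_range(x, omit_power = NULL)
)
flags
```

With the default range, the four values with exponents -1 through 2 are flagged; with omit_power = 0, only the value with exponent 0 is flagged.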


Setting omit_power = c(-Inf, Inf) yields the same decimal result as format = "dcml" and overrides any other format setting. For example,

# 17. Different ways of creating a decimal format
(y <- 6.78e-3)

(p <- format_numbers(y, 3, "sci", omit_power = c(-Inf, Inf)))

(q <- format_numbers(y, 3, "dcml"))

(r <- format_dcml(y, 3))

all.equal(p, q)
all.equal(p, r)

Example 17 (all cases) renders as,

Enforcing a specific exponent {#enforcing-exponent}

When values in a table column span only a few orders of magnitude, an audience is sometimes better served by setting the notation to a constant power of ten. For example, here we show numbers in scientific format and compare to columns in which the exponents are set to fixed values. Assigning a value to set_power overrides omit_power and format.

#| echo: false
formatdown_options(size = "small")
# 18. set_power argument
DT <- atmos[alt <= 40, .(alt, pres, dens)]
DT[, sci_pres := format_sci(pres, 3, omit_power = c(-1, 2))]
DT[, set_pres := format_sci(pres, 3, omit_power = c(-1, 2), set_power = 3)]
DT[, sci_dens := format_engr(dens, 3, omit_power = c(-1, 2))]
DT[, set_dens := format_engr(dens, 3, omit_power = c(-1, 2), set_power = -2)]
DT[, pres := NULL]
DT[, dens := NULL]
knitr::kable(DT,
  align = "r",
  col.names = c("Altitude (km)", "Pressure (Pa)", "with set_power", "Density (kg/m$^{3}$)", "with set_power"),
  caption = "Example 18."
)
#| echo: false
formatdown_options(reset = TRUE)
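The arithmetic behind a fixed exponent is simple: each coefficient is the value divided by the chosen power of ten. The base R sketch below uses made-up input values, labeled as hypothetical, and builds the markup by hand; unlike formatdown, it does not pad coefficients with trailing zeros.

```r
# Hypothetical values spanning a few orders of magnitude
x <- c(1.2e5, 3.4e4, 5.6e3)

# Fixed exponent shared by every element
set_power <- 3
coeff <- x / 10^set_power

# Hand-built power-of-ten markup with the same exponent throughout
markup <- paste0("$", coeff, " \\times 10^{", set_power, "}$")
cat(markup, sep = "\n")
```

Because every entry shares the exponent, a reader can compare magnitudes by scanning the coefficients alone.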

Options

Arguments assigned using formatdown_options() are described in the Global settings article.

References



formatdown documentation built on May 29, 2024, 8:21 a.m.