fmt_chem: Format chemical formulas

View source: R/format_data.R

fmt_chem    R Documentation

Format chemical formulas

Description

fmt_chem() lets you format chemical formulas or even chemical reactions in the table body. Often the input text will be in a common form representing single compounds (like "C2H4O", for acetaldehyde) but chemical reactions can also be used (e.g., "2CH3OH -> CH3OCH3 + H2O"). So long as the text within the targeted cells conforms to gt's specialized chemistry notation, the appropriate conversions will occur. Details pertaining to chemistry notation can be found in the section entitled How to use gt's chemistry notation.

Usage

fmt_chem(data, columns = everything(), rows = everything())

Arguments

data

The gt table data object

⁠obj:<gt_tbl>⁠ // required

This is the gt table object that is commonly created through use of the gt() function.

columns

Columns to target

⁠<column-targeting expression>⁠ // default: everything()

Can be a series of column names provided in c(), a vector of column indices, or a select helper function (e.g., starts_with(), ends_with(), contains(), matches(), num_range(), and everything()).

rows

Rows to target

⁠<row-targeting expression>⁠ // default: everything()

In conjunction with columns, we can specify which of their rows should undergo formatting. The default everything() results in all rows in columns being formatted. Alternatively, we can supply a vector of row captions within c(), a vector of row indices, or a select helper function (e.g. starts_with(), ends_with(), contains(), matches(), num_range(), and everything()). We can also use expressions to filter down to the rows we need (e.g., ⁠[colname_1] > 100 & [colname_2] < 50⁠).

Value

An object of class gt_tbl.

Targeting cells with columns and rows

Targeting of values is done through columns and additionally by rows (if nothing is provided for rows then entire columns are selected). The columns argument allows us to target a subset of cells contained in the resolved columns. We say resolved because aside from declaring column names in c() (with bare column names or names in quotes) we can use tidyselect-style expressions. This can be as basic as supplying a select helper like starts_with(), or, providing a more complex incantation like

where(~ is.numeric(.x) && max(.x, na.rm = TRUE) > 1E6)

which targets numeric columns that have a maximum value greater than 1,000,000 (excluding any NAs from consideration).

By default all columns and rows are selected (with the everything() defaults). Cell values that are incompatible with a given formatting function will be skipped over (for example, character values targeted by a numeric fmt_*() function). So it's safe to select all columns with a particular formatting function (only those values that can be formatted will be formatted), but you may not want that. One strategy is to format the bulk of cell values with one formatting function and then constrain the columns for later passes with other types of formatting (the last formatting done to a cell is what you get in the final output).

Once the columns are targeted, we may also target the rows within those columns. This can be done in a variety of ways. If a stub is present, then we potentially have row identifiers. Those can be used much like column names in the columns-targeting scenario. We can use simpler tidyselect-style expressions (the select helpers should work well here) and we can use quoted row identifiers in c(). It's also possible to use row indices (e.g., c(3, 5, 6)) though these index values must correspond to the row numbers of the input data (the indices won't necessarily match those of rearranged rows if row groups are present). One more type of expression is possible, an expression that takes column values (can involve any of the available columns in the table) and returns a logical vector. This is nice if you want to base formatting on values in the column or another column, or, you'd like to use a more complex predicate expression.
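As a minimal sketch of the targeting described above, the following uses a small, made-up data frame (the names compounds, compound, formula, and approx_mass are hypothetical and not part of any gt dataset; the mass values are rounded for illustration only). It assumes gt v0.11.0 or later is installed. A select helper resolves the column, and a logical expression on another column restricts which rows are formatted.

```r
library(gt)

# Hypothetical example data; not one of gt's built-in datasets.
# approx_mass holds rounded, illustrative molar masses.
compounds <- data.frame(
  compound = c("water", "acetaldehyde", "ammonium sulfide"),
  formula = c("H2O", "C2H4O", "(NH4)2S"),
  approx_mass = c(18, 44, 68)
)

# Resolve the target column with a select helper, then restrict
# formatting to the rows where approx_mass exceeds 40
tbl <- gt(compounds) |>
  fmt_chem(
    columns = starts_with("form"),
    rows = approx_mass > 40
  )

tbl
```

Under these assumptions only the last two formulas are targeted; the "H2O" cell is left as plain text because its row does not satisfy the rows expression.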

How to use gt's chemistry notation

The chemistry notation is a shorthand for writing chemical formulas and, if needed, chemical reactions. It should feel familiar in its basic usage and the more advanced typesetting tries to limit the amount of syntax needed. It's always best to show examples of usage:

  • "CH3O2" and "(NH4)2S" will render with subscripted numerals

  • Charges can be expressed with terminating "+" or "-", as in "H+" and "[AgCl2]-"; if any charges involve the use of a number, the following incantations could be used: "CrO4^2-", "Fe^n+", "Y^99+", "Y^{99+}" (the final two forms produce equivalent output)

  • Stoichiometric values can be included with whole values prepending formulas (e.g., "2H2O2") or by setting them off with a space, like this: "2 H2O2", "0.5 H2O", "1/2 H2O", "(1/2) H2O"

  • Certain standalone, lowercase letters or combinations thereof will be automatically stylized to fit conventions; "NO_x" and "x Na(NH4)HPO4" will have italicized 'x' characters and you can always italicize letters by surrounding with "*" (as in "*n* H2O" or "*n*-C5H12")

  • Chemical isotopes can be rendered using either of these two constructions preceding an element: "^{227}_{90}Th" or "^227_90Th"; nuclides can be represented in a similar manner, here are two examples: "^{0}_{-1}n^{-}", "^0_-1n-"

  • Chemical reactions can use "+" signs and a variety of reaction arrows: (1) "A -> B", (2) "A <- B", (3) "A <-> B", (4) "A <--> B", (5) "A <=> B", (6) "A <=>> B", or (7) "A <<=> B"

  • Center dots (useful in addition compounds) can be added by using a single "." or "*" character, surrounded by spaces; here are two equivalent examples "KCr(SO4)2 . 12 H2O" and "KCr(SO4)2 * 12 H2O"

  • Single and double bonds can be shown by inserting a "-" or "=" between adjacent characters (i.e., these shouldn't be at the beginning or end of the markup); two examples: "C6H5-CHO", "CH3CH=CH2"

  • As with units notation, Greek letters can be inserted by surrounding the letter name with ":"; here's an example that describes the delta value of carbon-13: ":delta: ^13C"
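To see how these forms render, one option is to place a handful of the strings listed above in a throwaway table and format a copy of the column. This is only a sketch: the data frame notation and its column names (feature, markup, rendered) are hypothetical, and gt v0.11.0 or later is assumed to be installed.

```r
library(gt)

# Markup strings copied from the notation examples above;
# the data frame and its column names are made up for illustration
notation <- data.frame(
  feature = c(
    "subscripts", "charge", "stoichiometry", "isotope",
    "reaction arrow", "center dot", "double bond", "Greek letter"
  ),
  markup = c(
    "(NH4)2S", "CrO4^2-", "1/2 H2O", "^{227}_{90}Th",
    "A <=> B", "KCr(SO4)2 . 12 H2O", "CH3CH=CH2", ":delta: ^13C"
  )
)

# Keep the raw markup in one column and format a copy of it,
# so the input and the typeset result appear side by side
notation$rendered <- notation$markup

chem_tbl <- gt(notation) |>
  fmt_chem(columns = rendered)

chem_tbl
```

Leaving the markup column unformatted makes it easier to check each input string against its typeset result.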

Examples

Let's use the reactions dataset and create a new gt table. The table will be filtered down to only a few rows and columns. The column cmpd_formula contains chemical formulas and the formatting of those will be performed by fmt_chem(). Certain column labels with chemical names (O3_k298 and NO3_k298) can be handled within cols_label() by surrounding the text with "{{%"/"%}}".

reactions |>
  dplyr::filter(cmpd_type == "terminal monoalkene") |>
  dplyr::filter(grepl("^1-", cmpd_name)) |>
  dplyr::select(cmpd_name, cmpd_formula, ends_with("k298")) |>
  gt() |>
  tab_header(title = "Gas-phase reactions of selected terminal alkenes") |>
  tab_spanner(
    label = "Reaction Rate Constant at 298 K",
    columns = ends_with("k298")
  ) |>
  fmt_chem(columns = cmpd_formula) |>
  fmt_scientific() |>
  sub_missing() |>
  cols_label(
    cmpd_name = "Alkene",
    cmpd_formula = "Formula",
    OH_k298 = "OH",
    O3_k298 = "{{%O3%}}",
    NO3_k298 = "{{%NO3%}}",
    Cl_k298 = "Cl"
  ) |>
  opt_align_table_header(align = "left")

This image of a table was generated from the first code example in the fmt_chem() help file.

Taking just a few rows from the photolysis dataset, let's create a new gt table. The cmpd_formula and products columns both contain text in chemistry notation (the first has compounds, and the second column has the products of photolysis reactions). These columns will be formatted by fmt_chem(). The compound formulas will be merged with the compound names with cols_merge().

photolysis |>
  dplyr::filter(cmpd_name %in% c(
    "hydrogen peroxide", "nitrous acid",
    "nitric acid", "acetaldehyde",
    "methyl peroxide", "methyl nitrate",
    "ethyl nitrate", "isopropyl nitrate"
  )) |>
  dplyr::select(-c(l, m, n, quantum_yield, type)) |>
  gt() |>
  tab_header(title = "Photolysis pathways of selected VOCs") |>
  fmt_chem(columns = c(cmpd_formula, products)) |>
  cols_nanoplot(
    columns = sigma_298_cm2,
    columns_x_vals = wavelength_nm,
    expand_x = c(200, 400),
    new_col_name = "cross_section",
    new_col_label = "Absorption Cross Section",
    options = nanoplot_options(
      show_data_points = FALSE,
      data_line_stroke_width = 4,
      data_line_stroke_color = "black",
      show_data_area = FALSE
    )
  ) |>
  cols_merge(
    columns = c(cmpd_name, cmpd_formula),
    pattern = "{1}, {2}"
  ) |>
  cols_label(
    cmpd_name = "Compound",
    products = "Products"
  ) |>
  opt_align_table_header(align = "left")

This image of a table was generated from the second code example in the fmt_chem() help file.

fmt_chem() can handle the typesetting of nuclide notation. Let's take a subset of columns and rows from the nuclides dataset and make a new gt table. The nuclide column contains isotopes of hydrogen and carbon, and it is placed in the table stub. Using fmt_chem() ensures that the subscripted and superscripted values are formatted according to the convention for nuclides.

nuclides |>
  dplyr::filter(element %in% c("H", "C")) |>
  dplyr::mutate(nuclide = gsub("[0-9]+$", "", nuclide)) |>
  dplyr::select(nuclide, atomic_mass, half_life, decay_1, is_stable) |>
  gt(rowname_col = "nuclide") |>
  tab_header(title = "Isotopes of Hydrogen and Carbon") |>
  tab_stubhead(label = "Isotope") |>
  fmt_chem(columns = nuclide) |>
  fmt_scientific(columns = half_life) |>
  fmt_number(
    columns = atomic_mass,
    decimals = 4,
    scale_by = 1 / 1e6
  ) |>
  sub_missing(
    columns = half_life,
    rows = is_stable,
    missing_text = md("**STABLE**")
  ) |>
  sub_missing(columns = half_life, rows = !is_stable) |>
  sub_missing(columns = decay_1) |>
  data_color(
    columns = decay_1,
    target_columns = c(atomic_mass, half_life, decay_1),
    palette = "LaCroixColoR::PassionFruit",
    na_color = "white"
  ) |>
  cols_label_with(fn = function(x) tools::toTitleCase(gsub("_", " ", x))) |>
  cols_label(decay_1 = "Decay Mode") |>
  cols_width(
    stub() ~ px(70),
    c(atomic_mass, half_life, decay_1) ~ px(120)
  ) |>
  cols_hide(columns = c(is_stable)) |>
  cols_align(align = "center", columns = decay_1) |>
  opt_align_table_header(align = "left") |>
  opt_vertical_padding(scale = 0.5)

This image of a table was generated from the third code example in the fmt_chem() help file.

Function ID

3-20

Function Introduced

v0.11.0

See Also

Other data formatting functions: data_color(), fmt(), fmt_auto(), fmt_bins(), fmt_bytes(), fmt_country(), fmt_currency(), fmt_date(), fmt_datetime(), fmt_duration(), fmt_email(), fmt_engineering(), fmt_flag(), fmt_fraction(), fmt_icon(), fmt_image(), fmt_index(), fmt_integer(), fmt_markdown(), fmt_number(), fmt_partsper(), fmt_passthrough(), fmt_percent(), fmt_roman(), fmt_scientific(), fmt_spelled_num(), fmt_tf(), fmt_time(), fmt_units(), fmt_url(), sub_large_vals(), sub_missing(), sub_small_vals(), sub_values(), sub_zero()


gt documentation built on Sept. 11, 2024, 5:15 p.m.