Parsing Quantities
In quantities: Quantity Calculus for R Vectors

knitr::opts_chunk$set(collapse = T, comment = "#>", 
                      fig.width = 6, fig.height = 4, fig.align = "center")

Introduction

The BIPM (Bureau International des Poids et Mesures) is the international authority on measurement units and uncertainty. The Joint Committee for Guides in Metrology (JCGM), dependent on the BIPM together with other international standardisation bodies, maintains two fundamental guides in metrology: the VIM ("The International Vocabulary of Metrology -- Basic and General Concepts and Associated Terms") and the GUM ("Evaluation of Measurement Data -- Guide to the Expression of Uncertainty in Measurement"). The latter defines four ways of reporting standard uncertainty. For example, if we are reporting a nominal mass $m_S$ of 100 g with some uncertainty $u_c$:

1) $m_S$ = 100.02147 g, $u_c$ = 0.35 mg; that is, quantity an uncertainty are reported separatedly, and thus they may be expressed in different units. 2) $m_S$ = 100.02147(35) g, where the number in parentheses is the value of $u_c$ referred to the corresponding last digits of the reported quantity. 3) $m_S$ = 100.02147(0.00035) g, where the number in parentheses is the value of $u_c$ expressed in the unit of the reported quantity. 4) $m_S$ = (100.02147 $\pm$ 0.00035), where the number following the symbol $\pm$ is the value of $u_c$ in the unit of the reported quantity.

The second scheme is the most compact one, and it is the default reporting mode in the errors package. The fourth scheme is also supported given that it is a very extended notation, but the GUM discourages its use to prevent confusion with confidence intervals.

In the same lines, the BIMP also publishes the International System of Units (SI), which consist of seven base units and derived units, many of them with special names and symbols. Units are reported after the corresponding quantity using products of powers of symbols (e.g., 1 N = 1 m kg s-2).

Available parsers

The quantities package provides three methods that parse units and uncertainty following the GUM's recommendations:

parse_quantities(): The returned value is always a quantities object.
- If no uncertainty was found, a zero error is assumed for all values.
- If no units were found, all values are supposed to be unitless.
parse_errors(): The returned value is always an errors object.
- If no uncertainty was found, a zero error is assumed for all values.
- If units were found, a warning is emitted.
parse_units(): The returned value is always a units object.
- If uncertaint was found, a warning is emitted.
- If no units were found, all values are supposed to be unitless.

Given a rectangular data file, such as a CSV file, it can be read with any CSV reader (e.g., base read.csv, readr's read_csv or data.table's fread). Then, a proper parser can be used to convert columns as required.

(d.quantities <- d.units <- d.errors <- read.csv(textConnection("
quantities,        units,  errors
1.02(5) g,         1.02 g, 1.02(5)
2.51(0.01) V,      2.51 V, 2.51(0.01)
(3.23 +/- 0.12) m, 3.23 m, 3.23 +/- 0.12"), stringsAsFactors=FALSE))

library(quantities)

for (name in names(d.quantities)) {
  message(name)
  d.quantities[[name]] <- parse_quantities(d.quantities[[name]])
  d.units[[name]] <- parse_units(d.units[[name]])
  d.errors[[name]] <- parse_errors(d.errors[[name]])
}

d.quantities
d.units
d.errors