Measurement units in R

knitr::opts_chunk$set(collapse = TRUE, fig.asp = 0.7, fig.width = 7)

This vignette is identical to @rj, except for two changes:

Abstract

We briefly review SI units, and discuss R packages that deal with measurement units, their compatibility and conversion. Built upon the UNIDATA udunits library, we introduce the package units that provides a class for maintaining unit metadata. When used in expression, it automatically converts units, and simplifies units of results when possible; in case of incompatible units, errors are raised. The class flexibly allows expansion beyond predefined units. Using units may eliminate a whole class of potential scientific programming mistakes. We discuss the potential and limitations of computing with explicit units.

Introduction

Two quotes from @cobb -- Data are not just numbers, they are numbers with a context and in data analysis, context provides meaning -- illustrate that for a data analysis to be meaningful, knowledge of the data's context is needed. Pragmatic aspects of this context include who collected or generated the data, how this was done, and for which purpose [@scheider]; semantic aspects concern what the data represents: which aspect of the world do the data refer to, when and where were they measured, and what a value of 1 means.

R does allow for keeping some context with data, for instance

Furthermore, if spatial objects as defined in package sp [@sp] have a proper coordinate reference system set, they can be transformed to other datums, or converted to various flat (projected) representations of the Earth [@iliffe].

In many cases however, R drops contextual information. As an example, we look at annual global land-ocean temperature index (from http://climate.nasa.gov/vital-signs/global-temperature/) since 1960:

temp_data = subset(read.table("647_Global_Temperature_Data_File.txt", 
    header=TRUE)[1:2], Year >= 1960)
temp_data$date = as.Date(paste0(temp_data$Year, "-01-01"))
temp_data$time = as.POSIXct(temp_data$date)
Sys.setenv(TZ="UTC")
head(temp_data, 3)
year_duration = diff(temp_data$date)
mean(year_duration)

Here, the time difference units are reported for the difftime object year_duration, but if we would use it in a linear algebra operation

year_duration %*% rep(1, length(year_duration)) / length(year_duration)

the unit is dropped. Similarly, for linear regression coefficients we see

coef(lm(Annual_Mean ~ date, temp_data))
coef(lm(Annual_Mean ~ time, temp_data))

where the unit of change is in degrees Celsius but either per day (date) or per second (time). For purely mathematical manipulations, R often strips context from numbers when it is carried in attributes, the linear algebra routines being a prime example.

Most variables are somehow attributed with information about their units, which specify what the value 1 of this variable represents. This may be counts of something, e.g. 1 apple, but it may also refer to some physical unit, such as distance in meter. This article discusses how strong unit support can be introduced in R.

SI

The BIPM (Bureau International des Poids et Mesures) is "the intergovernmental organization through which Member States act together on matters related to measurement science and measurement standards. Its recommended practical system of units of measurement is the International System of Units (Système International d'Unités, with the international abbreviation SI) (https://www.bipm.org/en/measurement-units/)".

@si describe the SI units, where, briefly, SI units

base quantities, SI units and their symbols (from @si, p. 23):

| Base quantity | | SI base unit | | | -----------|--------|----------|------| | Name |Symbol|Name | Symbol | length | $l,x,r,$ etc.| meter | m | | mass | $m$ | kilogram | kg | | time, duration | $t$ | second | s | | electric current| $I, i$ | ampere | A | | thermodynamic temperature | $T$ | kelvin | K | | amount of substance | $n$ | mole | mol | | luminous intensity | $I_v$ | candela | cd |

Related work in R

Several R packages provide unit conversions. For instance, measurements [@measurements] provides a collection of tools to make working with physical measurements easier. It converts between metric and imperial units, or calculates a dimension's unknown value from other dimensions' measurements. It does this by the conv_unit function:

library(measurements)
conv_unit(2.54, "cm", "inch")
conv_unit(c("101 44.32","3 19.453"), "deg_dec_min", "deg_min_sec")
conv_unit(10, "cm_per_sec", "km_per_day")

but uses for instance kph instead of km_per_hour, and then m3_per_hr for flow -- unit names seem to come from convention rather than systematic composition. Object conv_unit_options contains all 173 supported units, categorized by the physical dimension they describe:

names(conv_unit_options)
conv_unit_options$volume

Function conv_dim allows for the conversion of units in products or ratios, e.g.

conv_dim(x = 100, x_unit = "m", trans = 3, trans_unit = "ft_per_sec", y_unit = "min")

computes how many minutes it takes to travel 100 meters at 3 feet per second.

Package NISTunits [@NISTunits] provides fundamental physical constants (Quantity, Value, Uncertainty, Unit) for SI and non-SI units, plus unit conversions, based on the data from NIST (National Institute of Standards and Technology). The package provides a single function for every unit conversion; all but 5 from its 896 functions are of the form NISTxxxTOyyy where xxx and yyy refer to two different units. For instance, converting from W m$^{-2}$ to W inch$^{-2}$ is done by

library(NISTunits)
NISTwattPerSqrMeterTOwattPerSqrInch(1:5)

Both measurements and NISTunits are written entirely in R.

UNIDATA's udunits library

Udunits, developed by UCAR/UNIDATA, advertises itself on its web page as: "The udunits package supports units of physical quantities. Its C library provides for arithmetic manipulation of units and for conversion of numeric values between compatible units. The package contains an extensive unit database, which is in XML format and user-extendable."

Unlike the measurements and NISTunits, the underlying udunits2 C library parses units as expressions, and bases its logic upon the convertibility of expressions, rather than the comparison of fixed strings:

m100_a = paste(rep("m", 100), collapse = "*")
dm100_b = "dm^100"
units::ud_are_convertible(m100_a, dm100_b)

This has the advantage that through complex computations, intermediate objects can have units that are arbitrarily complex, and that can potentially be simplified later on. It also means that the package practically supports an unlimited amount of derived units.

Udunits versus the Unified Code for Units of Measure (UCUM)

Another set of encodings for measurement units is the Unified Code for Units of Measure (UCUM, @ucum). A dedicated web site\footnote{\url{http://coastwatch.pfeg.noaa.gov/erddap/convert/units.html}} describes the details of the differences between udunits and UCUM, and provides a conversion service between the two encoding sets.

The UCUM website refers to some Java implementations, but some of the links seem to be dead. UCUM is the preferred encoding for standards from the Open Geospatial Consortium. udunits on the other hand is the units standard of choice by the climate science community, and is adopted by the CF (Climate and Forecast) conventions, which mostly uses NetCDF. NetCDF [@netcdf] is a binary data format that is widely used for atmospheric and climate model predictions.

The udunits library is a C library that has strong support from UNIDATA, and we decided to build our developments on this, rather than on Java implementations of UCUM with a less clear provenance.

Handling data with units in R: the units package

The units package builds units objects from scratch, e.g. where

library(units)
x = set_units(1:5, m/s)
str(x)

represents speed values in m/s. The units m and s are resolved from the udunits2 C library (but could be user-defined units).

Units can be used in arbitrary R expressions like

set_units(1:3, m/s^2)

Several manipulations with units objects will now be illustrated. Manipulations that do not involve unit conversion are for instance addition:

x = set_units(1:3, m/s)
x + 2 * x

Explicit unit conversion is done by assigning new units:

(x = set_units(x, cm/s))
as.numeric(x)

similar to the behaviour of difftime objects, this modifies the numeric values without modifying their meaning (what the numbers refer to).

When mixing units in sums, comparisons or concatenation, units are automatically converted to those of the first argument:

y = set_units(1:3, km/h)
x + y
y + x
x == y
c(y, x)

where c(y, x) concatenates y and x after converting x to the units of y. Derived units are created where appropriate:

x * y
x^3

and meaningful error messages appear when units are not compatible:

e = try(z <- x + x * y)
attr(e, "condition")[[1]]

The full set of methods and method groups for units objects is shown by

methods(class = "units")

where the method groups

When possible, new units are simplified:

a = set_units(1:10, m/s)
b = set_units(1:10, h)
a * b
ustr1 = paste(rep("m", 101), collapse = "*")
ustr2 = "dm^100"
as_units(ustr1) / as_units(ustr2)

Units are printed as simple R expressions, e.g.

set_units(1, m^5/s^4)

Another way to print units commonly seen in Climate and Forecast Conventions\footnote{CF, \url{http://cfconventions.org/Data/cf-standard-names/34/build/cf-standard-name-table.html}} is m2 s-1 for m$^2$/s. These are not R expressions, but they can be parsed by as_units, and created by deparse_unit:

as_units("m2 s-1")
deparse_unit(set_units(1, m^2*s^-1))

The plot and hist methods add units to default axis labels, an example is shown in the following figures. For ggplot2 plots [@ggplot2], automatic unit placement in default axis label is also provided; demo(ggplot2) gives an example.

library(units)
units_options(negative_power = TRUE)
# initialize variables with units:
mtcars$consumption = set_units(mtcars$mpg, mi/gallon)
# "in" is also a reserved R keyword, and so needs back-quotes:
mtcars$displacement = set_units(mtcars$disp, `in`^3)
# convert to SI:
mtcars$consumption = set_units(mtcars$consumption, km/l)
mtcars$displacement = set_units(mtcars$displacement, cm^3)
par(mar = par("mar") + c(0, .3, 0, 0))
with(mtcars, plot(1/displacement, 1/consumption))
library(ggplot2)
ggplot(mtcars) + geom_point(aes(x = 1/displacement, y = 1/consumption))

Automatic conversion between units and difftime is provided:

(dt = diff(Sys.time() + c(0, 1, 1+60, 1+60+3600))) # class difftime
(dt.u = as_units(dt))
identical(as_difftime(dt.u), dt)

as well as to and from POSIXct or Date:

(t1 <- as_units(as.POSIXct("2017-08-20 17:03:00")))
(t2 <- as_units(as.POSIXct("2017-08-20 17:03:00"), "hours since 2017-08-20"))
(d1 <- as_units(as.Date("2017-08-20")))
as.POSIXct(t1)
as.Date(d1)

Objects of class units can be used as columns in data.frame objects, as well as in tbl_df [@tibble]. They can also be matrix or array, with the constraint that a single unit holds for all elements.

Discussion and conclusions

The units R package provides a new class, units, for numeric data with associated measurement units. Operations on objects of this class retain the unit metadata and provide automated dimensional analysis: dimensions are taken into consideration in computations and comparisons. Combining different units that are compatible triggers automatic unit conversion, derived units are automatically generated and simplified where possible, and meaningful error messages are given when a user tries to add objects with incompatible units. This verifies that computations are not only syntactically and numerically allowed, but also semantically, and in the case of physical units, physically allowed, which may support code verification and provenance tracking. Using this package may eliminate a whole class of potential scientific programming mistakes.

Where the R packages measurements and NISTunits provide conversion between a fixed number of units, with the help of the udunits2 C library and unit database, R package units handles arbitrarily complex derived units. By treating units as expressions it can derive, convert and simplify units. In addition, beyond the SI units packaged, units handles user-defined units.

Data in units vectors can be stored as columns in data.frame or tbl_df objects, and can be converted to and from difftime. When units objects have associated time and location information, they could be stored in spatial or spatio-temporal objects provided by sp or spacetime [@spacetime] as these store attribute data in data.frame slots, but for instance not in zoo [@zoo] or xts [@xts] objects, as these latter two set the class attribute of a vector or matrix.

Despite all standardization efforts, units may still be ambiguous, or subject to interpretation. For instance for the duration of one year NISTunits gives us an answer that depends on whether we want a common, leap, Gregorian, Julian, tropical or siderial year (@lang, see also demo(year)). This illustrates that those who apply unit conversion should be aware of possible pitfalls. Support for calendars in udunits seems not as well developed as in R.

Future work includes extending packages that read external data from formats, databases or interfaces with support for measurement unit information into R, preserving the measurement unit information. Examples would be interfaces to HDF5 (e.g., h5, @h5), RNetCDF [@RNetCDF] or sos4R [@sos4R]. It would be nice to see units of measurements propagate into units of regression coefficient estimates.

Acknowledgements

We acknowledge three anonymous reviewers and the handling editor for their constructive comments, and Thomas Lin Pedersen for implementing the ggplot extensions in package ggforce (ported to the units package since v0.8-0) that automatically add units to default ggplot axis labels.

References



Try the units package in your browser

Any scripts or data that you put into this service are public.

units documentation built on Sept. 14, 2023, 1:06 a.m.