Statement of Need

Oceanographic field experiments often employ a suite of instrument types, each reporting data in a different format. Many of these formats are complex and difficult to decode. Although manufacturers usually provide software for accessing data produced by their instruments, it tends to be proprietary and closed-source. This is a problem for researchers seeking to analyse their data in novel ways, or combine data from multiple instruments. The oce package [@kelley__aut_oce_2021] addresses such issues by providing functions that handle dozens of data formats. In addition, it facilitates specialized calculations and data displays that are particular to the discipline. Since oce is written in the R language [@ihaka_r_1996;@r_core_team_introduction_2021], it forms a link to an array of more general tools that oceanographers may need in their work [@kelley_oceanographic_2018].

Overview

The oce package has been hosted on CRAN [@noauthor_comprehensive_2021] since 2009. The CRAN version, which is updated once or twice a year, may be installed by typing install.packages("oce") in an R console. Users who need newer features may use remotes::install_github("dankelley/oce", ref="develop") to download and build the development branch. Those wishing to view or participate in the development process are welcome to do so at https://github.com/dankelley/oce.

The package has functions for decoding many data formats. These functions return S4 objects with slots holding (a) the data, (b) related metadata, and (c) a processing log recording the oce function calls that created and modified the object. This can be illustrated by executing the following in an R session, using a built-in object that was created by reading data from a profiling instrument called a CTD.

library(oce)                           # load library
data(ctd)                              # load a built-in sample file
slotNames(ctd)                         # see 'slot' names
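To make the three-slot layout concrete without relying on oce internals, here is a minimal sketch using a hypothetical toy class. The names toyRecord, appendLog(), makeToyCtd() and subsetToy() are invented for illustration and are not part of oce; only the idea of storing metadata, data and a growing processing log side by side mirrors the description above.

```r
library(methods)

# Hypothetical toy class, NOT oce's definition: it only mirrors the
# metadata / data / processingLog layout described in the text.
setClass("toyRecord", slots = c(metadata = "list",
                                data = "list",
                                processingLog = "list"))

# Append a timestamped entry describing an operation to the log slot
appendLog <- function(object, action) {
    entry <- list(time = Sys.time(), value = action)
    object@processingLog <- c(object@processingLog, list(entry))
    object
}

# Hypothetical constructor that records its own call in the log
makeToyCtd <- function(pressure, temperature, longitude = NA, latitude = NA) {
    obj <- new("toyRecord",
               metadata = list(longitude = longitude, latitude = latitude),
               data = list(pressure = pressure, temperature = temperature),
               processingLog = list())
    appendLog(obj, paste(deparse(match.call()), collapse = " "))
}

# Hypothetical subsetting step that also records itself in the log
subsetToy <- function(object, keep) {
    object@data <- lapply(object@data, function(v) v[keep])
    appendLog(object, paste("subsetToy(keep =",
                            paste(deparse(keep), collapse = " "), ")"))
}

x <- makeToyCtd(pressure = c(1, 2, 3), temperature = c(12, 11.5, 10.8),
                longitude = -63.6, latitude = 44.6)   # illustrative values
y <- subsetToy(x, x@data$pressure > 1)
print(slotNames(y))
print(length(y@processingLog))   # one entry per operation applied so far
```

Keeping the log inside the object, rather than in a separate file, means the history travels with the data when objects are saved or passed between functions.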

The next step after loading an object, or reading it from a data file, is often to get a textual overview with summary(ctd), or a graphical overview, e.g. with plot(ctd), which produces Figure 1. It is also common to exert fine-grained control over graphical representations, e.g. with plot(ctd, which="temperature") to plot just the variation of temperature with depth (results not shown here). The variations of other properties may be shown by setting which appropriately, and this argument can also select other types of plots besides depth profiles.

plot(ctd)

Besides this "ctd" subclass, oce supports dozens of other subclasses that cover a wide range of oceanographic instrumentation. In every case, the same summary() and plot() calls provide textual and graphical representations of the data. This specialization of the two generic functions simplifies analysis considerably. For example, if PATTERN is a regular expression that matches a set of data files, whether from a single instrument type or from several, then

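library(oce)
for (file in list.files(pattern=PATTERN)) { # PATTERN: regular expression for file names
    d <- read.oce(file)                      # detect the file format and read the data
    summary(d)                               # textual overview
    plot(d)                                  # graphical overview
}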

will provide information about each data file of interest, forming a good first stage of analysis.
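The reason one loop can serve many instrument types is S4 method dispatch: each class supplies its own method for a shared generic, and R chooses the method from the class of the object at run time. The sketch below illustrates that mechanism with hypothetical classes (toyBase, toyProfile, toyGauge) that are not part of oce, whose real class hierarchy and summary() output are much richer.

```r
library(methods)

# Hypothetical classes sharing a common parent; NOT oce's hierarchy
setClass("toyBase", slots = c(data = "list"))
setClass("toyProfile", contains = "toyBase")
setClass("toyGauge", contains = "toyBase")

# Specialize the existing summary() generic once per class
# (R creates an S4 generic from base::summary the first time)
setMethod("summary", "toyProfile", function(object, ...)
    sprintf("CTD profile with %d levels", length(object@data$pressure)))
setMethod("summary", "toyGauge", function(object, ...)
    sprintf("Sea-level record with %d samples", length(object@data$elevation)))

objects <- list(new("toyProfile", data = list(pressure = 1:50)),
                new("toyGauge", data = list(elevation = sin(1:24))))

# The caller never branches on instrument type; dispatch does the work
for (d in objects) print(summary(d))
```

Adding support for a new instrument then means adding a class and its methods, without touching code like the loop above.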

Oce also provides other generic functions, including subset() for focusing on subsets of data, handleFlags() for processing data-quality flags, and [[ for accessing data. The last of these is particularly worthy of note, for two reasons.

  1. [[ finds information regardless of where it is stored in the object. For example, a CTD does not measure longitude and latitude, but if these values are known, they are stored in the metadata slot, not the data slot. Other objects might have longitude in the data slot. This detail is immaterial to users, because [[ looks in both slots. Therefore, code written for one object type will often work for another type.

  2. [[ can access not just information stored within the object, but also quantities that can be calculated from that information. For example, CTD files typically hold information from which seawater density may be computed [@millero_history_2010;@mcdougall_getting_2011], so [[ is set up to compute density if it is requested. The same scheme works for other computable quantities.
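Both behaviours can be imitated with a single S4 method for [[ that searches the data slot, then the metadata slot, and finally falls back to computing a derived quantity. The sketch below is a hypothetical illustration, not oce's code: the class toyCtd and the helper linearDensity() are invented here, and the density uses a simple linear equation of state with round illustrative coefficients, whereas oce relies on the far more accurate formulations cited above.

```r
library(methods)

# Hypothetical two-slot class standing in for an oce object (not oce's code)
setClass("toyCtd", slots = c(metadata = "list", data = "list"))

# Illustrative LINEAR equation of state (NOT TEOS-10):
# rho = rho0 * (1 - alpha * (T - T0) + beta * (S - S0))
linearDensity <- function(temperature, salinity,
                          rho0 = 1027, T0 = 10, S0 = 35,
                          alpha = 1.7e-4, beta = 7.6e-4) {
    rho0 * (1 - alpha * (temperature - T0) + beta * (salinity - S0))
}

# [[ looks in data, then metadata, then tries known derived quantities
setMethod("[[", signature(x = "toyCtd"), function(x, i, j, ...) {
    if (i %in% names(x@data)) return(x@data[[i]])
    if (i %in% names(x@metadata)) return(x@metadata[[i]])
    if (i == "density")
        return(linearDensity(x[["temperature"]], x[["salinity"]]))
    NULL   # unknown item
})

x <- new("toyCtd",
         metadata = list(longitude = -63.6, latitude = 44.6),
         data = list(temperature = c(10, 8), salinity = c(35, 35.5)))
print(x[["temperature"]])   # found in the data slot
print(x[["longitude"]])     # found in the metadata slot
print(x[["density"]])       # computed on request
```

Because callers only ever write x[["longitude"]] or x[["density"]], the storage location and the computation can change without breaking their code.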

The [[ function acts as a sort of bridge from the oceanographic realm to the general R realm, with its thousands of useful and well-vetted packages. This reduces the need to create new tools, letting analysts focus on oceanography, not coding.

Example: Tidal Analysis

A more detailed example may help to solidify some of the key aspects of oce. Many readers will have an interest in tides, so we will work with a year-long record of sea level, $\eta=\eta(t)$, in Halifax Harbour during 2003, the year in which the city was struck by Hurricane Juan.

Consider the code given below, which produces Figure 2. A built-in sealevel dataset is used to make the example reproducible, but replacing the data() call with a read.sealevel() call will handle data files in standard formats. Note that the tidem() function is fairly sophisticated, with over 500 lines of R code devoted to the specialized procedures of tidal analysis [@godin_analysis_1972;@pawlowicz_classical_2002;@foreman_versatile_2009]. Readers who notice that tidem() evokes the lm() function for linear models may not be surprised that oce also provides a predict() method for generating tidal predictions.
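The resemblance to lm() has a simple root: at its core, harmonic tidal analysis is a linear least-squares fit in which each constituent of known frequency contributes a cosine and a sine regressor. The base-R sketch below shows only that core idea, on synthetic hourly data with two constituents (M2, period 12.4206012 h, and S2, period 12 h). The amplitudes, phases and noise level are made up for illustration, and the sketch omits the many constituents, nodal corrections and selection rules that tidem() applies.

```r
# Sketch: tidal harmonic analysis as ordinary least squares (base R only).
# Synthetic data; NOT the tidem() algorithm, just its linear-model core.
set.seed(1)
hours <- seq(0, 365 * 24 - 1)                # one year of hourly samples
periods <- c(M2 = 12.4206012, S2 = 12)       # constituent periods [h]
omega <- 2 * pi / periods                    # angular frequencies [rad/h]

# Synthetic "observations": mean level + two tides + random noise
eta <- 1.0 +
    0.60 * cos(omega[["M2"]] * hours - 0.5) +
    0.20 * cos(omega[["S2"]] * hours - 1.0) +
    rnorm(length(hours), sd = 0.05)

# Each constituent contributes one cosine and one sine regressor
X <- data.frame(
    cM2 = cos(omega[["M2"]] * hours), sM2 = sin(omega[["M2"]] * hours),
    cS2 = cos(omega[["S2"]] * hours), sS2 = sin(omega[["S2"]] * hours))
fit <- lm(eta ~ cM2 + sM2 + cS2 + sS2, data = cbind(X, eta = eta))

# A cos(w t) + B sin(w t) has amplitude sqrt(A^2 + B^2), phase atan2(B, A)
b <- coef(fit)
amplitude <- c(M2 = sqrt(b[["cM2"]]^2 + b[["sM2"]]^2),
               S2 = sqrt(b[["cS2"]]^2 + b[["sS2"]]^2))
phase <- c(M2 = atan2(b[["sM2"]], b[["cM2"]]),
           S2 = atan2(b[["sS2"]], b[["cS2"]]))
print(round(amplitude, 3))
print(round(phase, 3))

# De-tiding works as in the oce example: subtract the fitted tide
etaDetided <- eta - predict(fit)
```

With a year of data the two constituents are well separated in frequency, so the fitted amplitudes and phases should lie close to the values used to build the series, and the de-tided residual should resemble the added noise.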

library(oce)                           # load library
data(sealevel)                         # use built-in example dataset
t <- sealevel[["time"]]                # extract time
eta <- sealevel[["elevation"]]         # extract sea level
m <- tidem(sealevel)                   # fit tidal model
etaDetided <- eta - predict(m)         # de-tide observations
par(mfrow=c(2, 1))                     # set up a two-panel plot
oce.plot.ts(t, eta, xaxs="i",          # top: observed sea level
    grid=TRUE, ylab="Sea level [m]")
oce.plot.ts(t, etaDetided, xaxs="i",   # bottom: de-tided sea level
    grid=TRUE, ylab="De-tided sea level [m]")

A comparison of the panels in Figure 2 reveals that tides explain much of the sea-level variation in Halifax Harbour. The lower panel shows an increase in de-tided variance during the winter months, as expected at a northern mid-latitude site. More surprising is the large spike towards the end of September. This is a result of Hurricane Juan, which swept over Halifax at that time, causing a storm surge of approximately 1.5 m that, along with high waves, caused major damage in the harbour [@xu_extreme_2012]. (Readers might find it informative to supply an xlim argument to the plot calls, to narrow in on the event.)

Conclusions

The oce package provides for many aspects of oceanographic analysis, having evolved in an open-source environment for more than a decade. The developers have benefited from a supportive user community, members of which have contributed insightful bug reports and suggestions for improvements. New features are added continually, to handle new instrument types, new data repositories, and new methods. Physical oceanography is a major focus of the package, but we hope this paper will generate interest in other communities, ranging from climatologists to those in marine disciplines such as chemistry and biology. Our other goal is to encourage the development of new R packages, such as argoFloats [@kelley_argofloats_2021], that build upon oce.

References



dankelley/oce documentation built on April 18, 2024, 9:51 a.m.