rename: Rename variables according to a specified dictionary
In dankelley/oce: Analysis of Oceanographic Data

Description

There are many conventions for naming oceanographic variables, and this function provides a way to map names in data files to names to be used in an object created from those files.

Usage

rename(x, dictionary = "ioos", debug = 0)

Arguments

`x`	either an oce object, the elements of which will be renamed, or NULL. In the latter case, the dictionary is returned as a data frame, which can be useful for users who want to use `rbind()` to append dictionary elements of their own, thus customizing the action of `rename()`.
`dictionary`	either a string or a data frame. If a string, then it is either the name of a built-in vocabulary (either `"ioos"` or `"sbe"`) or the name of a CSV file that defines a dictionary in a four-column format as described in ‘Details’. If it is a data frame, then it must hold four columns that follow the same pattern as in the CSV style.
`debug`	an integer specifying whether debugging information is to be printed during the processing. This is a general parameter that is used by many `oce` functions. Generally, setting `debug=0` turns off the printing, while higher values suggest that more information be printed. If one function calls another, it usually reduces the value of `debug` first, so that a user can often obtain deeper debugging by specifying higher `debug` values.
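The `rbind()` approach mentioned for `x` can be sketched in base R as follows. This is only an illustration under stated assumptions: `base` is a stand-in built from text (with oce installed, it could instead be obtained with `rename(NULL, "ioos")`), and the extra entry for a variable named `P` followed by one digit is made up for the example.

```r
# Stand-in dictionary (an assumption for illustration). With oce installed,
# this could instead come from rename(NULL, "ioos").
base <- read.csv(
    text = "S,salinity,,\nT,temperature,degree*C,ITS-90",
    header = FALSE, colClasses = "character"
)
# A made-up extra entry: "P" followed by one digit is to become "pressure".
extra <- read.csv(
    text = "P#,pressure,dbar,",
    header = FALSE, colClasses = "character"
)
# Use the same column names, so that rbind() can stack the two data frames.
names(extra) <- names(base)
custom <- rbind(base, extra)
print(custom)
```

The resulting data frame has the four-column layout described in ‘Details’, so it could then be supplied as the `dictionary` argument, e.g. `rename(d, custom)`.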

Details

The dictionary format, whether read from a built-in CSV file, or from a user-supplied CSV file, or as a data frame, contains four character-valued columns, as follows.

1. The original name of a variable in the data slot of x, which is matched against the names found there. A match may be exact or by regular expression. In the latter case, a # character may be used as an abbreviation for a single digit. Note that ^ is inserted at the start of the value, and $ at the end, before searching for a match with grep().
2. The desired oce-convention name to be used for a match. Many files will yield duplicates, e.g. for multiple temperature sensors, so unduplicateNames() is called after all names are processed, to avoid problems.
3. The unit for the column, typically in a format handled by expression(). Note that this value is ignored if the object already holds stated units for the quantity in question.
4. The scale for the column (again, only used if the object does not already hold a scale).
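The matching rule for the first column can be sketched in base R. This is not the oce source code; `matchesEntry()` is a hypothetical helper that only mimics the steps described above (replace each `#` with `[0-9]`, anchor the pattern with `^` and `$`, then search with a `grep()`-family function).

```r
# Hypothetical helper, for illustration only (not part of oce).
matchesEntry <- function(pattern, names) {
    # Each "#" stands for a single digit.
    regexp <- gsub("#", "[0-9]", pattern, fixed = TRUE)
    # Anchor at both ends, so that the whole name must match.
    regexp <- paste0("^", regexp, "$")
    grepl(regexp, names)
}

candidates <- c("PSALST01", "PSALST1", "PSALST012", "TEMPS901")
matchesEntry("PSALST##", candidates)
# TRUE FALSE FALSE FALSE
```

Because of the anchoring, only a name with exactly two trailing digits matches `PSALST##`. Names with one or three digits are rejected.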

The built-in dictionaries are stored in locations

system.file("extdata", "dictionary_codas.csv", package = "oce")
system.file("extdata", "dictionary_ioos.csv", package = "oce")
system.file("extdata", "dictionary_sbe.csv", package = "oce")

The data for these come from References 1, 2 and 3, respectively. The format is simple, consisting of 4 columns, with no header. The column entries are as follows.

1. The first column holds a specialized regular expression for the variable name as stored in the datafile. This is conventional, except that # is a stand-in for the regular expression [0-9] (that is, a single digit). Formulating these expressions requires a bit of care, so it can make sense to look at the dictionary_sbe.csv file to get some hints.
2. The second column holds the oce name.
3. The third column is the unit.
4. The fourth column is the scale.

In many cases, the third and fourth columns are empty, and even if values are provided, they will be superseded by values within the data file.
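One way to get the hints mentioned above is to read a built-in dictionary file and look at its first few rows. The following is a minimal sketch, assuming that the oce package and its `dictionary_sbe.csv` file are installed. The name `sbe` is chosen only for this illustration, and `colClasses = "character"` is used so that empty cells are read as empty strings.

```r
# system.file() returns "" if the package or the file is not found.
f <- system.file("extdata", "dictionary_sbe.csv", package = "oce")
if (nzchar(f)) {
    # The file has no header line, so header = FALSE is needed.
    sbe <- read.csv(f, header = FALSE, colClasses = "character")
    print(head(sbe, 3))
} else {
    message("cannot find dictionary_sbe.csv; is oce installed?")
}
```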

As an example, the entry

PSALST##,salinity,,PSS-78

indicates that a variable named "PSALST" followed by two digits is to be renamed as "salinity", that the unit (if not already defined within x) is to be blank, and that the scale (again, if not already defined within x) is to be "PSS-78".
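To see how such an entry splits into its four fields, it can be parsed with `read.csv()`. This is a sketch for illustration: the column labels given in `col.names` are assumptions made here for readability and are not names defined by oce. Setting `colClasses = "character"` matters because `read.csv()` would otherwise read a column holding only empty cells as logical `NA` values.

```r
line <- "PSALST##,salinity,,PSS-78"
entry <- read.csv(
    text = line, header = FALSE, colClasses = "character",
    # Hypothetical labels, used only to make the output easier to read.
    col.names = c("originalName", "oceName", "unit", "scale")
)
str(entry)
# The third field is empty, meaning that no unit is stated.
entry$unit == ""
# TRUE
```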

History and Plans

This function was written in late September, 2024. It is likely to evolve through the remaining months of 2024, after real-world testing by the developers.

Author(s)

Dan Kelley

References

1. CODAS naming convention https://currents.soest.hawaii.edu/docs/adcp_doc/UHDAS_OPERATIONS/UHDAS_atsea/adcp_access/read_netCDF.html
2. IOOS naming convention https://cfconventions.org/Data/cf-standard-names/78/build/cf-standard-name-table.html
3. The SBE names come from a processing manual that was once at http://www.seabird.com/document/sbe-data-processing-manual, but as of summer 2018, this no longer seems to be provided by SeaBird. A web search will turn up copies of the manual that have been put online by various research groups and data-archiving agencies. On 2018-07-05, the latest version was named SBEDataProcessing_7.26.4.pdf and had release date 12/08/2017; this was the reference version used in coding oce.

Examples

library(oce)
# Example 1: made-up data
d <- new("oce")
d <- oceSetData(d, "S", c(30, 31))
d <- oceSetData(d, "T", c(10, 11))
dictText <- "S,salinity,,
T,temperature,degree*C,ITS-90"
dictionary <- read.csv(text = dictText, header = FALSE)
rename(d, dictionary)
#
# Example 2: a CIOOS NetCDF file. Note that this file
# is downloaded and removed at the end; in practice,
# it is likely that the file might be retained locally.
if (requireNamespace("curl")) {
    file <- tempfile(fileext = ".nc") # removed later
    server <- "https://cioosatlantic.ca/erddap/files"
    program <- "bio_atlantic_zone_monitoring_program_ctd"
    subprogram <- "Bedford%20Basin%20Monitoring%20Program"
    year <- 2023
    cast <- 1
    url <- sprintf(
        "%s/%s/%s/%s/CTD_BCD%s667_%03d_1_DN.ODF.nc",
        server, program, subprogram, year, year, cast
    )
    # Use a name other than `t`, to avoid masking base::t()
    status <- try(curl::curl_download(url, file), silent = TRUE)
    if (!inherits(status, "try-error")) {
        d <- read.netcdf(file)
        summary(d)
        dd <- rename(d, "ioos")
        summary(dd)
    } else {
        message("Cannot connect to ", url)
    }
    unlink(file)
}

dankelley/oce documentation built on June 11, 2025, 6:11 p.m.