remap_vars: Remap Variables

View source: R/Data_handling.R

Description

Extracts and renames specified columns of a data frame. If a regular expression pattern matches multiple column names, the matched columns are averaged; if a requested column is missing, it is initialized with NAs.

Usage

remap_vars(x, new, source, regexp = FALSE, qc = NULL, na.rm = TRUE)

Arguments

x

A data frame.

new

A character vector of new column names for remapping.

source

A vector of x column names to remap, matching the elements of new. If regexp = TRUE, a character vector containing regular expressions.

regexp

A logical value. If FALSE (the default), source will be interpreted literally. If TRUE, source elements will be used as grep patterns.

qc

A character string. A regular expression (grep pattern) identifying the x column names that carry quality control information for the respective source columns.

na.rm

A logical value indicating whether NA values should be stripped before the computation proceeds. na.rm is used only if regexp = TRUE and multiple columns identified by source are combined by averaging.
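
The effect of na.rm on averaging can be illustrated with base R alone. This is a minimal sketch that assumes the averaging behaves like a plain rowMeans call, as the Details section suggests; the data frame reps and its columns are hypothetical, and the code does not call remap_vars itself.

```r
# Two hypothetical replicate columns with one gap (illustrative values only)
reps <- data.frame(rep1 = c(1, 2, NA), rep2 = c(3, 4, 5))

# na.rm = TRUE: the NA is dropped, so row 3 is the mean of the remaining value
mean_rm <- unname(rowMeans(reps, na.rm = TRUE))

# na.rm = FALSE: any NA in a row makes that row mean NA
mean_keep <- unname(rowMeans(reps, na.rm = FALSE))

mean_rm
mean_keep
```

With na.rm = TRUE a gap in one replicate does not propagate into the combined column, at the cost of averaging over fewer values for that row.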

Details

A new data frame is created based on x and the specified source. The original x names are changed according to the respective new elements and kept as varnames attributes for traceability. Accordingly, if qc is specified, quality control (QC) columns are marked by the "qc_" prefix.

qc is specified as the character string pattern that distinguishes QC columns from the respective variables. Ideally, a prefix should be used for QC columns, e.g. for "var" and "qcode_var", qc = "qcode_". A QC column can also be marked by a suffix, e.g. for "var_qcode", qc = "_qcode". The atypical case of QC marked by both a prefix and a suffix can be handled too, e.g. for "prefix_var_suffix", qc = "prefix_|_suffix". For other exceptions, new and source can be used to define the QC remapping explicitly.
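
How such a pattern separates QC columns from their variables can be sketched with base R string functions. The column names below are hypothetical, and the code only illustrates the pattern logic; it is not the internal implementation of remap_vars.

```r
# Hypothetical column names: one QC column marked by prefix, one by suffix
cols <- c("var", "qcode_var", "Tair", "Tair_qcode")
qc <- "qcode_|_qcode"  # alternation matches either the prefix or the suffix

# Columns whose names contain the QC marker
is_qc <- grepl(qc, cols)
qc_cols <- cols[is_qc]

# Removing the marker recovers the variable that each QC column describes
qc_target <- sub(qc, "", qc_cols)

data.frame(qc_column = qc_cols, variable = qc_target)
```

A pattern that also occurs inside ordinary variable names would flag those columns as QC, which is why a distinctive prefix or suffix is the safer choice.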

If regexp = FALSE (the default), exactly one variable (column) is remapped to each new name. The source elements must exactly match x names; otherwise the expected column is initialized with NAs. If qc is specified, exactly one respective quality control column is renamed, or skipped if not present.
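
The literal case can be approximated in a few lines of base R. This is a rough sketch of the described behavior, assuming exact name matching and NA initialization for absent columns; the objects x, source, new and out are illustrative, and attributes such as varnames are not handled here.

```r
# Hypothetical input and mapping; "missing_col" is not present in x
x <- data.frame(a = 1:3, b = 4:6)
source <- c("a", "missing_col")
new <- c("A", "M")

# Start from an empty data frame with the same number of rows as x
out <- data.frame(row.names = seq_len(nrow(x)))
for (i in seq_along(source)) {
  out[[new[i]]] <- if (source[i] %in% names(x)) {
    x[[source[i]]]        # exact match: copy the column under its new name
  } else {
    rep(NA, nrow(x))      # no match: initialize the expected column with NAs
  }
}

str(out)
```

Because unmatched names produce an all-NA column rather than an error, checking the result for unexpected all-NA columns is a quick way to catch typos in source.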

If regexp = TRUE, multiple columns can match a source element's regular expression pattern. In that case row means (rowMeans) are computed and the names of the averaged columns are kept as varnames attributes for traceability. Similarly, quality control flags are averaged over the available columns if qc is specified. Note that variable names need to have unique patterns to achieve the expected results. E.g. precipitation abbreviated as P will overlap with PAR; Precip or sumP can be used instead.
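
The pattern matching and the P versus PAR overlap can be reproduced with base R grep. The data frame below is hypothetical and the code only mimics the matching and averaging steps described above, without calling remap_vars.

```r
# Hypothetical columns; the values are made up for illustration
x <- data.frame(
  Ts_0.00_N = c(1, 2, 3),
  Ts_0.00_S = c(3, 4, 5),
  PAR = c(100, 200, 300),
  P = c(0, 1, 0)
)

# All columns matching the pattern are collected and averaged row-wise
ts_cols <- grep("Ts_0.00", names(x), value = TRUE)
Tsoil_0cm <- unname(rowMeans(x[ts_cols]))

# The pattern "P" is ambiguous: it matches both PAR and P
p_loose <- grep("P", names(x), value = TRUE)

# Renaming precipitation gives it a pattern that no other column shares
names(x)[names(x) == "P"] <- "Precip"
p_unique <- grep("Precip", names(x), value = TRUE)

ts_cols
p_loose
p_unique
```

Running grep on names(x) with each intended pattern before remapping shows which columns would be combined.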

The varnames attribute is expected. If varnames were not automatically assigned to x by read_eddy when reading from a file, they should be assigned before remapping to preserve documentation (especially if multiple columns are combined into a single one).

Value

A data frame with attributes varnames and units assigned to each respective column.

See Also

varnames.

Examples

# remap_vars() and varnames() are provided by the openeddy package
library(openeddy)

# Simulate soil temperature profile at different depths/positions
Ts_profile <- data.frame(
  timestamp = seq(c(ISOdate(2023,1,1,0,30)), by = "30 mins", length.out = 5)
  )
head(Ts_profile)
set.seed(42) # makes random numbers reproducible
cm_0 <- paste0("Ts_0.00_", c("N", "E", "S", "W"))
Ts_profile[cm_0] <- data.frame(replicate(4, rnorm(5)))
head(Ts_profile)
cm_10 <- paste0("Ts_0.10_", c("N", "E", "S", "W"))
Ts_profile[cm_10] <- data.frame(replicate(4, rnorm(5, 5)))
head(Ts_profile)
Ts_profile$Ts_0.20_E <- rnorm(5, 10)
head(Ts_profile)
Ts_profile[paste0("qc_", c(cm_0, cm_10, "Ts_0.20_E"))] <- 0
varnames(Ts_profile) <- names(Ts_profile)
str(Ts_profile)
# Randomly reorder the columns
Ts_profile <- Ts_profile[sample(varnames(Ts_profile))]
head(Ts_profile)

# Literal remapping with regexp = FALSE
literal_remapping <- data.frame(
  orig_varname = c("timestamp", "Ts_0.00_N", "Ts_0.10_N", "Ts_0.20_E"),
  renamed_varname = c("TIMESTAMP", "TS_1_1_1", "TS_1_2_1", "TS_2_3_1")
  )
literal_remapping

rmap1 <- remap_vars(Ts_profile,
                    literal_remapping$renamed_varname,
                    literal_remapping$orig_varname,
                    qc = "qc_")
str(rmap1)

# Remapping based on string patterns with regexp = TRUE
regexp_remapping <- data.frame(
  orig_varname = c("timestamp", "Ts_0.00", "Ts_0.10", "Ts_0.20"),
  renamed_varname = c("TIMESTAMP", "Tsoil_0cm", "Tsoil_10cm", "Tsoil_20cm")
  )
regexp_remapping

rmap2 <- remap_vars(Ts_profile,
                    regexp_remapping$renamed_varname,
                    regexp_remapping$orig_varname,
                    regexp = TRUE,
                    qc = "qc_")
# Note that if a pattern matches multiple columns, they are averaged
str(rmap2)


lsigut/openeddy documentation built on Aug. 5, 2023, 12:25 a.m.