check_manually: Manual Data Flagging

View source: R/Quality_checking.R

check_manually R Documentation


Description

A wrapper for exclude that allows visual inspection of selected variables in a data frame and manual flagging of values to be discarded. Saving and reloading of results is supported.

Usage

check_manually(
  x,
  path = ".",
  vars,
  qc_prefix = "qc_",
  qc_suffix = NULL,
  interactive = FALSE,
  siteyear = NULL,
  tname = "timestamp",
  shift.by = NULL,
  with_units = FALSE,
  win_size = 672,
  format = "%Y-%m-%d %H:%M"
)

Arguments

x

A data frame.

path

A string. Path to the directory where results should be saved.

vars

A character vector, matrix or data frame providing the names of variables in data frame x that will be inspected. If a character vector, each value is used in turn as argument x in exclude. If a matrix or data frame, the first, second and third columns are interpreted as arguments x (quality checked variable), y and z (auxiliary variables) in exclude, respectively, and are used row by row. If auxiliary variables are not needed for certain combinations (rows of vars), provide NA values.

qc_prefix, qc_suffix

A string. Quality control columns corresponding to the vars names are required in x, named in the format qc_prefix+vars+qc_suffix. If vars is a matrix or data frame, qc_prefix and qc_suffix are applied only to the first column. Set qc_prefix or qc_suffix to NULL if it is not applicable.
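As an illustration of the naming rule above, the following minimal sketch builds the expected quality control column names and checks which of them are missing from a data frame. The helper qc_names is hypothetical and not part of openeddy; it only assumes that the names are formed by plain concatenation.

```r
# Hypothetical helper (not part of openeddy): compose QC column names
# as qc_prefix + vars + qc_suffix. paste0() ignores NULL arguments,
# so a NULL prefix or suffix simply drops out of the name.
qc_names <- function(vars, qc_prefix = "qc_", qc_suffix = NULL) {
  paste0(qc_prefix, vars, qc_suffix)
}

# Mock data frame: H has a QC column, LE does not
x <- data.frame(H = 1:3, qc_H = 0L, LE = 4:6)

expected <- qc_names(c("H", "LE"))        # names required in x
missing_qc <- setdiff(expected, names(x)) # required names absent from x

print(expected)
print(missing_qc)

# Suffix-only naming, e.g. "H_qc"
suffix_only <- qc_names("H", qc_prefix = NULL, qc_suffix = "_qc")
print(suffix_only)
```

Running such a check before calling check_manually can help spot a missing quality control column early.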

interactive

A logical value. If TRUE, manual checking is performed in an interactive session. If FALSE, a previously created file with manual flags is reloaded, or NULL is returned if no such file exists.

siteyear

A string. Unique label used for the saved manual_QC CSV file when no file matching the "manual_QC" pattern is found in path.

tname

A string. Name of variable in x with date-time information.

shift.by

An integer value specifying the time shift (in seconds) to be applied to the date-time information of the reloaded manual QC if present.
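A minimal sketch of what a shift in seconds means for date-time values, assuming the shift is simply added to POSIXct timestamps (an assumption made for illustration; the internal handling in check_manually may differ). The timestamp values are made up.

```r
# Reloaded timestamps marking the end of each half-hour (mock values)
ts <- as.POSIXct(c("2020-07-01 00:30", "2020-07-01 01:00"), tz = "GMT")

# Shift by -900 s (15 minutes back) to centre the half-hourly intervals
shift.by <- -900
shifted <- ts + shift.by # POSIXct arithmetic works in seconds

shifted_labels <- format(shifted, "%Y-%m-%d %H:%M")
print(shifted_labels)
```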

with_units

A logical value indicating whether the data frame with manual flags that is read (or written) includes (or should include) units.

win_size

An integer. Number of values displayed per plot.
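For orientation, the default of 672 equals 14 days of half-hourly records (48 per day). The sketch below works out how many plots would be needed to page through a full year of such data, assuming consecutive, non-overlapping windows of win_size values (an assumption for illustration; the actual windowing is handled by exclude).

```r
records_per_day <- 48             # half-hourly data
win_size <- 14 * records_per_day  # 14 days per plot

n_year <- 365 * records_per_day   # records in a non-leap year
n_plots <- ceiling(n_year / win_size)

print(win_size)
print(n_plots)
```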

format

A string. Format of the date-time information in tname.
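To check that a format string matches the timestamps at hand, it can be tried with base R first. This is only a sketch using as.POSIXct; check_manually itself may parse date-time information differently (see strptime_eddy in See Also). The timestamp strings are made up.

```r
raw <- c("2020-07-01 00:15", "2020-07-01 00:45")

# Parse with the default format used by check_manually
parsed <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M", tz = "GMT")

# A mismatched format yields NA instead of an error
mismatch <- as.POSIXct(raw, format = "%d.%m.%Y %H:%M", tz = "GMT")

step_mins <- as.numeric(difftime(parsed[2], parsed[1], units = "mins"))
print(step_mins)
print(anyNA(mismatch))
```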

Details

An automatic reload of previously saved results is attempted: path is searched for a file whose name includes the pattern "manual_QC". If such a file is found, its timestamp is merged with the date-time information in x (if the two are not identical) and the reloaded quality control is combined with the new flags marked by the user (flag 2 marks data exclusion). Correct alignment of timestamps can be ensured with shift.by. path is also used for saving the file with results (its name includes the "manual_QC" pattern). The actual flagging runs exclude over all vars. Each variable must have an associated quality control column named in the format qc_prefix+vars+qc_suffix.

The function can be run in two modes. If interactive = TRUE, an attempt is made to load previously saved manual QC, after which the user can flag data manually in an interactive session and save the (merged) results. If you only want to reload previously saved results, use interactive = FALSE.

Value

A data frame with flags 0 (marking accepted points) and flags 2 (marking excluded points). Results can be written to path.

If interactive = FALSE and no file matching the "manual_QC" pattern exists in path, NULL is returned.
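The non-interactive behaviour can be pictured with the following minimal sketch. The helper reload_manual_qc, the file name and the column qc_H_man are hypothetical and serve only to illustrate the "reload or return NULL" logic; this is not the openeddy implementation.

```r
# Hypothetical helper: look for a file matching "manual_QC" in path and
# read it, or return NULL when nothing is found.
reload_manual_qc <- function(path = ".") {
  files <- list.files(path, pattern = "manual_QC", full.names = TRUE)
  if (length(files) == 0) return(NULL)
  read.csv(files[1])
}

# Empty temporary directory: nothing to reload
demo_dir <- tempfile("manual_qc_demo_")
dir.create(demo_dir)
empty_result <- reload_manual_qc(demo_dir)

# Save mock manual flags (made-up file and column names), then reload
write.csv(
  data.frame(timestamp = "2020-07-01 00:15", qc_H_man = 2),
  file.path(demo_dir, "MySite2020_manual_QC.csv"),
  row.names = FALSE
)
reloaded <- reload_manual_qc(demo_dir)

print(is.null(empty_result))
print(reloaded)
```

Code that calls check_manually with interactive = FALSE may likewise want to test the result with is.null before using it.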

See Also

exclude, locator, combn_QC, strptime_eddy, merge.

Examples

## Not run: 
# prepare mock data
set.seed(87)
NEE <- sin(seq(pi / 2, 2.5 * pi, length.out = 48)) * 10
NEE[NEE > 5] <- 5
t <- seq(ISOdate(2020, 7, 1, 0, 15), ISOdate(2020, 7, 14, 23, 45), "30 mins")
PAR <- (-NEE + 5) * 100
Tair <- rep(-cos(seq(0, 2 * pi, length.out = 48)), 14)
Tair <- Tair * 2 + 15 + seq(0, 5, length.out = 48 * 14)
Rn <- PAR / 2 - 50
H <- Rn * 0.7
LE <- Rn * 0.3

# combine into data frame
a <- data.frame(
  timestamp = t,
  H = H + rnorm(48 * 14),
  qc_H = sample(c(0:2, NA), 672, replace = TRUE, prob = c(5, 3, 2, 1)),
  LE = LE + rnorm(48 * 14),
  qc_LE = sample(c(0:2, NA), 672, replace = TRUE, prob = c(5, 3, 2, 1)),
  NEE = NEE + rnorm(48 * 14),
  qc_NEE = sample(c(0:2, NA), 672, replace = TRUE, prob = c(5, 3, 2, 1)),
  PAR = PAR,
  Tair = Tair,
  Rn = Rn
)

# introduce outliers
a$H[c(97, 210, 450, 650)] <- c(-300, 2000, -800, 3200)
a$LE[c(88, 182, 350, 550)] <- c(900, -400, -1000, 2000)
a$NEE[c(10, 152, 400, 500)] <- c(50, -100, 70, -250)

# single variable example without auxiliary variables
man <- check_manually(a, vars = "H", interactive = TRUE,
                      siteyear = "MySite2022")
summary_QC(man, names(man)[-1])

# multiple vars provided as vector (without auxiliary variables)
man <- check_manually(a, vars = c("H", "LE", "NEE"),
                      interactive = TRUE)

# multiple vars provided as matrix (including auxiliary variables)
man <- check_manually(a,
                      vars = cbind(
                        c("H", "LE", "NEE"), # main variables (x)
                        c("Rn", "Rn", "PAR") # auxiliary variables (y)
                      ),
                      interactive = TRUE)

# two sets of auxiliary variables
# - "missing_var" not present in "a", thus handled as if NA was provided
man <- check_manually(a,
                      vars = cbind(
                        c("H", "LE", "NEE"),      # main variables (x)
                        c("Rn", "Rn", NA),        # auxiliary variables (y)
                        c("missing_var", "H", NA) # auxiliary variables (z)
                      ),
                      interactive = TRUE)

# multiple vars provided as data frame (including two sets of auxiliary vars)
man <- check_manually(a,
                      vars = data.frame(
                        x = c("H", "LE", "NEE"),
                        y = c("Rn", "Rn", "PAR"),
                        z = c("LE", "H", "Tair")
                      ),
                      interactive = TRUE)

## End(Not run)


lsigut/openeddy documentation built on Aug. 5, 2023, 12:25 a.m.