subset_data: Subset a 'data.frame', 'inspect', or 'inspect.ft' object

Description

subset_data subsets a data.frame, inspect, or inspect.ft object based on a given set of criteria. The function is ideal for passing only selected regions of data to other functions such as calc_rate() and auto_rate(), either by saving the output as a new object or by using pipes (|> or %>%). It is also very useful in the analysis of intermittent-flow data, where each replicate can be extracted in a loop and passed to an analytical function such as calc_rate() or auto_rate(). See examples and vignettes.

Usage

subset_data(x, from = NULL, to = NULL, by = "time", quiet = TRUE)

Arguments

x

data.frame, inspect, or inspect.ft object. The data from which to produce a subset.

from

numeric. The lower bounds of the subset based on the by input.

to

numeric. The upper bounds of the subset based on the by input.

by

string. "time", "row", or "oxygen". Method by which to apply the from and to inputs.

quiet

logical. Controls if a summary of the output is printed to the console. Default is TRUE.

Details

The function can subset data based on ranges of "time", "oxygen", or "row". For data frames, to subset by "time" or "oxygen" the time data is assumed to be in the first column, and oxygen data in the second column. For inspect() and inspect.ft() objects, the data will have been coerced to this structure already. In these cases the $dataframe element in the output is replaced by the subset, and in inspect.ft the $data element is also subset and replaced. Note for inspect.ft objects, the oxygen data in column 2 will be either out.oxy data or delta.oxy data depending on what was inspected. The function can subset any data frame by row.

When multiple columns are present, for example time in column 1, and multiple columns of oxygen data, the subset object will include all columns. In the case of subsetting by = "oxygen", subsetting is based on the first column of oxygen data only (i.e. column 2), and all subsequent columns are subset between the same rows regardless of oxygen values.

For all methods, if values exactly matching from and to are not present in the data, the closest values are used. For "time" and "row" subsetting, from and to should be in the correct order. No warnings or messages are given if the input values are outside those in the data frame. For instance, if to = 100 and there are only 50 rows in the data, the last row (50) will be used instead. The same applies to from and to time values outside those in the data frame.
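The closest-value and out-of-range behaviour described above can be illustrated with a short base R sketch. This is not respR's internal code: closest_row() is a hypothetical helper introduced only for illustration, and the data frame is synthetic.

```r
# Hypothetical helper (not part of respR): index of the element of `x`
# closest to `value`. which.min() returns the first match on ties.
closest_row <- function(x, value) which.min(abs(x - value))

# Synthetic data: 50 rows, time in column 1, oxygen in column 2
df <- data.frame(time   = 2 * (0:49),           # 0, 2, ..., 98
                 oxygen = 100 - 0.2 * (0:49))

# No exact match for time = 24.6, so the closest value (24) is used
from_row <- closest_row(df$time, 24.6)

# to = 500 lies beyond the data, so the closest value is in the final row
to_row <- closest_row(df$time, 500)

# Row-based equivalent of an out-of-range `to`: fall back to the last row
to_row_by_index <- min(100, nrow(df))

sub <- df[from_row:to_row, ]
```

Under these assumptions the subset silently runs to the end of the data, which is why it is worth checking the dimensions of a subset when the inputs might fall outside the recorded range.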

For "oxygen" subsetting, from and to are generally interchangeable, and the function will subset data between the first and last occurrences (or closest occurrences) of these values. It works best with generally increasing or decreasing oxygen data, and results may vary with other data such as intermittent flow data or those in inspect.ft objects.
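One way to picture "oxygen" subsetting, including the multi-column case, is the base R sketch below. It is an illustration under stated assumptions, not respR's implementation: subset_by_oxygen_sketch() is a hypothetical function, "occurrences" are taken to mean rows whose column 2 value falls within the from-to range, and the closest-value fallback is not reproduced.

```r
# Hypothetical sketch (not respR's implementation): keep every row between
# the first and last rows whose column-2 oxygen value lies within the range.
subset_by_oxygen_sketch <- function(df, from, to) {
  rng  <- range(from, to)                     # order of from/to is irrelevant
  hits <- which(df[[2]] >= rng[1] & df[[2]] <= rng[2])
  if (length(hits) == 0) stop("no oxygen values within the requested range")
  df[min(hits):max(hits), , drop = FALSE]     # all columns, same rows
}

# Synthetic data: time in column 1, then two oxygen columns
df <- data.frame(time  = 1:10,
                 oxy_a = c(100, 98, 96, 94, 92, 90, 88, 86, 84, 82),
                 oxy_b = c(100, 99, 98, 97, 96, 95, 94, 93, 92, 91))

a <- subset_by_oxygen_sketch(df, from = 94, to = 88)
b <- subset_by_oxygen_sketch(df, from = 88, to = 94)

# Same rows either way; the values in oxy_b play no part in the selection
same_result <- identical(a, b)
```

In this sketch, non-monotonic data would return everything between the first and last matching rows, including values outside the requested range, which is consistent with the caution above about intermittent-flow data.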

Note for inspect and inspect.ft object inputs: after subsetting, the locations of any data issues highlighted when the object was originally inspected will no longer be accurate. If these are important, best practice is to subset the original data frame, and then process the subset through inspect or inspect.ft.

A summary of the subset is printed to the console when quiet = FALSE.

More

For additional help, documentation, vignettes, and more visit the respR website at https://januarharianto.github.io/respR/

Value

Output: If the input is an inspect or inspect.ft object, the output is an object of the same class containing the subset data. For data.frame inputs, the output is a data.table of the subset.

Examples


# Subset by time:
x <- subset_data(squid.rd, from = 2000, to = 4000, by = "time")

# Subset by oxygen:
subset_data(sardine.rd, from = 94, to = 91, by = "oxygen")

# Subset by row:
subset_data(flowthrough.rd, from = 10, to = 750, by = "row")

# Subset multiple columns:
# In this case subsetting is based on the first two columns
subset_data(flowthrough.rd, from = 50, to = 600, by = "time")

# Pass (via piping) only a subset of a dataset to inspect() and auto_rate()
subset_data(sardine.rd, from = 94, to = 91, by = "oxygen") %>%
   inspect(time = 1, oxygen = 2) %>%
   auto_rate()
   

respR documentation built on May 29, 2024, 7:14 a.m.