| select_rate | R Documentation |
The functions in respR are powerful, but outputs can be large
and difficult to explore, especially when there are hundreds to thousands
of results, for example the output of auto_rate on large datasets, or the
outputs of calc_rate.int from long intermittent-flow experiments.
The select_rate and select_rate.ft functions help explore, reorder, and
filter convert_rate and convert_rate.ft results according to various
criteria. For example, you can extract only positive or negative rates, only
the highest or lowest rates, or only those from certain data regions. Numerous
other methods allow advanced filtering of results, so that the final
selection of rates is well defined with respect to the research question of
interest. This also allows highly consistent reporting of results and
rate selection criteria.
Multiple selection criteria can be applied either by saving the output and
processing it through the function multiple times using different methods,
or via piping (%>% or |>). See Examples.
Note: when choosing a method, keep in mind that to remain
mathematically consistent, respR outputs oxygen consumption (i.e.
respiration) rates as negative values. This is particularly important in
the difference between highest/lowest and minimum/maximum methods. See
Details.
When a rate result is omitted by the selection criteria, it is removed from
the $rate.output element of the convert_rate object, and the associated row
of the $summary table is also removed. Some methods can also be
used with an n = NULL input to reorder the $rate.output and $summary
elements in various ways.
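The paired removal can be pictured with a minimal base-R sketch. The list below is a hypothetical stand-in for a convert_rate object holding only $rate.output and $summary (a real convert_rate object contains more elements, and this is not respR's implementation). A single logical index, here mimicking method = "negative", is applied to both elements so they stay aligned, and the $rank values of the surviving rows are left unchanged.

```r
# Hypothetical stand-in for a convert_rate object: only the two elements
# discussed above, not respR's real object structure
obj <- list(
  rate.output = c(-0.52, 0.10, -0.61),
  summary = data.frame(rank = 1:3, rate.output = c(-0.52, 0.10, -0.61))
)

# Mimic method = "negative": one logical index drives both elements
keep <- obj$rate.output < 0
obj$rate.output <- obj$rate.output[keep]
obj$summary <- obj$summary[keep, , drop = FALSE]

obj$summary$rank  # 1 3: original rank values are retained, not renumbered
```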
The summary table $rank column is context-specific, and what it
represents depends on the type of experiment analysed or the function used
to determine the rates. If numeric values were converted, it is the order
in which they were entered. Similarly, if calc_rate was used, it is the
order of rates as entered using from and to (if multiple rates were
determined). For auto_rate it relates to the method input. For example
it indicates the kernel density ranking if the linear method was used,
the ascending or descending ordering by absolute rate value if lowest or
highest were used, or the numerical order if minimum or maximum were
used. For intermittent-flow experiments analysed via calc_rate.int and
auto_rate.int these will be ranked within each replicate as indicated
in the $rep column. The $rep and $rank columns can be used to keep
track of selection or reordering because the original values will be
retained unchanged through selection or reordering operations. The original
order can always be restored by using method = "rep" or method = "rank"
with n = NULL. In both these cases the $summary table and
$rate.output will be reordered by $rep (if used) then $rank to
restore the original ordering.
Note that if you are analysing intermittent-flow data and used
auto_rate.int but changed the n input to output more than one rate
result per replicate, the selection or reordering operations will not take
any account of this. You should carefully consider if or why you need to
output multiple rates per replicate in the first place. If you have done so,
you can perform selection on individual replicates by first using
method = "rep" to select a replicate, then applying additional selection criteria.
select_rate(x, method = NULL, n = NULL)
select_rate.ft(x, method = NULL, n = NULL)
x: list. An object of class convert_rate (or convert_rate.ft for
select_rate.ft), including the output of a previous select_rate operation.
method: string. Method by which to select or reorder rate results. For most
methods, matching results are retained in the output. See Details.
n: numeric. Number, percentile, or range of results to retain or omit,
depending on the method. See Details.
These are the current methods by which rates in convert_rate
objects can be selected. Matching results are retained in the output.
Some methods can also be used to reorder the results. Note that the methods
selecting by rate value operate on the $rate.output element, that is the
final converted rate value.
positive, negative: Selects all positive (>0) or negative (<0) rates. n is ignored.
Useful, for example, in respirometry on algae where both oxygen consumption
and production rates are recorded. Note that respR outputs oxygen consumption
(i.e. respiration) rates as negative values, and production rates as
positive.
nonzero, zero: Retains all nonzero rates (i.e. removes any zero rates), or retains
only zero rates (i.e. removes all nonzero rates). n is
ignored.
lowest, highest: These methods can only be used when rates all have the same sign, that is,
are all negative or all positive. They select the lowest and highest
absolute rate values. For example, if rates are all negative, method = 'highest' will retain the highest magnitude rates regardless of the
sign. n should be an integer indicating the number of lowest/highest
rates to retain. If n = NULL, the results will instead be reordered by
lowest or highest rate without any being removed. See the minimum and maximum
options for extracting numerically lowest and highest rates.
lowest_percentile, highest_percentile: These methods can also only be used when rates all have the same sign.
They retain the n'th lowest or highest percentile of absolute rate
values. For example, if rates are all negative, method = 'highest_percentile' will retain the highest magnitude n'th percentile
regardless of the sign. n should be a percentile value between 0 and 1.
For example, to extract the lowest 10th percentile of absolute rate values,
you would enter method = 'lowest_percentile', n = 0.1.
minimum, maximum: In contrast to lowest and highest, these are strictly numerical
options which take full account of the sign of the rate, and can be used
where rates are a mix of positive and negative. For example, method = 'minimum' will retain the numerically minimum rates, which would
actually be the highest oxygen uptake rates. n is an integer indicating
how many of the min/max rates to retain. If n = NULL, the results will
instead be reordered by minimum or maximum rate without any being removed.
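A minimal base-R sketch of the distinction, using a hypothetical vector of all-negative converted rates standing in for $rate.output (an illustration of the logic, not respR code). For uptake rates, "highest" (by magnitude) and "minimum" (by numerical value) select the same results, while "maximum" selects the lowest-magnitude uptake rates.

```r
# Hypothetical converted rates, all negative (oxygen uptake)
rates <- c(-0.52, -0.61, -0.48, -0.70, -0.55)

# "highest", n = 2: largest magnitude, ignoring the sign
highest_2 <- rates[order(abs(rates), decreasing = TRUE)][1:2]

# "minimum", n = 2: numerically smallest, taking the sign into account
minimum_2 <- sort(rates)[1:2]

# "maximum", n = 2: numerically largest, i.e. the lowest uptake rates here
maximum_2 <- sort(rates, decreasing = TRUE)[1:2]

highest_2  # -0.70 -0.61
minimum_2  # -0.70 -0.61
maximum_2  # -0.48 -0.52
```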
minimum_percentile, maximum_percentile: Like minimum and maximum, these are strictly numerical methods which retain the
n'th minimum or maximum percentile of the rates and take full account of
the sign. Here n should be a percentile value between 0 and 1. For
example, if rates are all negative (i.e. typical uptake rates), to extract
the lowest 10th percentile of rates by magnitude, you would enter method = 'maximum_percentile', n = 0.1. This is because the lowest-magnitude negative rates
are numerically the maximum rates (although the lowest_percentile method
would be a better option in this case).
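The two percentile routes for all-negative rates can be compared with a short base-R sketch. The rate vector is hypothetical, and using quantile() with its default interpolation (type 7) as the cutoff rule is an assumption; respR's exact cutoff calculation may differ. With these 20 rates, both approaches retain the same two lowest-magnitude rates.

```r
# Hypothetical converted rates: 20 negative (uptake) values from -0.40 to -0.78
rates <- -seq(0.40, 0.78, by = 0.02)

# "lowest_percentile", n = 0.1: lowest 10% by absolute value
# (quantile() type 7 cutoff is an assumption, not respR's documented rule)
cut_low <- quantile(abs(rates), probs = 0.1)
lowest_pc <- rates[abs(rates) <= cut_low]

# "maximum_percentile", n = 0.1: numerically top 10%, sign included
cut_max <- quantile(rates, probs = 0.9)
maximum_pc <- rates[rates >= cut_max]

lowest_pc   # -0.40 -0.42
maximum_pc  # -0.40 -0.42
```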
rate: Allows you to enter a value range of output rates to be retained. Matching
regressions in which the rate value falls within the n range (inclusive)
are retained. n should be a vector of two values. For example, to retain
only rates where the rate value is between 0.05 and 0.08: method = 'rate', n = c(0.05, 0.08). Note this operates on the $rate.output
element, that is converted rate values.
rep, rank: These refer to the respective columns of the $summary table. For these,
n should be a numeric vector of integers of rep or rank values to
retain. To retain a range use regular R syntax, e.g. n = 1:10. If n = NULL no results will be removed, instead the results will be reordered
ascending by rep (if it contains values) then rank. Essentially this
restores the original ordering if other reordering operations have been
performed.
The values in these columns depend on the functions used to calculate
rates. If calc_rate was used, rep is NA and rank is the order of
rates as entered using from and to (if multiple rates were determined).
For auto_rate, rep is NA and rank relates to the method input.
For example it indicates the kernel density ranking if the linear method
was used, the ascending or descending ordering by absolute rate value if
lowest or highest were used, or by numerical order if minimum or
maximum were used. If calc_rate.int or auto_rate.int were used, rep
indicates the replicate number and the rank column represents rank
within the relevant replicate, and will generally be filled with the
value 1. Therefore you need to adapt your selection criteria
appropriately towards which of these columns is relevant.
rep_omit, rank_omit: These refer to the rep and rank columns of the $summary table and
allow you to exclude rates from particular replicate or rank values. For
these, n should be a numeric vector of integers of rep or rank values
to OMIT. To omit a range use regular R syntax, e.g. n = 1:10.
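A minimal base-R sketch of how rep and rank values can be used, on a hypothetical $summary-like data frame from an intermittent-flow analysis that has already been reordered. It mimics method = "rep" with n = NULL (restore ordering) and n = 1:2 (retain), plus method = "rep_omit". It illustrates the logic only and is not respR's implementation.

```r
# Hypothetical summary table after a reordering operation
summ <- data.frame(rep = c(3, 1, 2, 4),
                   rank = c(1, 1, 1, 1),
                   rate.output = c(-0.61, -0.55, -0.58, -0.49))

# n = NULL: restore the original order, by rep then rank
restored <- summ[order(summ$rep, summ$rank), ]

# method = "rep", n = 1:2: retain replicates 1 and 2
reps_1_2 <- summ[summ$rep %in% 1:2, ]

# method = "rep_omit", n = 4: drop replicate 4
no_rep_4 <- summ[!summ$rep %in% 4, ]

restored$rep  # 1 2 3 4
```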
rsq, row, time, density: These methods refer to the respective columns of the $summary data frame.
For these, n should be a vector of two values. Matching regressions in
which the respective parameter falls within the n range (inclusive) are
retained. To retain all rates with an R-squared of 0.90 or above: method = 'rsq', n = c(0.9, 1). The row and time ranges refer to the
$row-$endrow or $time-$endtime columns and the original raw data
($dataframe element of the convert_rate object), and can be used to
constrain results to rates from particular regions of the data (although
usually a better option is to subset_data() prior to analysis). Note that
time is not the same as duration (see the duration method below), and row refers
to rows of the raw data, not rows of the summary table (see the manual
method for this). For all of these methods, if n = NULL no results will be
removed; instead the results will be reordered by that respective column
(descending for rsq and density, ascending for row and time).
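The range-based selection and the n = NULL reordering can be sketched in base R on a hypothetical summary table. Treating a time range as requiring both $time and $endtime to fall inside it is an assumption drawn from the description above, not respR's confirmed rule.

```r
# Hypothetical summary table with start/end times and r-squared values
summ <- data.frame(time = c(119, 14, 479, 299),
                   endtime = c(399, 249, 899, 699),
                   rsq = c(0.985, 0.993, 0.971, 0.998))

# method = "rsq", n = c(0.99, 1): inclusive range on the rsq column
keep_rsq <- summ[summ$rsq >= 0.99 & summ$rsq <= 1, ]

# method = "time", n = c(0, 500): regression must start and end inside the range
# (assumption: both $time and $endtime are checked)
keep_time <- summ[summ$time >= 0 & summ$endtime <= 500, ]

# method = "rsq", n = NULL: reorder by rsq, highest first
by_rsq <- summ[order(summ$rsq, decreasing = TRUE), ]

by_rsq$rsq  # 0.998 0.993 0.985 0.971
```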
intercept, slope: These methods are similar to the above and refer to the intercept_b0 and
slope_b1 summary table columns. Note these linear model coefficients
represent different things in flowthrough vs. other analyses. In
non-flowthrough analyses slopes represent rates, and metrics such as a
high r-squared are important. In flowthrough, slopes represent the
stability of the data region, in that the closer the slope is to zero, the
less the delta oxygen values in that region vary, which is an indication of
a region of stable rates. In addition, intercept values close to the
calculated mean delta of the region also indicate a region of stable rates.
Therefore these methods are chiefly useful in selection of flowthrough
results, for example slopes close to zero. If n = NULL no results will be
removed, instead the results will be reordered by ascending value by that
column.
time_omit, row_omit: These methods refer to the original data, and are intended to exclude
rates determined over particular data regions. This is useful in the case
of, for example, a data anomaly such as a spike or sensor dropout. For
these methods, n is one or more values (a single value, multiple values, or a range)
indicating data timepoints or rows of the original data to exclude. Only
rates (i.e. regressions) which do not utilise those particular values are
retained in the output. For example, if an anomaly occurs precisely at
timepoint 3000, method = "time_omit", n = 3000 means only rates determined solely over
regions before or after this will be retained. If it occurs over a range,
this can be entered as method = "time_omit", n = c(3000, 3200). If you want to exclude
a regular occurrence, for example the flushes in intermittent-flow
respirometry, or any other non-continuous values, they can be entered as a
vector, e.g. method = "row_omit", n = c(1000, 2000, 3000). Note this last option can be
extremely computationally intensive when the vector or dataset is large, so
should only be used when a range cannot be entered as two values, which is
much faster. For both methods, input values must exactly match values
present in the dataset.
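Whether a regression uses an omitted row can be checked from its start and end rows. The base-R sketch below works on a hypothetical summary table with $row and $endrow columns (time_omit would compare against $time and $endtime in the same way). The helper uses_omitted() is hypothetical, and treating a two-value n as a range and any other length as individual values follows the description above but is an assumption about respR's handling, not its code.

```r
# Hypothetical summary table: start and end rows of each regression
summ <- data.frame(row = c(1, 900, 2100, 3100),
                   endrow = c(800, 1500, 2900, 3900))

# Hypothetical helper: TRUE for regressions whose row span includes an omitted value
# (assumption: a two-value input is a range, otherwise individual values)
uses_omitted <- function(summ, n) {
  if (length(n) == 2) {
    summ$row <= max(n) & summ$endrow >= min(n)
  } else {
    vapply(seq_len(nrow(summ)), function(i) {
      any(n >= summ$row[i] & n <= summ$endrow[i])
    }, logical(1))
  }
}

# method = "row_omit", n = c(1000, 2000, 3000): individual values
kept_vals <- summ[!uses_omitted(summ, c(1000, 2000, 3000)), ]
# method = "row_omit", n = c(3000, 3200): a range
kept_range <- summ[!uses_omitted(summ, c(3000, 3200)), ]

kept_vals$row   # 1 2100 3100
kept_range$row  # 1 900 2100
```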
oxygen: This can be used to constrain rate results to regions of the data based on
oxygen values. n should be a vector of two values in the units of oxygen
in the raw data. Only rate regressions in which all datapoints occur within
this range (inclusive) are retained. Any which use even a single value
outside of this range are excluded. Note the summary table columns oxy
and endoxy refer to the first and last oxygen values in the rate
regression, which should broadly indicate which results will be removed or
retained, but this method examines every oxygen value in the regression,
not just first and last.
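The difference between checking only the first and last oxygen values and checking every value is easiest to see with a spike in the data. This base-R sketch uses a hypothetical raw oxygen vector and hypothetical regression row spans; it mirrors the described logic rather than respR's code.

```r
# Hypothetical raw oxygen values, with a spike (8.4) at row 3
oxy <- c(7.9, 7.6, 8.4, 7.0, 6.7, 6.4, 6.1, 5.8)
# Hypothetical regressions defined by start and end rows
summ <- data.frame(row = c(1, 4, 5), endrow = c(4, 6, 8))
rng <- c(6, 8)  # method = "oxygen", n = c(6, 8)

# Every oxygen value used by the regression must lie within the range
all_in_range <- vapply(seq_len(nrow(summ)), function(i) {
  vals <- oxy[summ$row[i]:summ$endrow[i]]
  all(vals >= min(rng) & vals <= max(rng))
}, logical(1))

# Checking only first and last values (like $oxy and $endoxy) would keep regression 1
ends_in_range <- oxy[summ$row] >= min(rng) & oxy[summ$row] <= max(rng) &
  oxy[summ$endrow] >= min(rng) & oxy[summ$endrow] <= max(rng)

all_in_range   # FALSE TRUE FALSE
ends_in_range  # TRUE TRUE FALSE
```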
oxygen_omit: Similar to time_omit and row_omit above, this can be used to omit
rate regressions which use particular oxygen values. For this method, n is one
or more values indicating oxygen values in the original raw
data to exclude. Every oxygen value used by each regression is checked, and
to be excluded an n value must match exactly to one in the data.
Therefore, note that if a regression is fit across the data region where
that value would occur, it is not necessarily excluded unless that exact
value occurs. You need to consider the precision of the data values
recorded. For example, if you wanted to exclude any rate using an oxygen
value of 7, but your data are recorded to two decimals, a rate fit across
these data would not be excluded: c(7.03, 7.02, 7.01, 6.99, 6.98, ...).
To get around this you can use regular R syntax to input vectors at the
correct precision, such as seq, e.g. seq(from = 7.05, to = 6.96, by = -0.01). This can be used to input ranges of oxygen values to exclude.
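Exact matching, and why input precision matters, can be shown with base R's %in% operator on hypothetical two-decimal data. Wrapping seq() in round() is an extra precaution added in this sketch: decimal sequences generated by seq() can differ from parsed data values in the last binary digit (for example, seq(0, 1, by = 0.1)[4] == 0.3 is FALSE in R), which breaks exact matching in plain R. Whether respR needs this precaution is not stated here.

```r
# Hypothetical oxygen data recorded to two decimals
oxy_data <- c(7.03, 7.02, 7.01, 6.99, 6.98)

# n = 7 never matches, because exactly 7 was never recorded
any(oxy_data %in% 7)  # FALSE

# A sequence at the data's precision covers the region (rounded to 2 dp)
omit_vals <- round(seq(from = 7.05, to = 6.96, by = -0.01), 2)
all(oxy_data %in% omit_vals)  # TRUE
```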
duration: This method allows selection of rates which have a specific duration range.
Here, n should be a numeric vector of two values. Use this to set minimum
and maximum durations in the time units of the original data. For example,
n = c(0, 500) will retain only rates determined over a maximum of 500 time
units. To retain rates over a minimum duration, enter the minimum value
together with a maximum duration, or simply Inf. For example, to retain rates
determined over a minimum of 500 time units, use n = c(500, Inf).
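A short base-R sketch of the two duration cases on a hypothetical summary table. Duration is computed here as $endtime minus $time, which is an assumption for illustration; respR may derive or report it differently.

```r
# Hypothetical summary table: start and end times of each regression
summ <- data.frame(time = c(0, 200, 1000), endtime = c(400, 900, 2500))
dur <- summ$endtime - summ$time  # 400, 700, 1500 time units

# n = c(0, 500): maximum duration of 500
max_500 <- summ[dur >= 0 & dur <= 500, ]

# n = c(500, Inf): minimum duration of 500
min_500 <- summ[dur >= 500 & dur <= Inf, ]

nrow(max_500)  # 1
nrow(min_500)  # 2
```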
manual: This method simply allows particular rows of the $summary data frame to
be manually selected to be retained. For example, to keep only the top row
method = 'manual', n = 1. To keep multiple rows use regular R selection
syntax: n = 1:3, n = c(1,2,3), n = c(5,8,10), etc. No value of n
should exceed the number of rows in the $summary data frame. Note this is
not necessarily the same as selecting by the rep or rank methods, as
the table could already have undergone selection or reordering.
manual_omit: As above, but this allows particular rows of the $summary data frame to
be manually selected to be omitted.
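Because manual works on current row positions while rank values travel with each result, the two can diverge after reordering. A base-R sketch on a hypothetical summary table, reordered as method = "highest" with n = NULL would do (an illustration only, not respR code):

```r
# Hypothetical summary table in original rank order
summ <- data.frame(rank = 1:4, rate.output = c(-0.52, -0.70, -0.48, -0.61))

# Reorder by magnitude, highest first
summ_high <- summ[order(abs(summ$rate.output), decreasing = TRUE), ]

# method = "manual", n = 1: the current top row
manual_1 <- summ_high[1, ]
# method = "rank", n = 1: the result originally ranked 1
rank_1 <- summ_high[summ_high$rank %in% 1, ]
# method = "manual_omit", n = 1: everything except the current top row
manual_omit_1 <- summ_high[-1, ]

manual_1$rank  # 2
rank_1$rank    # 1
```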
overlap: This method removes rates which overlap, that is regressions which are
partly or completely fit over the same rows of the original data. This is
useful in particular with auto_rate results. The auto_rate linear
method may identify multiple linear regions, some of which may
substantially overlap, or even be completely contained within others. In
such cases summary operations such as taking an average of the rate values
may be questionable, as certain values will be weighted higher due to these
multiple, overlapping results. This method removes overlapping rates, using
n as a threshold to determine degree of permitted overlap. It is
recommended this method be used after all other selection criteria have
been applied, as it is quite aggressive about removing rates, and can be
very computationally intensive when there are many results.
While it can be used with auto_rate results determined via the rolling,
lowest, or highest methods, by their nature these methods produce all
possible overlapping regressions, ordered in various ways, so other
selection methods are more appropriate. The overlap method is generally
intended to be used in combination with the auto_rate linear results,
but may prove useful in other analyses.
Permitted overlap is determined by n, which indicates the proportion of
each particular regression which must overlap with another for it to be
regarded as overlapping. For example, n = 0.2 means a regression would
have to overlap with at least one other by at least 20% of its total length
to be regarded as overlapping.
The "overlap" method performs two operations:
First, regardless of the n value, any rate regressions which are
completely contained within another are removed. This is also the only
operation if n = 1.
Secondly, for each regression in $summary starting from the bottom of the
summary table (usually the lowest ranked result, but this depends on the
analysis used and whether any reordering has already occurred), the
function checks if it overlaps with any others (accounting for n). If
not, the next lowest is checked, and the function progresses up the summary
table until it finds one that does. The first to be found overlapping is
then removed, and the process repeats starting again from the bottom of the
summary table. If no reordering to the results has occurred, this means
lower ranked results are removed first. This is repeated iteratively until
only non-overlapping rates (accounting for n) remain.
If n = 0, only rates which do not overlap at all, that is share no
data, are retained. If n = 1, only rates which are 100% contained within
at least one other are removed.
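The two overlap operations can be sketched in base R as follows. Assumptions, all labelled: regressions are described by inclusive $row and $endrow spans in a hypothetical summary table; regression i counts as overlapping when it shares at least one row with another regression and the shared rows make up at least a proportion n of its own length; and completely contained regressions are removed one at a time from the bottom, so one of two identical regressions survives. The helper names are hypothetical, and this follows the description above rather than respR's implementation.

```r
# Hypothetical helper: rows shared by regressions i and j (inclusive row spans)
shared_rows <- function(s, i, j) {
  max(0, min(s$endrow[i], s$endrow[j]) - max(s$row[i], s$row[j]) + 1)
}

# Hypothetical helper: scan from the bottom of the table upwards, remove the
# first regression for which test(s, i, j) is TRUE against any other, repeat
prune_from_bottom <- function(s, test) {
  repeat {
    hit <- NA
    for (i in rev(seq_len(nrow(s)))) {
      others <- setdiff(seq_len(nrow(s)), i)
      if (any(vapply(others, function(j) test(s, i, j), logical(1)))) {
        hit <- i
        break
      }
    }
    if (is.na(hit)) return(s)
    s <- s[-hit, , drop = FALSE]
  }
}

# Hypothetical sketch of method = "overlap"
select_overlap <- function(s, n) {
  # Operation 1: remove regressions completely contained within another
  contained <- function(s, i, j) {
    s$row[j] <= s$row[i] && s$endrow[i] <= s$endrow[j]
  }
  s <- prune_from_bottom(s, contained)
  # Operation 2: remove regressions overlapping another by at least proportion n
  overlapping <- function(s, i, j) {
    shared <- shared_rows(s, i, j)
    shared > 0 && shared / (s$endrow[i] - s$row[i] + 1) >= n
  }
  prune_from_bottom(s, overlapping)
}

# Hypothetical summary table, in original rank order
summ <- data.frame(rank = 1:5,
                   row = c(100, 150, 600, 550, 1000),
                   endrow = c(500, 300, 900, 700, 1200))

select_overlap(summ, n = 0.2)$rank  # 1 3 5
select_overlap(summ, n = 0.8)$rank  # 1 3 4 5
```

Rank 2 is always removed because it lies entirely within rank 1. With n = 0.2, rank 4 shares about 67% of its rows with rank 3; scanning from the bottom finds rank 4 first, so the lower-ranked result is the one removed.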
Several methods can be used to reorder results rather than select them, by
not entering an n input (that is, letting the n = NULL default be
applied). Several of these methods are named the same as those in
auto_rate for consistency and have equivalent outcomes, so this allows
results to be reordered to the equivalent of that method's results without
re-running the auto_rate analysis.
The "row" and "rolling" methods reorder sequentially by the starting
row of each regression ($row column).
The "time" method reorders sequentially by the starting time of each
regression ($time column).
"linear" and "density" are essentially identical, reordering by the
$density column. This metric is only produced by the auto_rate linear
method, so will not work with any other results.
"rep" or "rank" both reorder by the $rep then $rank columns. What
these represent is context dependent; see the description of the $rep and
$rank columns above. Each summary row's rep and rank values are retained
unchanged regardless of how the results are subsequently selected or
reordered, so this will restore the original ordering after other methods
have been applied.
"rsq" reorders by $rsq from highest value to lowest.
"intercept" and "slope" reorder by the $intercept_b0 and $slope_b1
columns from lowest value to highest.
"highest" and "lowest" reorder by absolute values of the $rate.output
column, that is highest or lowest in magnitude regardless of the sign. They
can only be used when rates all have the same sign.
"maximum" and "minimum" reorder by numerical values of the
$rate.output column, that is maximum or minimum in numerical value taking
account of the sign, and can be used when rates are a mix of negative and
positive.
For convert_rate objects which contain rates which have been converted
from numeric values, the summary table will contain a limited amount of
information, so many of the selection or reordering methods will not work.
In this case a warning is given and the original input is returned.
There is no plotting functionality in select_rate. However since the
output is a convert_rate object it can be plotted. See the Plot
section in help("convert_rate"). To plot straight after a selection
operation, pipe or enter the output in plot(). See Examples.
This help file can also be found online, where it is much easier to read.
For additional help, documentation, vignettes, and more visit the respR
website at https://januarharianto.github.io/respR/
The output of select_rate is a list object which retains the
convert_rate class, with an additional convert_rate_select class
applied.
It contains two additional elements. $original contains the original,
unaltered convert_rate object, which is retained through multiple
selection operations, that is, even after processing through the
function multiple times. $select_calls contains the calls for every
selection operation that has been applied to the $original object, from
the first to the most recent. These additional elements ensure the output
contains the complete, reproducible history of how the convert_rate object
has been processed.
## Object to filter
ar_obj <- inspect(intermittent.rd, plot = FALSE) %>%
auto_rate(plot = FALSE) %>%
convert_rate(oxy.unit = "mg/L",
time.unit = "s",
output.unit = "mg/h",
volume = 2.379) %>%
summary()
## Select only negative rates
ar_subs_neg <- select_rate(ar_obj, method = "negative") %>%
summary()
## Select only rates over 1000 seconds duration
ar_subs_dur <- select_rate(ar_obj, method = "duration", n = c(1000, Inf)) %>%
summary()
## Reorder rates sequentially (i.e. by starting row)
ar_reorder_row <- select_rate(ar_obj, method = "row") %>%
summary()
## Select rates with r-squared of 0.99 or higher,
## then select the lowest 10th percentile of the remaining rates,
## then take the mean of those
inspect(squid.rd, plot = FALSE) %>%
auto_rate(method = "linear",
plot = FALSE) %>%
convert_rate(oxy.unit = "mg/L",
time.unit = "s",
output.unit = "mg/h",
volume = 2.379) %>%
summary() %>%
select_rate(method = "rsq", n = c(0.99, 1)) %>%
select_rate(method = "lowest_percentile", n = 0.1) %>%
mean()