Nothing
#' @title Select rate results based on a range of criteria
#'
#' @description The functions in `respR` are powerful, but outputs can be large
#' and difficult to explore, especially when there are hundreds to thousands
#' of results, for example the output of `auto_rate` on large datasets, or the
#' outputs of `calc_rate.int` from long intermittent-flow experiments.
#'
#' The `select_rate` and `select_rate.ft` functions help explore, reorder, and
#' filter `convert_rate` and `convert_rate.ft` results according to various
#' criteria. For example, extracting only positive or negative rates, only the
#' highest or lowest rates, only those from certain data regions, and numerous
#' other methods that allow advanced filtering of results so the final
#' selection of rates is well-defined towards the research question of
#' interest. This also allows for highly consistent reporting of results and
#' rate selection criteria.
#'
#' Multiple selection criteria can be applied by saving the output and
#' processing it through the function multiple times using different methods,
#' or alternatively via piping (`%>%` or `%>%`). See Examples.
#'
#' *Note:* when choosing a `method`, keep in mind that to remain
#' mathematically consistent, `respR` outputs oxygen consumption (i.e.
#' respiration) rates as negative values. This is particularly important in
#' the difference between `highest/lowest` and `minimum/maximum` methods. See
#' Details.
#'
#' When a rate result is omitted by the selection criteria, it is removed from
#' the `$rate.output` element of the `convert_rate` object, and the associated
#' data in `$summary` (i.e. that row) is removed. Some methods can also be
#' used with an `n = NULL` input to reorder the `$rate` and `$summary`
#' elements in various ways.
#'
#' ## Replicate and Rank columns
#'
#' The summary table `$rank` column is context-specific, and what it
#' represents depends on the type of experiment analysed or the function used
#' to determine the rates. If numeric values were converted, it is the order
#' in which they were entered. Similarly, if `calc_rate` was used, it is the
#' order of rates as entered using `from` and `to` (if multiple rates were
#' determined). For `auto_rate` it relates to the `method` input. For example
#' it indicates the kernel density ranking if the `linear` method was used,
#' the ascending or descending ordering by absolute rate value if `lowest` or
#' `highest` were used, or the numerical order if `minimum` or `maximum` were
#' used. For intermittent-flow experiments analysed via `calc_rate.int` and
#' `auto_rate.int` these will be ranked *within* each replicate as indicated
#' in the `$rep` column. The `$rep` and `$rank` columns can be used to keep
#' track of selection or reordering because the original values will be
#' retained unchanged through selection or reordering operations. The original
#' order can always be restored by using `method = "rep"` or `method = "rank"`
#' with `n = NULL`. In both these cases the `$summary` table and
#' `$rate.output` will be reordered by `$rep` (if used) then `$rank` to
#' restore the original ordering.
#'
#' Note that if you are analysing intermittent-flow data and used
#' `auto_rate.int` but changed the `n` input to output more than one rate
#' result per replicate, the selection or reordering operations will not take
#' any account of this. You should carefully consider if or why you need to
#' output multiple rates per replicate in the first place. If you have, you
#' can perform selection on individual replicates by using `method = "rep"` to
#' select individual replicates then apply additional selection criteria.
#'
#' @details These are the current methods by which rates in `convert_rate`
#' objects can be selected. Matching results are *retained* in the output.
#' Some methods can also be used to reorder the results. Note that the methods
#' selecting by rate value operate on the `$rate.output` element, that is the
#' final converted rate value.
#'
#' ## `positive`, `negative`
#'
#' Selects all `positive` (>0) or `negative` (<0) rates. `n` is ignored.
#' Useful, for example, in respirometry on algae where both oxygen consumption
#' and production rates are recorded. Note, `respR` outputs oxygen consumption
#' (i.e. respiration) rates as *negative* values, production rates as
#' *positive*.
#'
#' ## `nonzero`, `zero`
#'
#' Retains all `nonzero` rates (i.e. removes any zero rates), or retains
#' *only* `zero` rates (i.e. removes all rates with any value). `n` is
#' ignored.
#'
#' ## `lowest`, `highest`
#'
#' These methods can only be used when rates all have the same sign, that is
#' are all negative or all positive. These select the lowest and highest
#' ***absolute*** rate values. For example, if rates are all negative, `method
#' = 'highest'` will retain the highest magnitude rates regardless of the
#' sign. `n` should be an integer indicating the number of lowest/highest
#' rates to retain. If `n = NULL` the results will instead be reordered by
#' lowest or highest rate without any removed. See `minimum` and `maximum`
#' options for extracting *numerically* lowest and highest rates.
#'
#' ## `lowest_percentile`, `highest_percentile`
#'
#' These methods can also only be used when rates all have the same sign.
#' These retain the `n`'th lowest or highest percentile of ***absolute*** rate
#' values. For example, if rates are all negative `method =
#' 'highest_percentile'` will retain the highest magnitude `n`'th percentile
#' regardless of the sign. `n` should be a percentile value between 0 and 1.
#' For example, to extract the lowest 10th percentile of absolute rate values,
#' you would enter `method = 'lowest_percentile', n = 0.1`.
#'
#' ## `minimum`, `maximum`
#'
#' In contrast to `lowest` and `highest`, these are *strictly numerical*
#' options which take full account of the sign of the rate, and can be used
#' where rates are a mix of positive and negative. For example, `method =
#' 'minimum'` will retain the minimum numerical value rates, which would
#' actually be the highest oxygen uptake rates. `n` is an integer indicating
#' how many of the min/max rates to retain. If `n = NULL` the results will
#' instead be reordered by minimum or maximum rate without any removed.
#'
#' ## `minimum_percentile`, `maximum_percentile`
#'
#' Like `min` and `max` these are *strictly numerical* inputs which retain the
#' `n`'th minimum or maximum percentile of the rates and take full account of
#' the sign. Here `n` should be a percentile value between 0 and 1. For
#' example, if rates are all negative (i.e. typical uptake rates), to extract
#' the lowest 10th percentile of rates, you would enter `method =
#' 'maximum_percentile', n = 0.1`. This is because the *lowest* negative rates
#' are numerically the *maximum* rates (`highest/lowest` percentile methods
#' would be a better option in this case however).
#'
#' ## `rate`
#'
#' Allows you to enter a value range of output rates to be retained. Matching
#' regressions in which the rate value falls within the `n` range (inclusive)
#' are retained. `n` should be a vector of two values. For example, to retain
#' only rates where the `rate` value is between 0.05 and 0.08: `method =
#' 'rate', n = c(0.05, 0.08)`. Note this operates on the `$rate.output`
#' element, that is converted rate values.
#'
#' ## `rep`, `rank`
#'
#' These refer to the respective columns of the `$summary` table. For these,
#' `n` should be a numeric vector of integers of `rep` or `rank` values to
#' retain. To retain a range use regular R syntax, e.g. `n = 1:10`. If `n =
#' NULL` no results will be removed, instead the results will be reordered
#' ascending by `rep` (if it contains values) then `rank`. Essentially this
#' restores the original ordering if other reordering operations have been
#' performed.
#'
#' The values in these columns depend on the functions used to calculate
#' rates. If `calc_rate` was used, `rep` is `NA` and `rank` is the order of
#' rates as entered using `from` and `to` (if multiple rates were determined).
#' For `auto_rate`, `rep` is `NA` and `rank` relates to the `method` input.
#' For example it indicates the kernel density ranking if the `linear` method
#' was used, the ascending or descending ordering by absolute rate value if
#' `lowest` or `highest` were used, or by numerical order if `minimum` or
#' `maximum` were used. If `calc_rate.int` or `auto_rate.int` were used, `rep`
#' indicates the replicate number and the `rank` column represents rank
#' *within* the relevant replicate, and will generally be filled with the
#' value `1`. Therefore you need to adapt your selection criteria
#' appropriately towards which of these columns is relevant.
#'
#' ## `rep_omit`, `rank_omit`
#'
#' These refer to the `rep` and `rank` columns of the `$summary` table and
#' allow you to exclude rates from particular replicate or rank values. For
#' these, `n` should be a numeric vector of integers of `rep` or `rank` values
#' to OMIT. To omit a range use regular R syntax, e.g. `n = 1:10`.
#'
#' ## `rsq`, `row`, `time`, `density`
#'
#' These methods refer to the respective columns of the `$summary` data frame.
#' For these, `n` should be a vector of two values. Matching regressions in
#' which the respective parameter falls within the `n` range (inclusive) are
#' retained. To retain all rates with a R-Squared 0.90 or above: `method =
#' 'rsq', n = c(0.9, 1)`. The `row` and `time` ranges refer to the
#' `$row`-`$endrow` or `$time`-`$endtime` columns and the original raw data
#' (`$dataframe` element of the `convert_rate` object), and can be used to
#' constrain results to rates from particular regions of the data (although
#' usually a better option is to `subset_data()` prior to analysis). Note
#' `time` is not the same as `duration` - see later section - and `row` refers
#' to rows of the raw data, **not** rows of the summary table - see `manual`
#' method for this. For all of these methods, if `n = NULL` no results will be
#' removed, instead the results will be reordered by that respective column
#' (descending for `rsq` and `density`, ascending for `row`, and `time`).
#'
#' ## `intercept`, `slope`
#'
#' These methods are similar to the above and refer to the `intercept_b0` and
#' `slope_b1` summary table columns. Note these linear model coefficients
#' represent different things in flowthrough vs. other analyses. In
#' non-flowthrough analyses slopes represent rates and coefficients such as a
#' high r-squared are important. In flowthrough, slopes represent the
#' stability of the data region, in that the closer the slope is to zero, the
#' less the delta oxygen values in that region vary, which is an indication of
#' a region of stable rates. In addition, intercept values close to the
#' calculated mean delta of the region also indicate a region of stable rates.
#' Therefore these methods are chiefly useful in selection of flowthrough
#' results, for example slopes close to zero. If `n = NULL` no results will be
#' removed, instead the results will be reordered by ascending value by that
#' column.
#'
#' ## `time_omit`, `row_omit`
#'
#' These methods refer to the original data, and are intended to *exclude*
#' rates determined over particular data regions. This is useful in the case
#' of, for example, a data anomaly such as a spike or sensor dropout. For
#' these inputs, `n` are values (a single value, multiple values, or a range)
#' indicating data timepoints or rows of the original data to exclude. Only
#' rates (i.e. regressions) which *do not* utilise those particular values are
#' retained in the output. For example, if an anomaly occurs precisely at
#' timepoint 3000, `time_omit = 3000` means only rates determined solely over
#' regions before or after this will be retained. If it occurs over a range
#' this can be entered as, `time_omit = c(3000,3200)`. If you want to exclude
#' a regular occurrence, for example the flushes in intermittent-flow
#' respirometry, or any other non-continuous values they can be entered as a
#' vector, e.g. `row_omit = c(1000, 2000, 3000)`. Note this last option can be
#' extremely computationally intensive when the vector or dataset is large, so
#' should only be used when a range cannot be entered as two values, which is
#' much faster. For both methods, input values must match exactly to values
#' present in the dataset.
#'
#' ## `oxygen`
#'
#' This can be used to constrain rate results to regions of the data based on
#' oxygen values. `n` should be a vector of two values in the units of oxygen
#' in the raw data. Only rate regressions in which all datapoints occur within
#' this range (inclusive) are retained. Any which use even a single value
#' outside of this range are excluded. Note the summary table columns `oxy`
#' and `endoxy` refer to the first and last oxygen values in the rate
#' regression, which should broadly indicate which results will be removed or
#' retained, but this method examines *every* oxygen value in the regression,
#' not just first and last.
#'
#' ## `oxygen_omit`
#'
#' Similar to `time_omit` and `row_omit` above, this can be used to *omit*
#' rate regressions which use particular oxygen values. For this `n` are
#' values (single or multiple) indicating oxygen values in the original raw
#' data to exclude. Every oxygen value used by each regression is checked, and
#' to be excluded an `n` value must match *exactly* to one in the data.
#' Therefore, note that if a regression is fit across the data region where
#' that value would occur, it is not necessarily excluded unless that *exact
#' value* occurs. You need to consider the precision of the data values
#' recorded. For example, if you wanted to exclude any rate using an oxygen
#' value of `7`, but your data are recorded to two decimals, a rate fit across
#' these data would *not* be excluded: `c(7.03, 7.02, 7.01, 6.99, 6.98, ...)`.
#' To get around this you can use regular R syntax to input vectors at the
#' correct precision, such as seq, e.g. `seq(from = 7.05, to = 6.96, by =
#' -0.01)`. This can be used to input ranges of oxygen values to exclude.
#'
#' ## `duration`
#'
#' This method allows selection of rates which have a specific duration range.
#' Here, `n` should be a numeric vector of two values. Use this to set minimum
#' and maximum durations in the time units of the original data. For example,
#' `n = c(0,500)` will retain only rates determined over a maximum of 500 time
#' units. To retain rates over a minimum duration, set this using the minimum
#' value plus the maximum duration or simply infinity. For example, for rates
#' determined over a minimum of 500 time units `n = c(500,Inf)`)
#'
#' ## `manual`
#'
#' This method simply allows particular rows of the `$summary` data frame to
#' be manually selected to be retained. For example, to keep only the top row
#' `method = 'manual', n = 1`. To keep multiple rows use regular `R` selection
#' syntax: `n = 1:3`, `n = c(1,2,3)`, `n = c(5,8,10)`, etc. No value of `n`
#' should exceed the number of rows in the `$summary` data frame. Note this is
#' not necessarily the same as selecting by the `rep` or `rank` methods, as
#' the table could already have undergone selection or reordering.
#'
#' ## `manual_omit`
#'
#' As above, but this allows particular rows of the `$summary` data frame to
#' be manually selected to be *omitted*.
#'
#' ## `overlap`
#'
#' This method removes rates which overlap, that is regressions which are
#' partly or completely fit over the same rows of the original data. This is
#' useful in particular with `auto_rate` results. The `auto_rate` `linear`
#' method may identify multiple linear regions, some of which may
#' substantially overlap, or even be completely contained within others. In
#' such cases summary operations such as taking an average of the rate values
#' may be questionable, as certain values will be weighted higher due to these
#' multiple, overlapping results. This method removes overlapping rates, using
#' `n` as a threshold to determine degree of permitted overlap. It is
#' recommended this method be used after all other selection criteria have
#' been applied, as it is quite aggressive about removing rates, and can be
#' *very* computationally intensive when there are many results.
#'
#' While it can be used with `auto_rate` results determined via the `rolling`,
#' `lowest`, or `highest` methods, by their nature these methods produce *all
#' possible* overlapping regressions, ordered in various ways, so other
#' selection methods are more appropriate. The `overlap` method is generally
#' intended to be used in combination with the `auto_rate` `linear` results,
#' but may prove useful in other analyses.
#'
#' Permitted overlap is determined by `n`, which indicates the proportion of
#' each particular regression which must overlap with another for it to be
#' regarded as overlapping. For example, `n = 0.2` means a regression would
#' have to overlap with at least one other by at least 20% of its total length
#' to be regarded as overlapping.
#'
#' The `"overlap"` method performs two operations:
#'
#' First, regardless of the `n` value, any rate regressions which are
#' completely contained within another are removed. This is also the only
#' operation if `n = 1`.
#'
#' Secondly, for each regression in `$summary` starting from the bottom of the
#' summary table (usually the lowest ranked result, but this depends on the
#' analysis used and if any reordering has been already occurred), the
#' function checks if it overlaps with any others (accounting for `n`). If
#' not, the next lowest is checked, and the function progresses up the summary
#' table until it finds one that does. The first to be found overlapping is
#' then removed, and the process repeats starting again from the bottom of the
#' summary table. If no reordering to the results has occurred, this means
#' lower ranked results are removed first. This is repeated iteratively until
#' only non-overlapping rates (accounting for `n`) remain.
#'
#' If `n = 0`, only rates which do not overlap at all, that is share *no*
#' data, are retained. If `n = 1`, only rates which are 100% contained within
#' at least one other are removed.
#'
#' ## Reordering results
#'
#' Several methods can be used to reorder results rather than select them, by
#' not entering an `n` input (that is, letting the `n = NULL` default be
#' applied). Several of these methods are named the same as those in
#' `auto_rate` for consistency and have equivalent outcomes, so this allows
#' results to be reordered to the equivalent of that method's results without
#' re-running the `auto_rate` analysis.
#'
#' The `"row"` and `"rolling"` methods reorder sequentially by the starting
#' row of each regression (`$row` column).
#'
#' The `"time"` method reorders sequentially by the starting time of each
#' regression (`$time` column).
#'
#' `"linear"` and `"density"` are essentially identical, reordering by the
#' `$density` column. This metric is only produced by the `auto_rate` `linear`
#' method, so will not work with any other results.
#'
#' `"rep"` or `"rank"` both reorder by the `$rep` then `$rank` columns. What
#' these represents is context dependent - see **Replicate and Rank columns**
#' section above. Each summary row `rep` and `rank` value is retained
#' unchanged regardless of how the results are subsequently selected or
#' reordered, so this will restore the original ordering after other methods
#' have been applied.
#'
#' `"rsq"` reorders by `$rsq` from highest value to lowest.
#'
#' `"intercept"` and `"slope"` reorder by the `$intercept_b0` and `$slope_b1`
#' columns from lowest value to highest.
#'
#' `"highest"` and `"lowest"` reorder by absolute values of the `$rate.output`
#' column, that is highest or lowest in magnitude regardless of the sign. They
#' can only be used when rates all have the same sign.
#'
#' `"maximum"` and `"minimum"` reorder by numerical values of the
#' `$rate.output` column, that is maximum or minimum in numerical value taking
#' account of the sign, and can be used when rates are a mix of negative and
#' positive.
#'
#' ## Numeric input conversions
#'
#' For `convert_rate` objects which contain rates which have been converted
#' from numeric values, the summary table will contain a limited amount of
#' information, so many of the selection or reordering methods will not work.
#' In this case a warning is given and the original input is returned.
#'
#' ## Plot
#'
#' There is no plotting functionality in `select_rate`. However since the
#' output is a `convert_rate` object it can be plotted. See the **Plot**
#' section in `help("convert_rate")`. To plot straight after a selection
#' operation, pipe or enter the output in `plot()`. See Examples.
#'
#' ## More
#'
#' This help file can be found online
#' [here](https://januarharianto.github.io/respR/reference/select_rate.html),
#' where it is much easier to read.
#'
#' For additional help, documentation, vignettes, and more visit the `respR`
#' website at <https://januarharianto.github.io/respR/>
#'
#' @return The output of `select_rate` is a `list` object which retains the
#' `convert_rate` class, with an additional `convert_rate_select` class
#' applied.
#'
#' It contains two additional elements: `$original` contains the original,
#' unaltered `convert_rate` object, which will be retained unaltered through
#' multiple selection operations, that is even after processing through the
#' function multiple times. `$select_calls` contains the calls for every
#' selection operation that has been applied to the `$original` object, from
#' the first to the most recent. These additional elements ensure the output
#' contains the complete, reproducible history of the `convert_rate` object
#' having been processed.
#'
#' @param x list. An object of class `convert_rate` or `convert_rate_select`.
#' @param method string. Method by which to select or reorder rate results. For
#' most methods matching results are *retained* in the output. See Details.
#' @param n numeric. Number, percentile, or range of results to retain or omit
#' depending on `method`. Default is `NULL`, in which case some methods will
#' instead reorder the results. See Details.
#'
#' @export
#'
#' @importFrom glue glue
#' @importFrom data.table between
#' @importFrom stats quantile
#' @importFrom dplyr arrange desc
#'
#' @examples
#' \donttest{
#' ## Object to filter
#' ar_obj <- inspect(intermittent.rd, plot = FALSE) %>%
#' auto_rate(plot = FALSE) %>%
#' convert_rate(oxy.unit = "mg/L",
#' time.unit = "s",
#' output.unit = "mg/h",
#' volume = 2.379) %>%
#' summary()
#'
#' ## Select only negative rates
#' ar_subs_neg <- select_rate(ar_obj, method = "negative") %>%
#' summary()
#'
#' ## Select only rates over 1000 seconds duration
#' ar_subs_dur <- select_rate(ar_obj, method = "duration", n = c(1000, Inf)) %>%
#' summary()
#'
#' ## Reorder rates sequentially (i.e. by starting row)
#' ar_subs_dur <- select_rate(ar_obj, method = "row") %>%
#' summary()
#'
#' ## Select rates with r-squared higher than 0.99,
#' ## then select the lowest 10th percentile of the remaining rates,
#' ## then take the mean of those
#' inspect(squid.rd, plot = FALSE) %>%
#' auto_rate(method = "linear",
#' plot = FALSE) %>%
#' convert_rate(oxy.unit = "mg/L",
#' time.unit = "s",
#' output.unit = "mg/h",
#' volume = 2.379) %>%
#' summary() %>%
#' select_rate(method = "rsq", n = c(0.99, 1)) %>%
#' select_rate(method = "lowest_percentile", n = 0.1) %>%
#' mean()
#' }
select_rate <- function(x, method = NULL, n = NULL){
## Save function call for output
call <- match.call()
# Checks ------------------------------------------------------------------
## Check for valid convert_rate or convert_rate.ft object
if(!(class.val(x, cnvr = TRUE, cnvr.ft = TRUE)))
stop("select_rate: Input is not a 'convert_rate' or 'convert_rate.ft' object")
## Specify a method
if(is.null(method)) stop("select_rate: Please specify a 'method'")
## Validate method
if(!is.null(method) && !(method %in% c("overlap",
"duration",
"density",
"manual",
"manual_omit",
"time",
"time_omit",
"row",
"row_omit",
"oxygen",
"oxygen_omit",
"rsq",
"intercept",
"slope",
"rep",
"rep_omit",
"rank",
"rank_omit",
"rate",
"rate.output",
"maximum_percentile",
"minimum_percentile",
"maximum",
"minimum",
"lowest_percentile",
"highest_percentile",
"lowest",
"highest",
"zero",
"nonzero",
"negative",
"positive",
"rolling",
"linear"))) stop("select_rate: 'method' input not recognised")
## If numeric conversions, then disallow methods where summary table values are NA
## Can identify as numeric input because dataframe will be NULL
if(is.null(x$dataframe) && method %in% c("overlap",
"duration",
"density",
"time",
"time_omit",
"row",
"row_omit",
"oxygen",
"oxygen_omit",
"rsq",
"intercept",
"slope",
"rolling",
"linear")) stop(glue::glue("select_rate: The '{method}' method is not accepted for 'convert_rate' objects which have been created using numeric inputs."))
## Disallow `density` or `linear` methods for anything other than `auto_rate` linear objects
# If x$dataframe is NOT null (i.e. it is an object input rather than numerics)
if(!(is.null(x$dataframe))) {
# If summary$density is not all NA then it has to be "linear" ar object
# edge case for empty objects too
is.linear <- !(all(is.na(x$summary$density))) || nrow(x$summary) == 0
# If it is NOT linear and method = "density" or "linear", stop
if(method %in% c("density", "linear") && !is.linear)
stop(glue::glue("select_rate: The '{method}' method is only accepted for rates determined in 'auto_rate' via the 'linear' method."))
}
## Disallow 'oxygen' methods from calc_rate.bg objects
## Can't imagine anyone ever actually doing this, but may as well be thorough
if(!(is.null(x$dataframe)) && inherits(x$inputs$x, "calc_rate.bg")) {
if(method %in% c("oxygen", "oxygen_omit"))
stop(glue::glue("select_rate: The '{method}' method is not accepted for 'calc_rate.bg' objects because rates may come from different columns of the dataframe."))
}
## Message for pos and neg rates found
if(any(x$rate.output > 0) && any(x$rate.output < 0)) message("select_rate: Object contains both negative and positive rates. Ensure the chosen `method` is appropriate.")
# Extract data ------------------------------------------------------------
## Save unaltered input for output
input_x <- x
## Extract original raw data
raw_df <- input_x$dataframe
# message here for numerics? - as dataframe will be NULL
# Reordering --------------------------------------------------------------
# Sequential by row
# equiv to "rolling" in auto_rate
# identical to method = "row" with n = NULL
if(method == "rolling"){
message("select_rate: Reordering results by 'rolling' method. 'n' input ignored...")
summ <- x$summary
summ <- arrange(summ, row)
keep <- 0:nrow(summ) # needs to be zero to handle empty summary tables
reordered <- TRUE
}
if(is.null(n) && method == "row"){
message("select_rate: Reordering results by 'row' method.")
summ <- x$summary
summ <- arrange(summ, row)
keep <- 0:nrow(summ)
reordered <- TRUE
}
if(is.null(n) && method == "time"){
message("select_rate: Reordering results by 'time' method.")
summ <- x$summary
summ <- arrange(summ, time)
keep <- 0:nrow(summ)
reordered <- TRUE
}
if(is.null(n) && method == "intercept"){
message("select_rate: Reordering results by 'intercept' method.")
summ <- x$summary
summ <- arrange(summ, intercept_b0)
keep <- 0:nrow(summ)
reordered <- TRUE
}
if(is.null(n) && method == "slope"){
message("select_rate: Reordering results by 'slope' method.")
summ <- x$summary
summ <- arrange(summ, slope_b1)
keep <- 0:nrow(summ)
reordered <- TRUE
}
# by linear method - this is identical to density with n = NULL below
if(method == "linear"){
message("select_rate: Reordering results by 'linear' method. 'n' input ignored...")
summ <- x$summary
summ <- arrange(summ, desc(density))
keep <- 0:nrow(summ)
reordered <- TRUE
}
if(is.null(n) && method == "density"){
message("select_rate: Reordering results by 'density' method.")
summ <- x$summary
summ <- arrange(summ, desc(density))
keep <- 0:nrow(summ)
reordered <- TRUE
}
# Reorder by rep and/or rank column - essentially restores to original order
# should be identical to above if "linear" method
if(is.null(n) && method %in% c("rep", "rank")){
message(glue::glue("select_rate: Reordering results by '{method}' method."))
summ <- x$summary
summ <- arrange(summ, rep, rank)
keep <- 0:nrow(summ)
reordered <- TRUE
}
# Reorder by rsq column
if(is.null(n) && method == "rsq"){
message("select_rate: Reordering results by 'rsq' method.")
summ <- x$summary
summ <- arrange(summ, desc(rsq))
keep <- 0:nrow(summ)
reordered <- TRUE
}
# By lowest absolute value
if(is.null(n) && method == "lowest"){
if(any(x$rate.output > 0) && any(x$rate.output < 0)) stop(glue::glue("select_rate: Object contains both negative and positive rates. \n'{method}' method is intended to find {method} rate in absolute value amongst rates all having the same sign. \nSee 'positive' and 'negative' or 'maximum' and 'minimum' options."))
message("select_rate: Reordering results by 'lowest' method.")
summ <- x$summary
# reorder ascending or descending based on sign
if(any(summ$rate.output < 0)) summ <- arrange(summ, desc(rate.output)) else
summ <- arrange(summ, rate.output)
keep <- 0:nrow(summ)
reordered <- TRUE
}
# By highest absolute value
if(is.null(n) && method == "highest"){
if(any(x$rate.output > 0) && any(x$rate.output < 0)) stop(glue::glue("select_rate: Object contains both negative and positive rates. \n'{method}' method is intended to find {method} rate in absolute value amongst rates all having the same sign. \nSee 'positive' and 'negative' or 'maximum' and 'minimum' options."))
message("select_rate: Reordering results by 'highest' method.")
summ <- x$summary
# reorder ascending or descending based on sign
if(any(summ$rate.output < 0)) summ <- arrange(summ, rate.output) else
summ <- arrange(summ, desc(rate.output))
keep <- 0:nrow(summ)
reordered <- TRUE
}
# By minimum numerical value
if(is.null(n) && method == "minimum"){
message("select_rate: Reordering results by 'minimum' method.")
summ <- x$summary
summ <- arrange(summ, rate.output)
keep <- 0:nrow(summ)
reordered <- TRUE
}
# By maximum numerical value
if(is.null(n) && method == "maximum"){
message("select_rate: Reordering results by 'maximum' method.")
summ <- x$summary
summ <- arrange(summ, desc(rate.output))
keep <- 0:nrow(summ)
reordered <- TRUE
}
# Selection --------------------------------------------------------------
# Positive rates only -----------------------------------------------------
if(method == "positive"){
message("select_rate: Selecting all positive rate values. 'n' input ignored...")
keep <- which(x$rate.output > 0)
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# Negative rates only -----------------------------------------------------
if(method == "negative"){
message("select_rate: Selecting all negative rate values. 'n' input ignored...")
keep <- which(x$rate.output < 0)
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# Nonzero rates only -----------------------------------------------------
if(method == "nonzero"){
message("select_rate: Selecting all non-zero rate values. 'n' input ignored...")
keep <- which(x$rate.output != 0)
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# Zero rates only ---------------------------------------------------------
if(method == "zero"){
message("select_rate: Selecting all zero rate values. 'n' input ignored...")
keep <- which(x$rate.output == 0)
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# lowest rates ------------------------------------------------------------
## Note these are NOT lowest *numerical*, but lowest *absolute* values
## E.g. if all negative will return the highest/least negative n values
if(!is.null(n) && method == "lowest"){
if(any(x$rate.output > 0) && any(x$rate.output < 0)) stop(glue::glue("select_rate: Object contains both negative and positive rates. \n'{method}' method is intended to find {method} rate in absolute value amongst rates all having the same sign. \nSee 'positive' and 'negative' or 'maximum' and 'minimum' options."))
if(n %% 1 != 0 || n < 0) stop("select_rate: For 'lowest' method 'n' must contain only positive integers.")
if(n > length(x$rate.output)) message("select_rate: 'n' input is greater than number of rates in $summary. Nothing to remove.")
message(glue::glue("select_rate: Selecting lowest {n} *absolute* rate values..."))
## if all negative return HIGHEST numerical n
if(all(x$rate.output <= 0)) keep <- sort(tail(order(x$rate.output), n))
## if all positive return LOWEST numerical n
if(all(x$rate.output >= 0)) keep <- sort(head(order(x$rate.output), n))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# highest rates -----------------------------------------------------------
## Note these are NOT highest *numerical*, but highest *absolute* values
if(!is.null(n) && method == "highest"){
if(any(x$rate.output > 0) && any(x$rate.output < 0)) stop(glue::glue("select_rate: Object contains both negative and positive rates. \n'{method}' method is intended to find {method} rate in absolute value amongst rates all having the same sign. \nSee 'positive' and 'negative' or 'maximum' and 'minimum' options."))
if(n %% 1 != 0 || n < 0) stop("select_rate: For 'highest' method 'n' must contain only positive integers.")
if(n > length(x$rate.output)) message("select_rate: 'n' input is greater than number of rates in $summary. Nothing to remove.")
message(glue::glue("select_rate: Selecting highest {n} *absolute* rate values..."))
## if all negative return LOWEST numerical n
if(all(x$rate.output <= 0)) keep <- sort(head(order(x$rate.output), n))
## if all positive return HIGHEST numerical n
if(all(x$rate.output >= 0)) keep <- sort(tail(order(x$rate.output), n))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# lowest percentile rates -------------------------------------------------
if(method == "lowest_percentile") {
if(any(x$rate.output > 0) && any(x$rate.output < 0)) stop(glue::glue("select_rate: Object contains both negative and positive rates. \n'{method}' method is intended to find {method} rate in absolute value amongst rates all having the same sign. \nSee 'positive' and 'negative' or 'maximum_percentile' and 'minimum_percentile' options."))
if(n <= 0 || n >= 1) stop("select_rate: For 'percentile' methods 'n' must be between 0 and 1.")
message(glue::glue("select_rate: Selecting lowest {n*100}th percentile of *absolute* rate values..."))
## if all negative return HIGHEST numerical nth percentile
if(all(x$rate.output <= 0)) {
cutoff <- stats::quantile(x$rate.output, 1-n)
keep <- sort(which(x$rate.output %in% x$rate.output[x$rate.output >= cutoff]))
}
## if all positive return LOWEST numerical nth percentile
if(all(x$rate.output >= 0)) {
cutoff <- stats::quantile(x$rate.output, n)
keep <- sort(which(x$rate.output %in% x$rate.output[x$rate.output <= cutoff]))
}
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# highest percentile rates ------------------------------------------------
if(method == "highest_percentile") {
if(any(x$rate.output > 0) && any(x$rate.output < 0)) stop(glue::glue("select_rate: Object contains both negative and positive rates. \n'{method}' method is intended to find {method} rate in absolute value amongst rates all having the same sign. \nSee 'positive' and 'negative' or 'maximum_percentile' and 'minimum_percentile' options."))
if(n <= 0 || n >= 1) stop("select_rate: For 'percentile' methods 'n' must be between 0 and 1.")
message(glue::glue("select_rate: Selecting highest {n*100}th percentile of *absolute* rate values..."))
## if all negative return LOWEST numerical nth percentile
if(all(x$rate.output <= 0)) {
cutoff <- stats::quantile(x$rate.output, n)
keep <- sort(which(x$rate.output %in% x$rate.output[x$rate.output <= cutoff]))
}
## if all positive return LOWEST numerical nth percentile
if(all(x$rate.output >= 0)) {
cutoff <- stats::quantile(x$rate.output, 1-n)
keep <- sort(which(x$rate.output %in% x$rate.output[x$rate.output >= cutoff]))
}
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# min n rates -------------------------------------------------------------
## These are lowest *numerical* values.
## i.e can mix -ve and +ve
## min = lowest/most negative
if(!is.null(n) && method == "minimum") {
if(n %% 1 != 0 || n < 0) stop("select_rate: For 'minimum' method 'n' must contain only positive integers.")
if(n > length(x$rate.output)) message("select_rate: 'n' input is greater than number of rates in $summary. Nothing to remove.")
message(glue::glue("select_rate: Selecting minimum {n} *numerical* rate values..."))
keep <- sort(head(order(x$rate.output), n))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# max n rates -------------------------------------------------------------
if(!is.null(n) && method == "maximum") {
if(n %% 1 != 0 || n < 0) stop("select_rate: For 'maximum' method 'n' must contain only positive integers.")
if(n > length(x$rate.output)) message("select_rate: 'n' input is greater than number of rates in $summary. Nothing to remove.")
message(glue::glue("select_rate: Selecting maximum {n} *numerical* rate values..."))
keep <- sort(tail(order(x$rate.output), n))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# min percentile rates ----------------------------------------------------
if(method == "minimum_percentile") {
if(n <= 0 || n >= 1) stop("select_rate: For 'percentile' methods 'n' must be between 0 and 1.")
message(glue::glue("select_rate: Selecting minimum {n*100}th percentile *numerical* rate values..."))
cutoff <- stats::quantile(x$rate.output, n)
keep <- sort(which(x$rate.output %in% x$rate.output[x$rate.output <= cutoff]))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# max percentile rates ----------------------------------------------------
if(method == "maximum_percentile") {
if(n <= 0 || n >= 1) stop("select_rate: For 'percentile' methods 'n' must be between 0 and 1.")
message(glue::glue("select_rate: Selecting maximum {n*100}th percentile *numerical* rate values..."))
cutoff <- stats::quantile(x$rate.output, 1-n) ## NOTE DIFFERENCE TO ABOVE
keep <- sort(which(x$rate.output %in% x$rate.output[x$rate.output >= cutoff]))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# rate --------------------------------------------------------------------
if(method == "rate" || method == "rate.output"){
if(length(n) != 2) stop("select_rate: For 'rate' method 'n' must be a vector of two values.")
message(glue::glue("select_rate: Selecting rates with values between {n[1]} and {n[2]}..."))
n_order <- sort(n)
keep <- sort(which(data.table::between(x$rate.output, n_order[1], n_order[2])))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# rsq ---------------------------------------------------------------------
if(!is.null(n) && method == "rsq"){
if(length(n) != 2) stop("select_rate: For 'rsq' method 'n' must be a vector of two values.")
message(glue::glue("select_rate: Selecting rates with rsq values between {n[1]} and {n[2]}..."))
n_order <- sort(n)
keep <- sort(which(data.table::between(x$summary$rsq, n_order[1], n_order[2])))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# intercept ---------------------------------------------------------------------
if(!is.null(n) && method == "intercept"){
if(length(n) != 2) stop("select_rate: For 'intercept' method 'n' must be a vector of two values.")
message(glue::glue("select_rate: Selecting rates with intercept values between {n[1]} and {n[2]}..."))
n_order <- sort(n)
keep <- sort(which(data.table::between(x$summary$intercept_b0, n_order[1], n_order[2])))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# slope ---------------------------------------------------------------------
if(!is.null(n) && method == "slope"){
if(length(n) != 2) stop("select_rate: For 'slope' method 'n' must be a vector of two values.")
message(glue::glue("select_rate: Selecting rates with slope values between {n[1]} and {n[2]}..."))
n_order <- sort(n)
keep <- sort(which(data.table::between(x$summary$slope_b1, n_order[1], n_order[2])))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# rep --------------------------------------------------------------------
if(!is.null(n) && method == "rep"){
if(all(is.na(x$summary$rep)))
stop("select_rate: All 'rep' are NA so nothing to select!")
if(!(is.numeric(n)) || any(n %% 1 != 0))
stop("select_rate: For 'rep' method 'n' must contain only positive integers.")
if(any(!(dplyr::between(n, min(x$summary$rep), max(x$summary$rep)))))
stop("select_rate: Input for 'n': One or more 'rep' inputs out of range of 'summary$rep' values.")
message(glue::glue("select_rate: Selecting rates from entered 'rep' replicates..."))
n_order <- sort(n)
keep <- sort(unique(unlist(lapply(n_order, function(z) which(z == x$summary$rep)))))
summ <- x$summary
reordered <- FALSE
}
# rep_omit --------------------------------------------------------------------
if(!is.null(n) && method == "rep_omit"){
if(all(is.na(x$summary$rep)))
stop("select_rate: All 'rep' are NA so nothing to select!")
if(!(is.numeric(n)) || any(n %% 1 != 0))
stop("select_rate: For 'rep_omit' method 'n' must contain only positive integers.")
if(any(!(dplyr::between(n, min(x$summary$rep), max(x$summary$rep)))))
stop("select_rate: Input for 'n': One or more 'rep_omit' inputs out of range of 'summary$rep' values.")
message(glue::glue("select_rate: Selecting rates which *ARE NOT* from 'rep_omit' replicates..."))
n_order <- sort(n)
remove <- sort(unique(unlist(lapply(n_order, function(z) which(z == x$summary$rep)))))
keep <- (1:nrow(x$summary))[-remove]
summ <- x$summary
reordered <- FALSE
}
# rank --------------------------------------------------------------------
if(!is.null(n) && method == "rank"){
if(!(all(is.na(x$summary$rep))))
message("select_rate: Note there are multiple replicates present in these results, which may have multiple ranks *within* them. \nEnsure the 'rank' method is the correct one for what you want to do.")
if(!(is.numeric(n)) || any(n %% 1 != 0))
stop("select_rate: For 'rank' method 'n' must contain only positive integers.")
if(any(!(dplyr::between(n, min(x$summary$rank), max(x$summary$rank)))))
stop("select_rate: Input for 'n': One or more 'rank' inputs out of range of 'summary$rank' values.")
message(glue::glue("select_rate: Selecting rates with entered 'rank' values..."))
n_order <- sort(n)
keep <- sort(unique(unlist(lapply(n_order, function(z) which(z == x$summary$rank)))))
summ <- x$summary
reordered <- FALSE
}
# rank_omit --------------------------------------------------------------------
if(!is.null(n) && method == "rank_omit"){
if(!(all(is.na(x$summary$rep))))
message("select_rate: Note there are multiple replicates present in these results, which may have multiple ranks *within* them. \nEnsure the 'rank_omit' method is the correct one for what you want to do.")
if(!(is.numeric(n)) || any(n %% 1 != 0))
stop("select_rate: For 'rank_omit' method 'n' must contain only positive integers.")
if(any(!(dplyr::between(n, min(x$summary$rank), max(x$summary$rank)))))
stop("select_rate: Input for 'n': One or more 'rank_omit' inputs out of range of 'summary$rank' values.")
message(glue::glue("select_rate: Selecting rates with ranks which *ARE NOT* in 'rank_omit' input..."))
n_order <- sort(n)
remove <- sort(unique(unlist(lapply(n_order, function(z) which(z == x$summary$rank)))))
keep <- (1:nrow(x$summary))[-remove]
summ <- x$summary
reordered <- FALSE
}
# row ---------------------------------------------------------------------
if(!is.null(n) && method == "row"){
if(length(n) != 2) stop("select_rate: For 'row' method 'n' must be a vector of two values.")
if(any(n > dim(raw_df)[1])) stop("select_rate: Input for 'n': row inputs out of data frame range.")
message(glue::glue("select_rate: Selecting rates which occur only between original data rows {n[1]} and {n[2]}..."))
n_order <- sort(n)
keep1 <- which(x$summary$row >= n_order[1])
keep2 <- which(x$summary$endrow <= n_order[2])
keep <- keep1[keep1 %in% keep2]
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# row_omit ----------------------------------------------------------------
if(method == "row_omit"){
if(!(is.numeric(n)) || any(n %% 1 != 0))
stop("select_rate: For 'row_omit' method 'n' must contain only positive integers.")
message(glue::glue("select_rate: Selecting rates which *DO NOT* use original data row(s) in 'n' input..."))
if(any(n > dim(raw_df)[1])) stop("select_rate: Input for 'n': row inputs out of data frame range.")
n_order <- sort(n)
# if 1 or 2 values in n, simple selection
if(length(n_order) == 1) n_order <- c(n_order, n_order) ## if single value make 2-vec for code simplicity
if(length(n_order) == 2) {
keep1 <- which(x$summary$row > n_order[2]) ## start is AFTER n
keep2 <- which(x$summary$endrow < n_order[1]) ## end is BEFORE n
keep <- c(keep1, keep2)
# otherwise do this possibly very slow loop
} else {
message("select_rate: For 'row_omit' method using multiple 'n' inputs can be VERY SLOW. A range of lower and upper values is MUCH faster.")
# progress bar object
progress <- txtProgressBar(min = 0, # Minimum value of the progress bar
max = length(n_order), # Maximum value of the progress bar
style = 3, # Progress bar style (also available style = 1 and style = 2)
width = NA, # Progress bar width. Defaults to getOption("width")
char = "=")
# for each n, go through summary table by row
# if it occurs between row and endrow then that row of summary gets removed
remove <- c() # empty obj for loop results
for(i in n_order){
remove_i <- which(apply(x$summary, 1, function(z) any(i %in% z[7]:z[8])))
remove <- sort(unique(c(remove, remove_i)))
setTxtProgressBar(progress, which(i == n_order))
}
close(progress)
cat("\n")
# if nothing to remove, above outputs integer(0), so this is for that...
if(length(remove) > 0) keep <- (1:nrow(x$summary))[-remove] else
keep <- 1:nrow(x$summary)
}
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# time --------------------------------------------------------------------
if(!is.null(n) && method == "time"){
if(length(n) != 2) stop("select_rate: For 'time' method 'n' must be a vector of two values.")
if(any(n < range(raw_df[[1]], na.rm = TRUE)[1]) || any(n > range(raw_df[[1]], na.rm = TRUE)[2]))
stop("select_rate: Input for 'n': time inputs out of time data range.")
message(glue::glue("select_rate: Selecting rates which occur only between times {n[1]} and {n[2]}..."))
n_order <- sort(n)
keep1 <- which(x$summary$time >= n_order[1])
keep2 <- which(x$summary$endtime <= n_order[2])
keep <- keep1[keep1 %in% keep2]
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# time_omit ---------------------------------------------------------------
if(method == "time_omit"){
if(!(is.numeric(n)))
stop("select_rate: For 'time_omit' method 'n' must contain only numeric values of time.")
message(glue::glue("select_rate: Selecting rates which *DO NOT* use time value(s) in 'n' input..."))
if(any(n < range(raw_df[[1]], na.rm = TRUE)[1]) || any(n > range(raw_df[[1]], na.rm = TRUE)[2]))
stop("select_rate: Input for 'n': time inputs out of time data range.")
n_order <- sort(n)
if(length(n_order) == 1) n <- c(n_order, n_order) ## if single value make 2-vec for code simplicity
# if(any(n_order < range(x$dataframe[[1]], na.rm = TRUE)[1]) || any(n_order > range(x$dataframe[[1]], na.rm = TRUE)[2]))
# stop("select_rate: Input for 'n': time inputs out of time data range.")
# if a range, single selection
if(length(n_order) == 2){
# which regs span n?
keep1 <- which(x$summary$time > n_order[2]) ## start is AFTER n
keep2 <- which(x$summary$endtime < n_order[1]) ## end is BEFORE n
keep <- c(keep1, keep2)
# otherwise do this possibly very slow loop
} else {
message("select_rate: For 'time_omit' method, selecting multiple 'n' inputs can be VERY SLOW. \nIf possible, specifying a range using lower and upper values is much faster.")
# progress bar object
progress <- txtProgressBar(min = 0, # Minimum value of the progress bar
max = length(n_order), # Maximum value of the progress bar
style = 3, # Progress bar style (also available style = 1 and style = 2)
width = NA, # Progress bar width. Defaults to getOption("width")
char = "=")
remove <- c() # empty obj for loop results
for(i in n_order){
remove_i <- which(apply(x$summary, 1, function(z)
data.table::between(i, z[9], z[10], incbounds = TRUE)))
remove <- sort(unique(c(remove, remove_i)))
setTxtProgressBar(progress, which(i == n_order))
}
close(progress)
cat("\n")
# if nothing to remove, above outputs integer(0), so this is for that...
if(length(remove) > 0) keep <- (1:nrow(x$summary))[-remove] else
keep <- 1:nrow(x$summary)
}
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# oxygen ------------------------------------------------------------------
if(method == "oxygen"){
if(length(n) != 2) stop("select_rate: For 'oxygen' method 'n' must be a vector of two values.")
message(glue::glue("select_rate: Selecting rates which occur only between oxygen values {n[1]} and {n[2]}..."))
n_order <- sort(n)
# which rates have at least one oxygen value below first value
remove1 <- which(mapply(function(p,q) {any(raw_df[p:q,2] <= n_order[1])},
p = x$summary$row,
q = x$summary$endrow)
)
# which rates have at least one oxygen value above second value
remove2 <- which(mapply(function(p,q) {any(raw_df[p:q,2] >= n_order[2])},
p = x$summary$row,
q = x$summary$endrow)
)
# combine
remove <- sort(unique(c(remove1, remove2)))
# Convert to keep
# if nothing to remove, above outputs integer(0), so this is for that...
if(length(remove) > 0) keep <- (1:nrow(x$summary))[-remove] else
keep <- 1:nrow(x$summary)
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# oxygen_omit -------------------------------------------------------------
if(method == "oxygen_omit"){
if(!(is.numeric(n)))
stop("select_rate: For 'oxygen_omit' method 'n' must contain only numeric values of oxygen.")
message(glue::glue("select_rate: Selecting rates which *DO NOT* use oxygen value(s) in 'n' input..."))
n_order <- sort(n)
# which rates have at least one oxygen_omit value?
remove <- c()
for(z in n_order) {
remove1 <- which(mapply(function(p,q) {any(raw_df[p:q,2] == z)},
p = x$summary$row,
q = x$summary$endrow))
remove <- sort(unique(c(remove1, remove)))
}
# Convert to keep
# if nothing to remove, above outputs integer(0), so this is for that...
if(length(remove) > 0) keep <- (1:nrow(x$summary))[-remove] else
keep <- 1:nrow(x$summary)
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# manual ------------------------------------------------------------------
if(method == "manual"){
# check within range of summary length
if(!all(n %in% 1:nrow(x$summary))) stop("select_rate: For 'manual' method: 'n' values are out of range of $summary data.frame rows...")
message(glue::glue("select_rate: Selecting specified rows of the '$summary' table..."))
n_order <- sort(n)
keep <- n_order
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# manual_omit ------------------------------------------------------------------
if(method == "manual_omit"){
# check within range of summary length
if(!all(n %in% 1:nrow(x$summary))) stop("select_rate: For 'manual' method: 'n' values are out of range of $summary data.frame rows...")
message(glue::glue("select_rate: Selecting rows of the '$summary' table which are NOT in the 'manual_omit' input..."))
n_order <- sort(n)
keep <- (1:nrow(x$summary))[-n_order]
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# density ------------------------------------------------------------------
if(!is.null(n) && method == "density"){
if(length(n) != 2) stop("select_rate: For 'density' method 'n' must be a vector of two values.")
n_order <- sort(n)
message(glue::glue("select_rate: Selecting rates with density values between {n[1]} and {n[2]}..."))
keep <- sort(which(data.table::between(x$summary$density, n_order[1], n_order[2])))
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# duration ----------------------------------------------------------------
if(method == "duration"){
if(length(n) != 2) stop("select_rate: For 'duration' method 'n' must be a vector of two values.")
n_order <- sort(n)
message(glue::glue("select_rate: Selecting rates with duration between {n_order[1]} and {n_order[2]}..."))
durations <- x$summary$endtime-x$summary$time
keep1 <- which(durations >= n_order[1])
keep2 <- which(durations <= n_order[2])
keep <- keep1[keep1 %in% keep2]
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
# overlap -------------------------------------------------------------
if(method == "overlap"){
if(is.null(n)) {
n <- 0
message("select_rate: 'overlap' method - applying default 'n = 0', no overlapping permitted.")
}
if(n < 0 || n > 1)
stop("select_rate: For 'overlap' method 'n' must be between 0 and 1 inclusive.")
message(glue::glue("select_rate: The 'overlap' method can be computationally intensive and may take some time."))
message(glue::glue("select_rate: Selecting rates which overlap by at least {n*100}% ..."))
## extract summary df
df <- x$summary
#### Remove overlapping
## add row width col
df$row_width <- df$endrow - df$row
overlaps <- TRUE # to get while loop started
## while at least one regression is overlapping,
## go from bottom row to top
## the one closest the bottom which is overlapping gets removed
## then repeat while loop
while(any(overlaps)){
## object to save logical test results to
overlaps <- rep(NA, nrow(df))
## loop from last row to first
for(i in nrow(df):1) {
start <- df$row[i] # start of reg in data
end <- df$endrow[i] # end of reg in data
width <- df$row_width[i] # width of reg
overlap <- round(width * n) # allowed overlap
overlaps[i] <- any(df$row[-i]+overlap <= end & df$endrow[-i]-overlap >= start)
}
## if any are inside (contained), go to last one and remove it
if(any(overlaps)) {
remove <- tail(which(overlaps), 1)
df <- df[-remove,]
}
keep <- which(x$summary$rank %in% df$rank)
keep <- sort(keep)
summ <- x$summary
reordered <- FALSE
}
} ### end overlap
# overlap_new -------------------------------------------------------------
# Rewrote this. Can't remember why or what main difference is.
# But doesn't seem to work quite as well.
# Leaving here for now
# if(method == "overlap_new"){
#
# if(is.null(n)) {
# n <- 0
# message("select_rate: 'overlap' method applying default 'n = 0', no overlapping permitted.")
# }
# if(n < 0 || n > 1)
# stop("select_rate: For 'overlap' method 'n' must be between 0 and 1 inclusive.")
# message(glue::glue("select_rate: The 'overlap' method can be computationally intensive and may take some time."))
# message(glue::glue("select_rate: Selecting rates which overlap by at least {n*100}% ..."))
#
# ##### First - remove contained
#
# ## extract summary df
# df <- x$summary
# df$row_width <- df$endrow - df$row
#
# insides <- TRUE # to get while loop started
#
# ## while at least one regression is inside another,
# ## go from bottom row to top
# ## the one closest the bottom which is contained gets removed
# ## then repeat while loop
# while(any(insides)){
# ## object to save logical test results to
# insides <- rep(NA, nrow(df))
# ## loop from last row to first
# for(i in nrow(df):1) {
# start <- df$row[i] # start of reg in data
# end <- df$endrow[i] # end of reg in data
# insides[i] <- any(start >= df$row[-i] & end <= df$endrow[-i])
# }
#
# ## if any are inside (contained), go to last one and remove it
# if(any(insides)) {
# remove <- tail(which(insides), 1)
# df <- df[-remove,]
# }
# }
#
# ## second
#
# max_overlap <- data.frame(rank = NA,
# max_overlap = 1)
#
# while(any(max_overlap[[2]] > n)) {
#
# max_overlap <- data.frame(rank = NA,
# max_overlap = NA)
# for(i in 1:nrow(df)){
# rank <- df[i,1]
# start <- df$row[i] # start of reg in data
# end <- df$endrow[i] # end of reg in data
# width <- df$row_width[i]
#
# overlap_props <- data.frame(t(apply(df[-i,], 1, function(z) {
# c(z[1],
# prop = length(intersect(start:end, z[6]:z[7]))/width)
# })))
# max_overlap[i,] <- cbind(rank,
# max(overlap_props$prop))
#
# }
# # because which.max returns only first result
# remove <- tail(which(max_overlap$max_overlap == max(max_overlap$max_overlap)),1)
#
# df <- df[-remove,]
#
# cat("done", nrow(max_overlap))
# } # end of while loop
#
# keep <- which(x$summary$rank %in% df$rank)
# keep <- sort(keep)
# }
### end overlap_new
# Select object -------------------------------------------------
output <- x
output$summary <- summ[keep,]
output$rate.output <- summ$rate.output[keep]
## save original convert_rate object if it isn't already there
if(is.null(output$original)) output$original <- input_x
## save selection criteria
if(is.null(output$select_calls)) {
output$select_calls <- list()
output$select_calls[[1]] <- call
} else {
output$select_calls[[(length(output$select_calls)+1)]] <- call
}
## Add custom ADDITIONAL class
## Maybe different way of doing this, but:
## 1. for analysis documentation purposes new object should have indication it was an
## object that was manipulated, i.e. not original
## 2. can't replace class because generic S3 functions will stop working
## (unless we just duplicate these for new class)
if(inherits(output, "convert_rate") && !("convert_rate_select" %in% class(output))) class(output) <- c(class(output), "convert_rate_select")
if(inherits(output, "convert_rate.ft") && !("convert_rate.ft_select" %in% class(output))) class(output) <- c(class(output), "convert_rate.ft_select")
## Message
if(reordered) message(glue::glue(
"\n----- Reordering complete. {length(input_x$rate.output)} rate(s) reordered by '{method}' method -----\n\n")) else
message(glue::glue(
"\n----- Selection complete. {length(input_x$rate.output) - length(keep)} rate(s) removed, {length(keep)} rate(s) remaining -----\n\n"))
## Return
return(invisible(output))
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.