summ_order: Summarize list of pdqr-functions with order

View source: R/summ_order.R

summ_orderR Documentation

Summarize list of pdqr-functions with order

Description

Functions for ordering the set of pdqr-functions supplied in a list. This might be useful for doing comparative statistical inference for several groups of data.

Usage

summ_order(f_list, method = "compare", decreasing = FALSE)

summ_sort(f_list, method = "compare", decreasing = FALSE)

summ_rank(f_list, method = "compare")

Arguments

f_list

List of pdqr-functions.

method

Method to be used for ordering. Should be one of "compare", "mean", "median", "mode", "midrange".

decreasing

If TRUE ordering is done decreasingly.

Details

Ties for all methods are handled so as to preserve the original order.

Method "compare" is using the following ordering relation: pdqr-function f is greater than g if and only if P(f >= g) > 0.5, or in code summ_prob_true(f >= g) > 0.5 (see pdqr methods for "Ops" group generic family for more details on comparing pdqr-functions). This method orders input based on this relation and order() function. Notes:

  • This relation doesn't define strictly ordering because it is not transitive: there can be pdqr-functions f, g, and h, for which f is greater than g, g is greater than h, and h is greater than f (but should be otherwise). If not addressed, this might result into dependence of output on order of the input. It is solved by first preordering f_list based on method "mean" and then calling order().

  • Because comparing two pdqr-functions can be time consuming, this method becomes rather slow as number of f_list elements grows.

Methods "mean", "median", "mode", and "midrange" are based on summ_center(): ordering of f_list is defined as ordering of corresponding measures of distribution's center.

Value

summ_order() works essentially like order(). It returns an integer vector representing a permutation which rearranges f_list in desired order.

summ_sort() returns a sorted (in desired order) variant of f_list.

summ_rank() returns a numeric vector representing ranks of f_list elements: 1 for the "smallest", length(f_list) for the "biggest".

See Also

Other summary functions: summ_center(), summ_classmetric(), summ_distance(), summ_entropy(), summ_hdr(), summ_interval(), summ_moment(), summ_prob_true(), summ_pval(), summ_quantile(), summ_roc(), summ_separation(), summ_spread()

Examples

d_fun <- as_d(dunif)
f_list <- list(a = d_fun, b = d_fun + 1, c = d_fun - 1)
summ_order(f_list)
summ_sort(f_list)
summ_rank(f_list)

# All methods might give different results on some elaborated pdqr-functions
# Methods "compare" and "mean" are not equivalent
non_mean_list <- list(
  new_d(data.frame(x = c(0.56, 0.815), y = c(1, 1)), "continuous"),
  new_d(data.frame(x = 0:1, y = c(0, 1)), "continuous")
)
summ_order(non_mean_list, method = "compare")
summ_order(non_mean_list, method = "mean")

# Methods powered by `summ_center()` are not equivalent
m <- c(0, 0.2, 0.1)
s <- c(1.1, 1.2, 1.3)
dlnorm_list <- lapply(seq_along(m), function(i) {
  as_d(dlnorm, meanlog = m[i], sdlog = s[i])
})
summ_order(dlnorm_list, method = "mean")
summ_order(dlnorm_list, method = "median")
summ_order(dlnorm_list, method = "mode")

# Method "compare" handles inherited non-transitivity. Here third element is
# "greater" than second (`P(f >= g) > 0.5`), second - than first, and first
# is "greater" than third.
non_trans_list <- list(
  new_d(data.frame(x = c(0.39, 0.44, 0.46), y = c(17, 14, 0)), "continuous"),
  new_d(data.frame(x = c(0.05, 0.3, 0.70), y = c(4, 0, 4)), "continuous"),
  new_d(data.frame(x = c(0.03, 0.40, 0.80), y = c(1, 1, 1)), "continuous")
)
summ_sort(non_trans_list)
## Output doesn't depend on initial order
summ_sort(non_trans_list[c(2, 3, 1)])

pdqr documentation built on May 31, 2023, 8:48 p.m.