... are hard:
Serfling
gives definition based on reasonable looking spatial depth / "directional
quantiles" for functions, for specific constant "functional directions"
$u(t) \equiv \pm c \in \mathbb{R}$ :\
\
So these can also be used to define empirical quantiles relative to a dataset
of curves, not just distributions.\
pros: such quantiles won't necessarily be elements of the dataset, IIUC, they
can interpolate the functions in the dataset\
cons:\
- computation seems (very) hard, for something that might not even work
reliably since the problem is not particularly well posed...?\
- no internal diagnostics, AFAICT\
Define simple functional empirical quantiles based on "how much of the function is below/above how many other functions in the set":
the function whose MEI is the $q \in [0,1]$ sample quantile of the MEIs in the dataset is the $q$ functional quantile. That's it, that's the entire idea. (We're using the simplest version of empirical quantiles here, i.e., direct inverse of ECDF, so no interpolation/averaging/etc)
computation of this is very easy -- about the same complexity as Sun/Genton's efficient MBD-2 implementation: just pointwise ranks of function evals and then averaging those per function.
# simple functional "order statistics" based on "modified epigraph index" mei_order <- function(x, arg = seq_len(ncol(x))) { ranks <- apply(x, 2, rank, na.last = "keep", ties.method = "min") ranks <- nrow(x) - ranks + 1 weights <- { # assign half interval length to 2nd/nxt-to-last points to 1st and last point # assign other half intervals to intermediate points lengths <- diff(arg) / 2 (c(lengths, 0) + c(0, lengths)) / diff(range(arg)) } colSums(t(ranks) * weights, na.rm = TRUE) / nrow(x) } pc1 <- function(mei, mbd) { pc1 <- svd(cbind(mei, mbd))$u[, 1] # flip s.t. high means higher, don'T flip if cor NA (constant) sign <- sign(cor(mei, pc1)) if (!is.na(sign)) pc1 <- sign * pc1 pc1 } pc1_order <- function(x, arg = seq_len(ncol(x))) { mbd <- tidyfun::mbd(x, arg = arg) #rank? mei <- tidyfun::mei(x, arg = arg) #rank? pc1(mei, mbd) } tf_quantile <- function(tf, q, method = c("mei", "pc1")) { assert_numeric(q, lower = 0, upper = 1, any.missing = FALSE) method <- match.arg(method) #! ... should check tfd_reg or tfb tf_mat <- as.matrix(tf) arg <- attr(tf_mat, "arg") rank <- switch(method, "MEI" = mei_order(tf_mat, arg = arg), "pc1" = pc1_order(tf_mat, arg = arg) ) rank_q <- quantile(rank, q, type = 1) #!! type 1: return observed vals only, no interp tf_q <- tf[match(rank_q, rank)] names(tf_q) <- names(rank_q) tf_q }
mbd
is modified 2-band depth, "median" and "quartiles" in middle panel
according to upper definition in gold and blue, and MBD-median in black)
library(tidyverse) library(patchwork) devtools::load_all() #tidyfun n <- 70 g <- 50 grid <- seq(0, 1, l = g) examples <- list( par = tfd(outer(1:n, 1:g, `+`), arg = grid), cross = tfd(outer(seq(-n/2, n/2, length = n), seq(-1, 1, length = g), `*`), arg = grid), cross2 = tfd(rbind(outer(seq(0, 1, l = n), grid, `+`), outer(seq(0, 1, l = n), rev(grid), `+`) ), arg = grid), #outer(seq(-n/2, n/2, length = n), seq(-1, 1, length = g), `*`) + matrix(seq(-n/2, n/2, length = n) * rep(c(1, -1), l = n), nrow = n, ncol = g), gp = tf_rgp(n, grid), gp_shift = tf_rgp(n, grid, scale = .05) + seq(0, 10, l = n), gp_shift_skew = tf_rgp(n, grid, scale = .05) + 10 * qchisq(seq(0.01, 0.99, length = n), df = 1), phase_shift1 = sapply(seq(.2, .8, l = n), function(x) dnorm(grid, mean = x, sd = .1)) %>% t %>% tfd(arg = grid), phase_shift2 = sapply(seq(.2, .8, l = n), function(x) dnorm(2 * grid - 1, mean = x, sd = .1)) %>% t %>% tfd(arg = 2 * grid - 1), phase_stretch = sapply(seq(1, 6, l = n), function(x) dnorm(pbeta(grid, 1, x), mean = .3, sd = .3)) %>% t %>% tfd(arg = grid), cca50 = dti_df$cca[1:100] %>% tfd(arg = seq(0, 1, l = 93)), rcst50 = dti_df$rcst[1:100] %>% tfd(arg = seq(0.23, 1, l = 93)), acti50_zoom500550 = chf_df$activity[1:50] %>% tf_zoom(500, 550), acti50_smooth = chf_df$activity[1:50] %>% tf_smooth() )
for (ex in names(examples)) { ex_data <- tibble(x = examples[[ex]], mei = mei_order(as.matrix(examples[[ex]])), mbd = tf_depth(x)) %>% mutate(pc1 = pc1(mei, mbd)) #ex_quart <- tibble(x = tf_quantile(examples[[ex]], c(0.25, 0.5, 0.75)), q = factor(c(0.25, 0.5, 0.75))) #ex_pc1 <- tibble(x = tf_quantile(examples[[ex]], c(0.25, 0.5, 0.75)), method = "pc1", q = factor(c(0.25, 0.5, 0.75))) p1 <- ggplot() + geom_spaghetti(data = ex_data, aes(y = x, col = mei)) + scale_color_gradient2(mid = "gray80", midpoint = 1/2, limits = c(0, 1)) + labs(title = ex, subtitle = paste0("MEI range:", paste(round(range(ex_data$mei), 3), collapse = "-"))) + theme_minimal() + theme(legend.position = "bottom") if(FALSE) { p2 <- ggplot() + geom_spaghetti(data = ex_data, aes(y = x), col = "grey80") + geom_spaghetti(data = ex_quart, aes(y = x, col = q), lwd = 1) + geom_spaghetti(data = ex_data %>% filter(mbd == max(mbd)), aes(y = x), lwd = 1, col = "black", alpha = .5) + scale_color_manual(values = c("cyan", "gold", "blue"), guide = guide_none()) + theme_minimal() + labs(subtitle = paste0("MEI quartiles, MBD median")) } p3 <- ggplot(ex_data) + geom_spaghetti(aes(y = x, col = mbd)) + scale_color_viridis_c() + theme_minimal() + theme(legend.position = "bottom") p4 <- ggplot(ex_data) + geom_point(aes(x = mbd, y = mei, col = pc1)) + scale_color_viridis_c() p5 <- ggplot() + geom_spaghetti(data = ex_data, aes(y = x, col = pc1)) + scale_color_viridis_c() + theme_minimal() + theme(legend.position = "bottom") p6 <- ggplot(ex_data) + geom_point(aes(y = mbd, x = pc1)) + scale_color_viridis_c() p7 <- ggplot(ex_data) + geom_point(aes(y = mei, x = pc1)) + scale_color_viridis_c() print((p1 + p4) / (p3 + p5) / (p6 + p7)) }
cross
, phase_shift1
desiderata for better f-quantiles: - symmetric (1-q/2, q/2)-quantiles should not cross (prob. impossible if data cross a lot and fun. quantiles are (very similar to) actual observed curves) - that
f-quantile by removing deepest curves
idea: for scalar data, if we remove the middle q% of data, the min/max of the remainder are (1-q/2, q/2) quantiles. So:
f-quantiles via function bands/envelopes
something different: use data-driven directions for spatial depth
IIUC, Serfling uses constant negative/positive functions as directions, which makes sense to induce an "above-below" ordering of functions. But that's maybe not a sensible ask for data with lots of crossings, dominant phase var, etc.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.